    Neuro-symbolic computing with spiking neural networks. (arXiv:2208.02576v1 [cs.NE])
    Knowledge graphs are an expressive and widely used data structure due to their ability to integrate data from different domains in a sensible and machine-readable way. Thus, they can be used to model a variety of systems such as molecules and social networks. However, it still remains an open question how symbolic reasoning could be realized in spiking systems and, therefore, how spiking neural networks could be applied to such graph data. Here, we extend previous work on spike-based graph algorithms by demonstrating how symbolic and multi-relational information can be encoded using spiking neurons, allowing reasoning over symbolic structures like knowledge graphs with spiking neural networks. The introduced framework is enabled by combining the graph embedding paradigm and the recent progress in training spiking neural networks using error backpropagation. The presented methods are applicable to a variety of spiking neuron models and can be trained end-to-end in combination with other differentiable network architectures, which we demonstrate by implementing a spiking relational graph neural network.
    Glance and Focus Networks for Dynamic Visual Recognition. (arXiv:2201.03014v2 [cs.CV] UPDATED)
    Spatial redundancy widely exists in visual recognition tasks, i.e., discriminative features in an image or video frame usually correspond to only a subset of pixels, while the remaining regions are irrelevant to the task at hand. Therefore, static models which process all the pixels with an equal amount of computation result in considerable redundancy in terms of time and space consumption. In this paper, we formulate the image recognition problem as a sequential coarse-to-fine feature learning process, mimicking the human visual system. Specifically, the proposed Glance and Focus Network (GFNet) first extracts a quick global representation of the input image at a low resolution scale, and then strategically attends to a series of salient (small) regions to learn finer features. The sequential process naturally facilitates adaptive inference at test time, as it can be terminated once the model is sufficiently confident about its prediction, avoiding further redundant computation. It is worth noting that the problem of locating discriminant regions in our model is formulated as a reinforcement learning task, thus requiring no additional manual annotations other than classification labels. GFNet is general and flexible as it is compatible with any off-the-shelf backbone models (such as MobileNets, EfficientNets and TSM), which can be conveniently deployed as the feature extractor. Extensive experiments on a variety of image classification and video recognition tasks and with various backbone models demonstrate the remarkable efficiency of our method. For example, it reduces the average latency of the highly efficient MobileNet-V3 on an iPhone XS Max by 1.3x without sacrificing accuracy. Code and pre-trained models are available at https://github.com/blackfeather-wang/GFNet-Pytorch.
    Generalization Analysis of Message Passing Neural Networks on Large Random Graphs. (arXiv:2202.00645v6 [cs.LG] UPDATED)
    Message passing neural networks (MPNNs) have seen a steep rise in popularity since their introduction as generalizations of convolutional neural networks to graph-structured data, and are now considered state-of-the-art tools for solving a large variety of graph-focused problems. We study the generalization error of MPNNs in graph classification and regression. We assume that graphs of different classes are sampled from different random graph models. We show that, when training an MPNN on a dataset sampled from such a distribution, the generalization gap increases in the complexity of the MPNN, and decreases, not only with respect to the number of training samples, but also with the average number of nodes in the graphs. This shows how an MPNN with high complexity can generalize from a small dataset of graphs, as long as the graphs are large. The generalization bound is derived from a uniform convergence result, which shows that any MPNN, applied on a graph, approximates the MPNN applied on the geometric model that the graph discretizes.
    DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R. (arXiv:2103.09603v3 [stat.ML] UPDATED)
    The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods that are available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables high flexibility in model specification and makes it easily extendable. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.
    Bayesian regularization of empirical MDPs. (arXiv:2208.02362v1 [cs.LG])
    In most applications of model-based Markov decision processes, the parameters of the unknown underlying model are estimated from empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.
    Invariant Representations with Stochastically Quantized Neural Networks. (arXiv:2208.02656v1 [cs.LG])
    Representation learning algorithms offer the opportunity to learn invariant representations of the input data with regard to nuisance factors. Many authors have leveraged such strategies to learn fair representations, i.e., vectors where information about sensitive attributes is removed. These methods are attractive as they may be interpreted as minimizing the mutual information between a neural layer's activations and a sensitive attribute. However, the theoretical grounding of such methods relies either on the computation of infinitely accurate adversaries or on minimizing a variational upper bound of a mutual information estimate. In this paper, we propose a methodology for direct computation of the mutual information between a neural layer and a sensitive attribute. We employ stochastically-activated binary neural networks, which lets us treat neurons as random variables. We are then able to compute (not bound) the mutual information between a layer and a sensitive attribute and use this information as a regularization factor during gradient descent. We show that this method compares favorably with the state of the art in fair representation learning and that the learned representations display a higher level of invariance compared to full-precision neural networks.
    Counterfactual Image Synthesis for Discovery of Personalized Predictive Image Markers. (arXiv:2208.02311v1 [cs.CV])
    The discovery of patient-specific imaging markers that are predictive of future disease outcomes can help us better understand individual-level heterogeneity of disease evolution. In fact, deep learning models that can provide data-driven personalized markers are much more likely to be adopted in medical practice. In this work, we demonstrate that data-driven biomarker discovery can be achieved through a counterfactual synthesis process. We show how a deep conditional generative model can be used to perturb local imaging features in baseline images that are pertinent to subject-specific future disease evolution and result in a counterfactual image that is expected to have a different future outcome. Candidate biomarkers, therefore, result from examining the set of features that are perturbed in this process. Through several experiments on a large-scale, multi-scanner, multi-center multiple sclerosis (MS) clinical trial magnetic resonance imaging (MRI) dataset of relapsing-remitting (RRMS) patients, we demonstrate that our model produces counterfactuals with changes in imaging features that reflect established clinical markers predictive of future MRI lesional activity at the population level. Additional qualitative results illustrate that our model has the potential to discover novel and subject-specific predictive markers of future activity.
    Feature selection with gradient descent on two-layer networks in low-rotation regimes. (arXiv:2208.02789v1 [cs.LG])
    This work establishes low test error of gradient flow (GF) and stochastic gradient descent (SGD) on two-layer ReLU networks with standard initialization, in three regimes where key sets of weights rotate little (either naturally due to GF and SGD, or due to an artificial constraint), and making use of margins as the core analytic technique. The first regime is near initialization, specifically until the weights have moved by $\mathcal{O}(\sqrt m)$, where $m$ denotes the network width, which is in sharp contrast to the $\mathcal{O}(1)$ weight motion allowed by the Neural Tangent Kernel (NTK); here it is shown that GF and SGD only need a network width and number of samples inversely proportional to the NTK margin, and moreover that GF attains at least the NTK margin itself, which suffices to establish escape from bad KKT points of the margin objective, whereas prior work could only establish nondecreasing but arbitrarily small margins. The second regime is the Neural Collapse (NC) setting, where data lies in extremely-well-separated groups, and the sample complexity scales with the number of groups; here the contribution over prior work is an analysis of the entire GF trajectory from initialization. Lastly, if the inner layer weights are constrained to change in norm only and cannot rotate, then GF with large widths achieves globally maximal margins, and its sample complexity scales with their inverse; this is in contrast to prior work, which required infinite width and a tricky dual convergence assumption. As purely technical contributions, this work develops a variety of potential functions and other tools which will hopefully aid future work.
    Dynamic Planning in Open-Ended Dialogue using Reinforcement Learning. (arXiv:2208.02294v1 [cs.CL])
    Despite recent advances in natural language understanding and generation, and decades of research on the development of conversational bots, building automated agents that can carry on rich open-ended conversations with humans "in the wild" remains a formidable challenge. In this work we develop a real-time, open-ended dialogue system that uses reinforcement learning (RL) to power a bot's conversational skill at scale. Our work pairs the succinct embedding of the conversation state generated using SOTA (supervised) language models with RL techniques that are particularly suited to a dynamic action space that changes as the conversation progresses. Trained using crowd-sourced data, our novel system substantially exceeds the (strong) baseline supervised model with respect to several metrics of interest in a live experiment with real users of the Google Assistant.
    A Nonlinear PID-Enhanced Adaptive Latent Factor Analysis Model. (arXiv:2208.02513v1 [cs.LG])
    High-dimensional and incomplete (HDI) data holds tremendous interactive information in various industrial applications. A latent factor analysis (LFA) model trained with the stochastic gradient descent (SGD) algorithm is remarkably effective in extracting valuable information from HDI data. However, an SGD-based LFA model suffers from slow convergence since it only considers the current learning error. To address this critical issue, this paper proposes a Nonlinear PID-enhanced Adaptive Latent Factor (NPALF) model with two ideas: 1) rebuilding the learning error by considering past learning errors, following the principle of a nonlinear PID controller; 2) adapting all parameters effectively, following the principle of a particle swarm optimization (PSO) algorithm. Experimental results on four representative HDI datasets indicate that, compared with five state-of-the-art LFA models, the NPALF model achieves a better convergence rate and better prediction accuracy for the missing data of HDI data.
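The idea of rebuilding the learning error from past errors can be illustrated with a textbook linear PID term. This is only a sketch under stated assumptions: the abstract does not give the paper's nonlinear form or its PSO-based adaptation, so the class name `PIDError` and the gains `kp`, `ki`, `kd` are hypothetical and fixed by hand.

```python
from dataclasses import dataclass


@dataclass
class PIDError:
    """Combine current, accumulated, and changing error (linear PID form)."""
    kp: float = 1.0   # proportional gain (hypothetical value)
    ki: float = 0.1   # integral gain (hypothetical value)
    kd: float = 0.05  # derivative gain (hypothetical value)
    integral: float = 0.0
    previous: float = 0.0

    def rebuild(self, error: float) -> float:
        # Integral term: running sum of all errors seen so far.
        self.integral += error
        # Derivative term: change relative to the previous error.
        derivative = error - self.previous
        self.previous = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


pid = PIDError()
# Feed a shrinking sequence of instantaneous errors for one data entry.
adjusted = [pid.rebuild(e) for e in (1.0, 0.5, 0.25)]
print(adjusted)
```

In an SGD-based latent factor update, the rebuilt value would stand in for the instantaneous error when scaling the gradient step, so that the step also reflects the error history.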
    Gradient-based Bi-level Optimization for Deep Learning: A Survey. (arXiv:2207.11719v2 [cs.LG] UPDATED)
    Bi-level optimization, especially the gradient-based category, has been widely used in the deep learning community including hyperparameter optimization and meta knowledge extraction. Bi-level optimization embeds one problem within another and the gradient-based category solves the outer level task by computing the hypergradient, which is much more efficient than classical methods such as the evolutionary algorithm. In this survey, we first give a formal definition of the gradient-based bi-level optimization. Secondly, we illustrate how to formulate a research problem as a bi-level optimization problem, which is of great practical use for beginners. More specifically, there are two formulations: the single-task formulation to optimize hyperparameters such as regularization parameters and the distilled data, and the multi-task formulation to extract meta knowledge such as the model initialization. With a bi-level formulation, we then discuss four bi-level optimization solvers to update the outer variable including explicit gradient update, proxy update, implicit function update, and closed-form update. Last but not least, we conclude the survey by pointing out the great potential of gradient-based bi-level optimization on science problems (AI4Science).
    Privacy Safe Representation Learning via Frequency Filtering Encoder. (arXiv:2208.02482v1 [cs.CV])
    Deep learning models are increasingly deployed in real-world applications. These models are often deployed on the server-side and receive user data in an information-rich representation to solve a specific task, such as image classification. Since images can contain sensitive information, which users might not be willing to share, privacy protection becomes increasingly important. Adversarial Representation Learning (ARL) is a common approach to train an encoder that runs on the client-side and obfuscates an image. It is assumed that the obfuscated image can safely be transmitted and used for the task on the server without privacy concerns. However, in this work, we find that a trained reconstruction attacker can successfully recover the original image under existing ARL methods. To address this, we introduce a novel ARL method enhanced through low-pass filtering, which limits the amount of information available to be encoded in the frequency domain. Our experimental results reveal that our approach withstands reconstruction attacks while outperforming previous state-of-the-art methods regarding the privacy-utility trade-off. We further conduct a user study to qualitatively assess our defense against the reconstruction attack.
    Robust Adaptive Submodular Maximization. (arXiv:2107.11333v3 [cs.DS] UPDATED)
    The goal of a sequential decision making problem is to design an interactive policy that adaptively selects a group of items, where each selection is based on the feedback from the past, in order to maximize the expected utility of the selected items. It has been shown that the utility functions of many real-world applications are adaptive submodular. However, most existing studies on adaptive submodular optimization focus on the average case. Unfortunately, a policy that has a good average-case performance may have very poor performance under the worst-case realization. In this study, we propose to study two variants of adaptive submodular optimization problems, namely, worst-case adaptive submodular maximization and robust submodular maximization. The first problem aims to find a policy that maximizes the worst-case utility and the latter one aims to find a policy, if any, that achieves both near optimal average-case utility and worst-case utility simultaneously. We introduce a new class of stochastic functions, called \emph{worst-case submodular function}. For the worst-case adaptive submodular maximization problem subject to a $p$-system constraint, we develop an adaptive worst-case greedy policy that achieves a $\frac{1}{p+1}$ approximation ratio against the optimal worst-case utility if the utility function is worst-case submodular. For the robust adaptive submodular maximization problem subject to cardinality constraints (resp. partition matroid constraints), if the utility function is both worst-case submodular and adaptive submodular, we develop a hybrid adaptive policy that achieves an approximation close to $1-e^{-\frac{1}{2}}$ (resp. $1/3$) under both worst- and average-case settings simultaneously. We also describe several applications of our theoretical results, including pool-based active learning, stochastic submodular set cover and adaptive viral marketing.
    How Much Privacy Does Federated Learning with Secure Aggregation Guarantee?. (arXiv:2208.02304v1 [cs.LG])
    Federated learning (FL) has attracted growing interest for enabling privacy-preserving machine learning on data stored at multiple users while avoiding moving the data off-device. However, while data never leaves users' devices, privacy still cannot be guaranteed since significant computations on users' training data are shared in the form of trained local models. These local models have recently been shown to pose a substantial privacy threat through different privacy attacks such as model inversion attacks. As a remedy, Secure Aggregation (SA) has been developed as a framework to preserve privacy in FL, by guaranteeing the server can only learn the global aggregated model update but not the individual model updates. While SA ensures no additional information is leaked about the individual model update beyond the aggregated model update, there are no formal guarantees on how much privacy FL with SA can actually offer; as information about the individual dataset can still potentially leak through the aggregated model computed at the server. In this work, we perform a first analysis of the formal privacy guarantees for FL with SA. Specifically, we use Mutual Information (MI) as a quantification metric and derive upper bounds on how much information about each user's dataset can leak through the aggregated model update. When using the FedSGD aggregation algorithm, our theoretical bounds show that the amount of privacy leakage reduces linearly with the number of users participating in FL with SA. To validate our theoretical bounds, we use an MI Neural Estimator to empirically evaluate the privacy leakage under different FL setups on both the MNIST and CIFAR10 datasets. Our experiments verify our theoretical bounds for FedSGD, which show a reduction in privacy leakage as the number of users and local batch size grow, and an increase in privacy leakage with the number of training rounds.
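The property that the server learns only the aggregated update can be illustrated with pairwise additive masking, one common way to build secure aggregation. This is a toy sketch and not the protocol analyzed in the paper: updates are single integers, the function name `mask_updates` and the constant `MODULUS` are hypothetical, and all masks come from one seeded generator, whereas real protocols derive them from pairwise key agreement and handle user dropouts.

```python
import random

MODULUS = 2**31 - 1  # arithmetic is done modulo this value (assumed for the sketch)


def mask_updates(updates, seed=0):
    """Add pairwise masks that cancel when all masked updates are summed."""
    rng = random.Random(seed)  # stand-in for pairwise shared secrets
    masked = [u % MODULUS for u in updates]
    for i in range(len(updates)):
        for j in range(i + 1, len(updates)):
            m = rng.randrange(MODULUS)
            masked[i] = (masked[i] + m) % MODULUS  # user i adds the shared mask
            masked[j] = (masked[j] - m) % MODULUS  # user j subtracts the same mask
    return masked


updates = [3, 10, 7, 5]          # one scalar "model update" per user
masked = mask_updates(updates)   # what the server would receive
aggregate = sum(masked) % MODULUS
print(aggregate)
```

Each masked value on its own is offset by random masks, but the masks cancel in the sum, so the server recovers only the total. The abstract's question is how much about one user's data that total still reveals.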
    Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts. (arXiv:2208.02434v1 [cs.LG])
    Traditional model-based reinforcement learning (RL) methods generate forward rollout traces using the learnt dynamics model to reduce interactions with the real environment. A recent model-based RL method additionally learns a backward model, which specifies the conditional probability of the previous state given the previous action and the current state, to generate backward rollout trajectories. However, in this type of model-based method, the samples derived from backward rollouts and those from forward rollouts are simply aggregated together to optimize the policy via the model-free RL algorithm, which may decrease both the sample efficiency and the convergence rate. This is because such an approach ignores the fact that backward rollout traces are often generated starting from some high-value states and are certainly more instructive for the agent to improve the behavior. In this paper, we propose the backward imitation and forward reinforcement learning (BIFRL) framework where the agent treats backward rollout traces as expert demonstrations for the imitation of excellent behaviors, and then collects forward rollout transitions for policy reinforcement. Consequently, BIFRL empowers the agent to both reach and explore from high-value states in a more efficient manner, and further reduces the real interactions, making it potentially more suitable for real-robot learning. Moreover, a value-regularized generative adversarial network is introduced to augment the valuable states which are infrequently received by the agent. Theoretically, we provide the condition where BIFRL is superior to the baseline methods. Experimentally, we demonstrate that BIFRL achieves better sample efficiency and competitive asymptotic performance on various MuJoCo locomotion tasks compared against state-of-the-art model-based methods.
    OCFR 2022: Competition on Occluded Face Recognition From Synthetically Generated Structure-Aware Occlusions. (arXiv:2208.02760v1 [cs.CV])
    This work summarizes the IJCB Occluded Face Recognition Competition 2022 (IJCB-OCFR-2022) embraced by the 2022 International Joint Conference on Biometrics (IJCB 2022). OCFR-2022 attracted a total of 3 participating teams, all from academia. Six valid submissions were made and then evaluated by the organizers. The competition was held to address the challenge of face recognition in the presence of severe face occlusions. The participants were free to use any training data, and the testing data was built by the organizers by synthetically occluding parts of the face images using a well-known dataset. The submitted solutions presented innovations and performed very competitively with the considered baseline. A major output of this competition is a challenging, realistic, diverse, and publicly available occluded face recognition benchmark with well defined evaluation protocols.
    Modular Grammatical Evolution for the Generation of Artificial Neural Networks. (arXiv:2208.02787v1 [cs.NE])
    This paper presents a novel method, called Modular Grammatical Evolution (MGE), towards validating the hypothesis that restricting the solution space of NeuroEvolution to modular and simple neural networks enables the efficient generation of smaller and more structured neural networks while providing acceptable (and in some cases superior) accuracy on large data sets. MGE also enhances the state-of-the-art Grammatical Evolution (GE) methods in two directions. First, MGE's representation is modular in that each individual has a set of genes, and each gene is mapped to a neuron by grammatical rules. Second, the proposed representation mitigates two important drawbacks of GE, namely the low scalability and weak locality of representation, towards generating modular and multi-layer networks with a high number of neurons. We define and evaluate five different forms of structures with and without modularity using MGE and find single-layer modules with no coupling more productive. Our experiments demonstrate that modularity helps in finding better neural networks faster. We have validated the proposed method using ten well-known classification benchmarks with different sizes, feature counts, and output class count. Our experimental results indicate that MGE provides superior accuracy with respect to existing NeuroEvolution methods and returns classifiers that are significantly simpler than other machine learning generated classifiers. Finally, we empirically demonstrate that MGE outperforms other GE methods in terms of locality and scalability properties.
    Image-based Detection of Surface Defects in Concrete during Construction. (arXiv:2208.02313v1 [cs.CV])
    Defects increase the cost and duration of construction projects. Automating defect detection would reduce documentation efforts that are necessary to decrease the risk of defects delaying construction projects. Since concrete is a widely used construction material, this work focuses on detecting honeycombs, a substantial defect in concrete structures that may even affect structural integrity. First, images were compared that were either scraped from the web or obtained from actual practice. The results demonstrate that web images represent just a selection of honeycombs and do not capture the complete variance. Second, Mask R-CNN and EfficientNet-B0 were trained for honeycomb detection to evaluate instance segmentation and patch-based classification, respectively. Mask R-CNN achieved 47.7% precision and 34.2% recall, and EfficientNet-B0 achieved 68.5% precision and 55.7% recall. Although the performance of those models is not sufficient for completely automated defect detection, the models could be used for active learning integrated into defect documentation systems. In conclusion, CNNs can assist in detecting honeycombs in concrete.
    Visually Evaluating Generative Adversarial Networks Using Itself under Multivariate Time Series. (arXiv:2208.02649v1 [cs.LG])
    Visually evaluating the goodness of generated Multivariate Time Series (MTS) is difficult, especially when the generative model is a Generative Adversarial Network (GAN). We present a general framework named Gaussian GANs to visually evaluate GANs using the GAN itself under the MTS generation task. Firstly, we attempt to find the transformation function in the multivariate Kolmogorov-Smirnov (MKS) test by explicitly reconstructing the architecture of GANs. Secondly, we conduct the normality test of the transformed MTS, where the Gaussian GANs serve as the transformation function in the MKS test. In order to simplify the normality test, an efficient visualization is proposed using the chi-square distribution. In the experiment, we use the UniMiB dataset and provide empirical evidence showing that the normality test using Gaussian GANs and chi-square visualization is effective and credible.
    Risk and optimal policies in bandit experiments. (arXiv:2112.06363v8 [econ.EM] UPDATED)
    We provide a decision theoretic analysis of bandit experiments. Working within the framework of diffusion asymptotics, we define suitable notions of asymptotic Bayes and minimax risk for these experiments. For normally distributed rewards, the minimal Bayes risk can be characterized as the solution to a second-order partial differential equation (PDE). Using a limit of experiments approach, we show that this PDE characterization also holds asymptotically under both parametric and non-parametric distributions of the rewards. The approach further describes the state variables it is asymptotically sufficient to restrict attention to, and thereby suggests a practical strategy for dimension reduction. The PDEs characterizing minimal Bayes risk can be solved efficiently using sparse matrix routines. We derive the optimal Bayes and minimax policies from their numerical solutions. These optimal policies substantially dominate existing methods such as Thompson sampling and UCB, often by a factor of two. The framework also covers time discounting and pure exploration.
    Data Collection and Quality Challenges in Deep Learning: A Data-Centric AI Perspective. (arXiv:2112.06409v2 [cs.LG] UPDATED)
    Data-centric AI is at the center of a fundamental shift in software engineering where machine learning becomes the new software, powered by big data and computing infrastructure. Here software engineering needs to be re-thought where data becomes a first-class citizen on par with code. One striking observation is that a significant portion of the machine learning process is spent on data preparation. Without good data, even the best machine learning algorithms cannot perform well. As a result, data-centric AI practices are now becoming mainstream. Unfortunately, many datasets in the real world are small, dirty, biased, and even poisoned. In this survey, we study the research landscape for data collection and data quality primarily for deep learning applications. Data collection is important because recent deep learning approaches have less need for feature engineering but more need for large amounts of data. For data quality, we study data validation, cleaning, and integration techniques. Even if the data cannot be fully cleaned, we can still cope with imperfect data during model training using robust model training techniques. In addition, while bias and fairness have been less studied in traditional data management research, these issues become essential topics in modern machine learning applications. We thus study fairness measures and unfairness mitigation techniques that can be applied before, during, or after model training. We believe that the data management community is well poised to solve these problems.
    Transformers as Meta-Learners for Implicit Neural Representations. (arXiv:2208.02801v1 [cs.LG])
    Implicit Neural Representations (INRs) have emerged and shown their benefits over discrete representations in recent years. However, fitting an INR to the given observations usually requires optimization with gradient descent from scratch, which is inefficient and does not generalize well with sparse observations. To address this problem, most of the prior works train a hypernetwork that generates a single vector to modulate the INR weights, where the single vector becomes an information bottleneck that limits the reconstruction precision of the output INR. Recent work shows that the whole set of weights in INR can be precisely inferred without the single-vector bottleneck by gradient-based meta-learning. Motivated by a generalized formulation of gradient-based meta-learning, we propose a formulation that uses Transformers as hypernetworks for INRs, where it can directly build the whole set of INR weights with Transformers specialized as set-to-set mapping. We demonstrate the effectiveness of our method for building INRs in different tasks and domains, including 2D image regression and view synthesis for 3D objects. Our work draws connections between the Transformer hypernetworks and gradient-based meta-learning algorithms and we provide further analysis for understanding the generated INRs. The project page with code is at https://yinboc.github.io/trans-inr/.
    Design of secure and robust cognitive system for malware detection. (arXiv:2208.02310v1 [cs.CR])
    Machine learning based malware detection techniques rely on grayscale images of malware and tend to classify malware based on the distribution of textures in grayscale images. Despite the advancement and promising results shown by machine learning techniques, attackers can exploit their vulnerabilities by generating adversarial samples. Adversarial samples are generated by intelligently crafting and adding perturbations to the input samples. The majority of existing adversarial attacks and defenses are software based. To defend against adversaries, existing malware detection based on machine learning and grayscale images needs a preprocessing step for the adversarial data. This can cause additional overhead and can slow real-time malware detection. As an alternative, we explore an RRAM (Resistive Random Access Memory) based defense against adversaries. The aim of this thesis is to address the above-mentioned critical system security issues by demonstrating proposed techniques to design a secure and robust cognitive system. First, a novel technique to detect stealthy malware is proposed. The technique uses malware binary images, extracts different features from them, and then employs different ML classifiers on the resulting dataset. Results demonstrate that this technique is successful in differentiating classes of malware based on the features extracted. Secondly, I demonstrate the effects of adversarial attacks on a reconfigurable RRAM-neuromorphic architecture with different learning algorithms and device characteristics. I also propose an integrated solution for mitigating the effects of the adversarial attack using the reconfigurable RRAM architecture.
    A Hybrid Framework for Sequential Data Prediction with End-to-End Optimization. (arXiv:2203.13787v2 [stat.ML] UPDATED)
    We investigate nonlinear prediction in an online setting and introduce a hybrid model that effectively mitigates, via an end-to-end architecture, the need for hand-designed features and manual model selection issues of conventional nonlinear prediction/regression methods. In particular, we use recursive structures to extract features from sequential signals, while preserving the state information, i.e., the history, and boosted decision trees to produce the final output. The connection is in an end-to-end fashion and we jointly optimize the whole architecture using stochastic gradient descent, for which we also provide the backward pass update equations. In particular, we employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machinery (soft GBDT) for effective supervised regression. Our framework is generic so that one can use other deep learning architectures for feature extraction (such as RNNs and GRUs) and machine learning algorithms for decision making as long as they are differentiable. We demonstrate the learning behavior of our algorithm on synthetic data and the significant performance improvements over the conventional methods on various real-life datasets. Furthermore, we openly share the source code of the proposed method to facilitate further research.
    Analyzing Data-Centric Properties for Contrastive Learning on Graphs. (arXiv:2208.02810v1 [cs.LG])
    Recent analyses of self-supervised learning (SSL) find the following data-centric properties to be critical for learning good representations: invariance to task-irrelevant semantics, separability of classes in some latent space, and recoverability of labels from augmented samples. However, given their discrete, non-Euclidean nature, graph datasets and graph SSL methods are unlikely to satisfy these properties. This raises the question: how do graph SSL methods, such as contrastive learning (CL), work well? To systematically probe this question, we perform a generalization analysis for CL when using generic graph augmentations (GGAs), with a focus on data-centric properties. Our analysis yields formal insights into the limitations of GGAs and the necessity of task-relevant augmentations. As we empirically show, GGAs do not induce task-relevant invariances on common benchmark datasets, leading to only marginal gains over naive, untrained baselines. Our theory motivates a synthetic data generation process that enables control over task-relevant information and boasts pre-defined optimal augmentations. This flexible benchmark helps us identify yet unrecognized limitations in advanced augmentation techniques (e.g., automated methods). Overall, our work rigorously contextualizes, both empirically and theoretically, the effects of data-centric properties on augmentation strategies and learning paradigms for graph SSL.
    Towards Understanding Mixture of Experts in Deep Learning. (arXiv:2208.02813v1 [cs.LG])
    The Mixture-of-Experts (MoE) layer, a sparsely-activated model controlled by a router, has achieved great success in deep learning. However, the understanding of such architecture remains elusive. In this paper, we formally study how the MoE layer improves the performance of neural network learning and why the mixture model will not collapse into a single model. Our empirical results suggest that the cluster structure of the underlying problem and the non-linearity of the expert are pivotal to the success of MoE. To further understand this, we consider a challenging classification problem with intrinsic cluster structures, which is hard to learn using a single expert. Yet with the MoE layer, by choosing the experts as two-layer nonlinear convolutional neural networks (CNNs), we show that the problem can be learned successfully. Furthermore, our theory shows that the router can learn the cluster-center features, which helps divide the input complex problem into simpler linear classification sub-problems that individual experts can conquer. To our knowledge, this is the first result towards formally understanding the mechanism of the MoE layer for deep learning.
    Distilling Knowledge from Reader to Retriever for Question Answering. (arXiv:2012.04584v2 [cs.CL] UPDATED)
    The task of information retrieval is an important component of many natural language processing systems, such as open domain question answering. While traditional methods were based on hand-crafted features, continuous representations based on neural networks recently obtained competitive results. A challenge of using such methods is to obtain supervised data to train the retriever model, corresponding to pairs of query and support documents. In this paper, we propose a technique to learn retriever models for downstream tasks, inspired by knowledge distillation, and which does not require annotated pairs of query and documents. Our approach leverages attention scores of a reader model, used to solve the task based on retrieved documents, to obtain synthetic labels for the retriever. We evaluate our method on question answering, obtaining state-of-the-art results.
    Bayesian Optimization with Informative Covariance. (arXiv:2208.02704v1 [cs.LG])
    Bayesian Optimization is a methodology for global optimization of unknown and expensive objectives. It combines a surrogate Bayesian regression model with an acquisition function to decide where to evaluate the objective. Typical regression models are Gaussian processes with stationary covariance functions, which, however, are unable to express prior input-dependent information, in particular information about possible locations of the optimum. The ubiquity of stationary models has led to the common practice of exploiting prior information via informative mean functions. In this paper, we highlight that these models can lead to poor performance, especially in high dimensions. We propose novel informative covariance functions that leverage nonstationarity to encode preferences for certain regions of the search space and adaptively promote local exploration during the optimization. We demonstrate that they can increase the sample efficiency of the optimization in high dimensions, even under weak prior information.
    Towards Generating Large Synthetic Phytoplankton Datasets for Efficient Monitoring of Harmful Algal Blooms. (arXiv:2208.02332v1 [cs.CV])
    Climate change is increasing the frequency and severity of harmful algal blooms (HABs), which cause significant fish deaths in aquaculture farms. This contributes to ocean pollution and greenhouse gas (GHG) emissions since dead fish are either dumped into the ocean or taken to landfills, which in turn negatively impacts the climate. Currently, the standard method to enumerate harmful algae and other phytoplankton is to manually observe and count them under a microscope. This is a time-consuming, tedious and error-prone process, resulting in compromised management decisions by farmers. Hence, automating this process for quick and accurate HAB monitoring is extremely helpful. However, this requires large and diverse datasets of phytoplankton images, and such datasets are hard to produce quickly. In this work, we explore the feasibility of generating novel high-resolution photorealistic synthetic phytoplankton images, containing multiple species in the same image, given a small dataset of real images. To this end, we employ Generative Adversarial Networks (GANs) to generate synthetic images. We evaluate three different GAN architectures: ProjectedGAN, FastGAN, and StyleGANv2 using standard image quality metrics. We empirically show the generation of high-fidelity synthetic phytoplankton images using a training dataset of only 961 real images. Thus, this work demonstrates the ability of GANs to create large synthetic datasets of phytoplankton from small training datasets, accomplishing a key step towards sustainable systematic monitoring of harmful algal blooms.
    Sparse Continuous Distributions and Fenchel-Young Losses. (arXiv:2108.01988v2 [cs.LG] UPDATED)
    Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax), has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $\Omega$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain "deformed exponential families," which include $\alpha$-entmax and sparsemax ($\alpha=2$) as particular cases. For quadratic energy functions, the resulting densities are $\beta$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $\Omega$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $\alpha \in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.
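    As a point of reference for the finite-domain case mentioned above, the following is a minimal sketch of sparsemax, the $\alpha=2$ member of the $\alpha$-entmax family, computed as the Euclidean projection of a score vector onto the probability simplex. It illustrates only the discrete transformation, not the paper's continuous-domain construction; the function name `sparsemax` and the example scores are illustrative choices.

```python
def sparsemax(z):
    """Project the score vector z onto the probability simplex (sparsemax).

    Uses the standard sort-based threshold search: find the largest k such
    that 1 + k * z_(k) > sum of the k largest scores, then subtract the
    threshold tau and clip at zero.
    """
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, zk in enumerate(z_sorted, start=1):
        cumsum += zk
        # The condition holds for a prefix of k values; keep the last tau.
        if 1.0 + k * zk > cumsum:
            tau = (cumsum - 1.0) / k
    return [max(zi - tau, 0.0) for zi in z]


if __name__ == "__main__":
    # The lowest score receives exactly zero probability (varying support).
    print(sparsemax([1.0, 0.5, -1.0]))  # [0.75, 0.25, 0.0]
```

    Unlike softmax, which always assigns strictly positive mass to every entry, the output here has exact zeros, which is the "varying support" property the abstract refers to.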
    Learning Green's functions associated with time-dependent partial differential equations. (arXiv:2204.12789v2 [math.NA] UPDATED)
    Neural operators are a popular technique in scientific machine learning to learn a mathematical model of the behavior of unknown physical systems from data. Neural operators are especially useful to learn solution operators associated with partial differential equations (PDEs) from pairs of forcing functions and solutions when numerical solvers are not available or the underlying physics is poorly understood. In this work, we attempt to provide theoretical foundations to understand the amount of training data needed to learn time-dependent PDEs. Given input-output pairs from a parabolic PDE in any spatial dimension $n\geq 1$, we derive the first theoretically rigorous scheme for learning the associated solution operator, which takes the form of a convolution with a Green's function $G$. Until now, rigorously learning Green's functions associated with time-dependent PDEs has been a major challenge in the field of scientific machine learning because $G$ may not be square-integrable when $n>1$, and time-dependent PDEs have transient dynamics. By combining the hierarchical low-rank structure of $G$ together with randomized numerical linear algebra, we construct an approximant to $G$ that achieves a relative error of $\smash{\mathcal{O}(\Gamma_\epsilon^{-1/2}\epsilon)}$ in the $L^1$-norm with high probability by using at most $\smash{\mathcal{O}(\epsilon^{-\frac{n+2}{2}}\log(1/\epsilon))}$ input-output training pairs, where $\Gamma_\epsilon$ is a measure of the quality of the training dataset for learning $G$, and $\epsilon>0$ is sufficiently small.
    Unsupervised Graph Spectral Feature Denoising for Crop Yield Prediction. (arXiv:2208.02714v1 [cs.LG])
    Prediction of annual crop yields at a county granularity is important for national food production and price stability. In this paper, towards the goal of better crop yield prediction, we leverage recent graph signal processing (GSP) tools to exploit spatial correlation among neighboring counties: via graph spectral filtering, we denoise the relevant features that are inputs to a deep learning prediction model. Specifically, we first construct a combinatorial graph with edge weights that encode county-to-county similarities in soil and location features via metric learning. We then denoise features via a maximum a posteriori (MAP) formulation with a graph Laplacian regularizer (GLR). We focus on the challenge of estimating, in an unsupervised manner, the crucial weight parameter $\mu$, which trades off the fidelity term against the GLR and is a function of the noise variance. We first estimate noise variance directly from noise-corrupted graph signals using a graph clique detection (GCD) procedure that discovers locally constant regions. We then compute an optimal $\mu$ minimizing an approximate mean square error function via bias-variance analysis. Experimental results from collected USDA data show that, using denoised features as input, the performance of a crop yield prediction model can be improved noticeably.
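    The denoising step can be illustrated with a small sketch. It assumes the common GLR objective $\min_x \|y - x\|_2^2 + \mu\, x^\top L x$ with $L = D - W$, whose minimizer solves $(I + \mu L)x = y$; the abstract does not state the exact objective, so this form, the Jacobi solver, the function names, and the toy 3-node graph are all assumptions for illustration. The estimation of $\mu$, which is the paper's focus, is not shown.

```python
def glr_denoise(y, weights, mu, iters=200):
    """Solve (I + mu * L) x = y by Jacobi iteration, where L = D - W.

    `weights` is a symmetric adjacency matrix with zero diagonal. The system
    is strictly diagonally dominant, so the iteration converges.
    """
    n = len(y)
    degree = [sum(weights[i]) for i in range(n)]
    x = list(y)
    for _ in range(iters):
        x = [
            (y[i] + mu * sum(weights[i][j] * x[j] for j in range(n)))
            / (1.0 + mu * degree[i])
            for i in range(n)
        ]
    return x


def laplacian_quadratic(x, weights):
    """Return x^T L x, i.e. the sum over edges of w_ij * (x_i - x_j)^2."""
    n = len(x)
    return 0.5 * sum(
        weights[i][j] * (x[i] - x[j]) ** 2 for i in range(n) for j in range(n)
    )


if __name__ == "__main__":
    # Hypothetical 3-node path graph with a noisy spike at the middle node.
    W = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    y = [1.0, 3.0, 1.0]
    x = glr_denoise(y, W, mu=1.0)
    print(x)  # approximately [1.5, 2.0, 1.5]
    print(laplacian_quadratic(y, W), laplacian_quadratic(x, W))
```

    A larger $\mu$ smooths the signal more strongly across edges, which is why choosing it from the noise variance matters.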
    Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models. (arXiv:2208.02402v1 [cs.CL])
    Although masked language models are highly performant and widely adopted by NLP practitioners, they cannot be easily used for autoregressive language modelling (next word prediction and sequence probability estimation). We present an LSTM-based autoregressive language model which uses prefix embeddings (from a pretrained masked language model) via fusion (e.g. concatenation) to obtain a richer context representation for language modelling. We find that fusion helps reliably in lowering the perplexity (16.74 $\rightarrow$ 15.80), which is even preserved after a transfer to a dataset from a different domain than the training data. We also evaluate the best-performing fusion model by correlating its next word surprisal estimates with human reading times. Contradicting our expectation, and despite the improvement in perplexity overall, the correlation remains the same as for the baseline model. Lastly, while we focus on language models pre-trained on text as the sources for the fusion, our approach can possibly be extended to fuse any information represented as a fixed-size vector into an autoregressive language model. This includes, e.g., sentence-external information retrieved from a knowledge base or representations of multi-modal encoders.
    Benchmark Static API Call Datasets for Malware Family Classification. (arXiv:2111.15205v2 [cs.CR] UPDATED)
    Nowadays, malware and malware incidents are increasing daily, even with various antivirus systems and malware detection or classification methodologies. Machine learning techniques have been the main focus of security experts for detecting malware and determining its families. Many static, dynamic, and hybrid techniques have been presented for that purpose. In this study, the static analysis technique has been applied to malware samples to extract API calls, which are among the most used features in machine/deep learning models as they represent the behavior of malware samples. Since the rapid increase and continuous evolution of malware affect the detection capacity of antivirus scanners, recent and updated datasets of malicious software have become necessary to overcome this drawback. This paper introduces two new datasets: one with 14,616 samples obtained and compiled from VirusShare and one with 9,795 samples from VirusSample. In addition, benchmark results based on static API calls of malware samples are presented using several machine and deep learning models on these datasets. We believe that these two datasets and benchmark results enable researchers to test and validate their methods and approaches in this field.
    CFARnet: deep learning for target detection with constant false alarm rate. (arXiv:2208.02474v1 [cs.LG])
    We consider the problem of learning detectors with a Constant False Alarm Rate (CFAR). Classical model-based solutions to composite hypothesis testing are sensitive to imperfect models and are often computationally expensive. In contrast, data-driven machine learning is often more robust and yields classifiers with fixed computational complexity. Learned detectors usually do not have a CFAR as required in many applications. To close this gap, we introduce CFARnet where the loss function is penalized to promote similar distributions of the detector under any null hypothesis scenario. Asymptotic analysis in the case of linear models with general Gaussian noise reveals that the classical generalized likelihood ratio test (GLRT) is actually a minimizer of the CFAR constrained Bayes risk. Experiments on both synthetic data and real hyper-spectral images show that CFARnet leads to near-CFAR detectors with accuracy similar to that of their competitors.
    Unifying physical systems' inductive biases in neural ODE using dynamics constraints. (arXiv:2208.02632v1 [cs.LG])
    Conservation of energy is at the core of many physical phenomena and dynamical systems. There have been a significant number of works in the past few years aimed at predicting the trajectory of motion of dynamical systems using neural networks while adhering to the law of conservation of energy. Most of these works are inspired by classical mechanics such as Hamiltonian and Lagrangian mechanics as well as Neural Ordinary Differential Equations. While these works have been shown to work well in their respective domains, there is a lack of a unifying method that is more generally applicable without requiring significant changes to the neural network architectures. In this work, we aim to address this issue by providing a simple method that could be applied to not just energy-conserving systems, but also dissipative systems, by including a different inductive bias in different cases in the form of a regularisation term in the loss function. The proposed method does not require changing the neural network architecture and could form the basis to validate a novel idea, therefore showing promise to accelerate research in this direction.
    DL-DRL: A double-layer deep reinforcement learning approach for large-scale task scheduling of multi-UAV. (arXiv:2208.02447v1 [cs.LG])
    This paper studies deep reinforcement learning (DRL) for the task scheduling problem of multiple unmanned aerial vehicles (UAVs). Current approaches generally use exact and heuristic algorithms to solve the problem, but the computation time rapidly increases as the task scale grows and heuristic rules need manual design. As a self-learning method, DRL can obtain a high-quality solution quickly without hand-engineered rules. However, the huge decision space makes the training of DRL models unstable in situations with large-scale tasks. In this work, to address the large-scale problem, we develop a divide and conquer-based framework (DCF) to decouple the original problem into task allocation and UAV route planning subproblems, which are solved in the upper and lower layers, respectively. Based on DCF, a double-layer deep reinforcement learning approach (DL-DRL) is proposed, where an upper-layer DRL model is designed to allocate tasks to appropriate UAVs and a lower-layer DRL model [i.e., the widely used attention model (AM)] is applied to generate viable UAV routes. Since the upper-layer model determines the input data distribution of the lower-layer model, and its reward is calculated via the lower-layer model during training, we develop an interactive training strategy (ITS), where the whole training process consists of pre-training, intensive training, and alternate training processes. Experimental results show that our DL-DRL outperforms mainstream learning-based and most traditional methods, and is competitive with the state-of-the-art heuristic method [i.e., OR-Tools], especially on large-scale problems. The strong generalizability of DL-DRL is also verified by testing a model trained on one problem size on larger ones. Furthermore, an ablation study demonstrates that our ITS can reach a compromise between the model performance and training duration.
    ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity. (arXiv:2208.02507v1 [cs.LG])
    When the available hardware cannot meet the memory and compute requirements to efficiently train high performing machine learning models, a compromise in either the training quality or the model complexity is needed. In Federated Learning (FL), nodes are orders of magnitude more constrained than traditional server-grade hardware and are often battery powered, severely limiting the sophistication of models that can be trained under this paradigm. While most research has focused on designing better aggregation strategies to improve convergence rates and on alleviating the communication costs of FL, fewer efforts have been devoted to accelerating on-device training. This stage, which repeats hundreds of times (i.e. every round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and the totality of the energy consumption at the client side. In this work, we present the first study on the unique aspects that arise when introducing sparsity at training time in FL workloads. We then propose ZeroFL, a framework that relies on highly sparse operations to accelerate on-device training. Models trained with ZeroFL and 95% sparsity achieve up to 2.3% higher accuracy compared to competitive baselines obtained from adapting a state-of-the-art sparse training framework to the FL setting.
    Impact Makes a Sound and Sound Makes an Impact: Sound Guides Representations and Explorations. (arXiv:2208.02680v1 [cs.RO])
    Sound is one of the most informative and abundant modalities in the real world, and it can be sensed robustly without contact by small, cheap sensors that can be placed on mobile devices. Although deep learning is capable of extracting information from multiple sensory inputs, there has been little use of sound for the control and learning of robotic actions. For unsupervised reinforcement learning, an agent is expected to actively collect experiences and jointly learn representations and policies in a self-supervised way. We build realistic robotic manipulation scenarios with physics-based sound simulation and propose the Intrinsic Sound Curiosity Module (ISCM). The ISCM provides feedback to a reinforcement learner to learn robust representations and to reward a more efficient exploration behavior. We perform experiments with sound enabled during pre-training and disabled during adaptation, and show that representations learned by ISCM outperform those learned by vision-only baselines, and that pre-trained policies can accelerate the learning process when applied to downstream tasks.
    Theoretical Analysis of Primal-Dual Algorithm for Non-Convex Stochastic Decentralized Optimization. (arXiv:2205.11979v2 [math.OC] UPDATED)
    In recent years, decentralized learning has emerged as a powerful tool not only for large-scale machine learning, but also for preserving privacy. One of the key challenges in decentralized learning is that the data distribution held by each node is statistically heterogeneous. To address this challenge, the primal-dual algorithm called the Edge-Consensus Learning (ECL) was proposed and was experimentally shown to be robust to the heterogeneity of data distributions. However, the convergence rate of the ECL is provided only when the objective function is convex, and has not been shown in a standard machine learning setting where the objective function is non-convex. Furthermore, the intuitive reason why the ECL is robust to the heterogeneity of data distributions has not been investigated. In this work, we first investigate the relationship between the ECL and Gossip algorithm and show that the update formulas of the ECL can be regarded as correcting the local stochastic gradient in the Gossip algorithm. Then, we propose the Generalized ECL (G-ECL), which contains the ECL as a special case, and provide the convergence rates of the G-ECL in both (strongly) convex and non-convex settings, which do not depend on the heterogeneity of data distributions. Through synthetic experiments, we demonstrate that the numerical results of both the G-ECL and ECL coincide with the convergence rate of the G-ECL.
    Implicit Semantic Augmentation for Distance Metric Learning in Domain Generalization. (arXiv:2208.02803v1 [cs.LG])
    Domain generalization (DG) aims to learn a model on one or more different but related source domains that can be generalized to an unseen target domain. Existing DG methods try to promote the diversity of source domains to improve the model's generalization ability, but they may have to introduce auxiliary networks or incur significant computational costs. In contrast, this work applies implicit semantic augmentation in feature space to capture the diversity of source domains. Concretely, an additional distance metric learning (DML) loss function is included to optimize the local geometry of the data distribution. In addition, the logits from the cross entropy loss with infinite augmentations are adopted as input features for the DML loss in lieu of the deep features. We also provide a theoretical analysis to show that the logits can approximate the distances defined on original features well. Further, we provide an in-depth analysis of the mechanism and rationale behind our approach, which gives us a better understanding of why leveraging logits in lieu of features can help domain generalization. The proposed DML loss with the implicit augmentation is incorporated into a recent DG method, namely the Fourier Augmented Co-Teacher framework (FACT). Our method can also be easily plugged into various DG methods. Extensive experiments on three benchmarks (Digits-DG, PACS and Office-Home) demonstrate that the proposed method achieves state-of-the-art performance.
    Adaptive Latent Factor Analysis via Generalized Momentum-Incorporated Particle Swarm Optimization. (arXiv:2208.02423v1 [cs.NE])
    The stochastic gradient descent (SGD) algorithm is an effective learning strategy for building a latent factor analysis (LFA) model on a high-dimensional and incomplete (HDI) matrix. A particle swarm optimization (PSO) algorithm is commonly adopted to make an SGD-based LFA model's hyper-parameters, i.e., learning rate and regularization coefficient, self-adaptive. However, a standard PSO algorithm may suffer from accuracy loss caused by premature convergence. To address this issue, this paper incorporates more historical information into each particle's evolutionary process to avoid premature convergence, following the principle of a generalized-momentum (GM) method, thereby yielding a novel GM-incorporated PSO (GM-PSO). With it, a GM-PSO-based LFA (GMPL) model is built to implement efficient self-adaptation of hyper-parameters. The experimental results on three HDI matrices demonstrate that the GMPL model achieves a higher prediction accuracy for missing data estimation in industrial applications.
    GROWN+UP: A Graph Representation Of a Webpage Network Utilizing Pre-training. (arXiv:2208.02252v1 [cs.LG])
    Large pre-trained neural networks are ubiquitous and critical to the success of many downstream tasks in natural language processing and computer vision. However, within the field of web information retrieval, there is a stark lack of similarly flexible and powerful pre-trained models that can properly parse webpages. Consequently, we believe that common machine learning tasks like content extraction and information mining from webpages have low-hanging gains that remain untapped. We aim to close the gap by introducing an agnostic deep graph neural network feature extractor that can ingest webpage structures, pre-train self-supervised on massive unlabeled data, and fine-tune to arbitrary tasks on webpages effectively. Finally, we show that our pre-trained model achieves state-of-the-art results using multiple datasets on two very different benchmarks: webpage boilerplate removal and genre classification, thus lending support to its potential application in diverse downstream tasks.
    Simulation and application of COVID-19 compartment model using physic-informed neural network. (arXiv:2208.02433v1 [q-bio.QM])
    In this work, the SVEIDR model and its variants (Aged and Vaccination-structured models) are introduced to encode the effect of social contact for different age groups and vaccination statuses. We then implement the Physics-Informed Neural Network on both simulated and real-world data. Results, including the spread and forecasting analysis of COVID-19 learned from the neural network, are shown in the paper.
    Using Mixed-Effects Models to Learn Bayesian Networks from Related Data Sets. (arXiv:2206.03743v2 [stat.ML] UPDATED)
    We commonly assume that data are a homogeneous set of observations when learning the structure of Bayesian networks. However, they often comprise different data sets that are related but not homogeneous because they have been collected in different ways or from different populations. In our previous work (Azzimonti, Corani and Scutari, 2021), we proposed a closed-form Bayesian Hierarchical Dirichlet score for discrete data that pools information across related data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. In this paper, we provide an analogous solution for learning a Bayesian network from continuous data using mixed-effects models to pool information across the related data sets. We study its structural, parametric, predictive and classification accuracy and we show that it outperforms both conditional Gaussian Bayesian networks (that do not perform any pooling) and classical Gaussian Bayesian networks (that disregard the heterogeneous nature of the data). The improvement is marked for low sample sizes and for unbalanced data sets.
    Leveraging the HW/SW Optimizations and Ecosystems that Drive the AI Revolution. (arXiv:2208.02808v1 [cs.LG])
    This paper presents a state-of-the-art overview of how to architect, design, and optimize Deep Neural Networks (DNNs) such that performance is improved and accuracy is preserved. The paper covers a set of optimizations that span the entire Machine Learning processing pipeline. We introduce two types of optimizations. The first alters the DNN model and requires NN re-training, while the second does not. We focus on GPU optimizations, but we believe the presented techniques can be used with other AI inference platforms. To demonstrate the DNN model optimizations, we improve one of the most advanced deep network architectures for optical flow, RAFT (arXiv:2003.12039), on a popular edge AI inference platform (Nvidia Jetson AGX Xavier).
    Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing. (arXiv:2208.02397v1 [cs.CV])
    This paper presents a deep learning approach for image retrieval and pattern spotting in digital collections of historical documents. First, a region proposal algorithm detects object candidates in the document page images. Next, deep learning models are used for feature extraction, considering two distinct variants, which provide either real-valued or binary code representations. Finally, candidate images are ranked by computing the feature similarity with a given input query. A robust experimental protocol evaluates the proposed approach considering each representation scheme (real-valued and binary code) on the DocExplore image database. The experimental results show that the proposed deep models compare favorably to the state-of-the-art image retrieval approaches for images of historical documents, outperforming other deep models by 2.56 percentage points using the same techniques for pattern spotting. Besides, the proposed approach also reduces the search time by up to 200x and the storage cost up to 6,000x when compared to related works based on real-valued representations.
    LaneSNNs: Spiking Neural Networks for Lane Detection on the Loihi Neuromorphic Processor. (arXiv:2208.02253v1 [cs.NE])
    Autonomous Driving (AD) related features represent important elements for the next generation of mobile robots and autonomous vehicles focused on increasingly intelligent, autonomous, and interconnected systems. The applications involving the use of these features must provide, by definition, real-time decisions, and this property is key to avoid catastrophic accidents. Moreover, all the decision processes must require low power consumption, to increase the lifetime and autonomy of battery-driven systems. These challenges can be addressed through efficient implementations of Spiking Neural Networks (SNNs) on Neuromorphic Chips and the use of event-based cameras instead of traditional frame-based cameras. In this paper, we present a new SNN-based approach, called LaneSNN, for detecting the lanes marked on the streets using the event-based camera input. We develop four novel SNN models characterized by low complexity and fast response, and train them using an offline supervised learning rule. Afterward, we implement and map the learned SNN models onto the Intel Loihi Neuromorphic Research Chip. For the loss function, we develop a novel method based on the linear composition of Weighted binary Cross Entropy (WCE) and Mean Squared Error (MSE) measures. Our experimental results show a maximum Intersection over Union (IoU) measure of about 0.62 and very low power consumption of about 1 W. The best IoU is achieved with an SNN implementation that occupies only 36 neurocores on the Loihi processor while providing a low latency of less than 8 ms to recognize an image, thereby enabling real-time performance. The IoU measures provided by our networks are comparable with the state-of-the-art, but at a much lower power consumption of 1 W.
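The abstract describes the loss only as a linear composition of WCE and MSE. A minimal sketch of what such a composition could look like follows; the mixing coefficient `alpha`, the positive-class weight `pos_weight`, and all function names are illustrative assumptions, not values or identifiers from the paper.

```python
import math


def weighted_bce(pred, target, pos_weight=2.0, eps=1e-7):
    """Weighted binary cross entropy; pos_weight (assumed) scales the positive-class term."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(pos_weight * t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(pred)


def mse(pred, target):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(pred, target, alpha=0.5, pos_weight=2.0):
    """Linear composition alpha * WCE + (1 - alpha) * MSE; alpha is a hypothetical mixing weight."""
    return alpha * weighted_bce(pred, target, pos_weight) + (1 - alpha) * mse(pred, target)
```

With `alpha=0.0` the function reduces to plain MSE, and with `alpha=1.0` to the weighted cross entropy alone.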
    Backpropagation at the Infinitesimal Inference Limit of Energy-Based Models: Unifying Predictive Coding, Equilibrium Propagation, and Contrastive Hebbian Learning. (arXiv:2206.02629v3 [cs.LG] UPDATED)
    How the brain performs credit assignment is a fundamental unsolved problem in neuroscience. Many 'biologically plausible' algorithms have been proposed, which compute gradients that approximate those computed by backpropagation (BP), and which operate in ways that more closely satisfy the constraints imposed by neural circuitry. Many such algorithms utilize the framework of energy-based models (EBMs), in which all free variables in the model are optimized to minimize a global energy function. However, in the literature, these algorithms exist in isolation and no unified theory exists linking them together. Here, we provide a comprehensive theory of the conditions under which EBMs can approximate BP, which lets us unify many of the BP approximation results in the literature (namely, predictive coding, equilibrium propagation, and contrastive Hebbian learning) and demonstrate that their approximation to BP arises from a simple and general mathematical property of EBMs at free-phase equilibrium. This property can then be exploited in different ways with different energy functions, and these specific choices yield a family of BP-approximating algorithms, which both includes the known results in the literature and can be used to derive new ones.
    Membership Inference Attacks and Defenses in Neural Network Pruning. (arXiv:2202.03335v2 [cs.CR] UPDATED)
    Neural network pruning has been an essential technique to reduce the computation and memory requirements for using deep neural networks for resource-constrained devices. Most existing research focuses primarily on balancing the sparsity and accuracy of a pruned neural network by strategically removing insignificant parameters and retraining the pruned model. Such efforts on reusing training samples pose serious privacy risks due to increased memorization, which, however, has not been investigated yet. In this paper, we conduct the first analysis of privacy risks in neural network pruning. Specifically, we investigate the impacts of neural network pruning on training data privacy, i.e., membership inference attacks. We first explore the impact of neural network pruning on prediction divergence, where the pruning process disproportionately affects the pruned model's behavior for members and non-members. Meanwhile, the influence of divergence even varies among different classes in a fine-grained manner. Enlightened by such divergence, we propose a self-attention membership inference attack against the pruned neural networks. Extensive experiments are conducted to rigorously evaluate the privacy impacts of different pruning approaches, sparsity levels, and adversary knowledge. The proposed attack shows higher attack performance on the pruned models when compared with eight existing membership inference attacks. In addition, we propose a new defense mechanism to protect the pruning process by mitigating the prediction divergence based on KL-divergence distance, which is experimentally shown to mitigate the privacy risks while maintaining the sparsity and accuracy of the pruned models.
    Unsupervised Domain Adaptation with Contrastive Learning for OCT Segmentation. (arXiv:2203.03664v2 [cs.CV] UPDATED)
    Accurate segmentation of retinal fluids in 3D Optical Coherence Tomography images is key for diagnosis and personalized treatment of eye diseases. While deep learning has been successful at this task, trained supervised models often fail for images that do not resemble labeled examples, e.g. for images acquired using different devices. We hereby propose a novel semi-supervised learning framework for segmentation of volumetric images from new unlabeled domains. We jointly use supervised and contrastive learning, also introducing a contrastive pairing scheme that leverages similarity between nearby slices in 3D. In addition, we propose channel-wise aggregation as an alternative to conventional spatial-pooling aggregation for contrastive feature map projection. We evaluate our methods for domain adaptation from a (labeled) source domain to an (unlabeled) target domain, each containing images acquired with different acquisition devices. In the target domain, our method achieves a Dice coefficient 13.8% higher than SimCLR (a state-of-the-art contrastive framework), and leads to results comparable to an upper bound with supervised training in that domain. In the source domain, our model also improves the results by 5.4% Dice, by successfully leveraging information from many unlabeled images.
    Local versions of sum-of-norms clustering. (arXiv:2109.09589v3 [cs.LG] UPDATED)
    Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.
    Noise-aware Physics-informed Machine Learning for Robust PDE Discovery. (arXiv:2206.12901v5 [math.NA] UPDATED)
    This work is concerned with discovering the governing partial differential equation (PDE) of a physical system. Existing methods have demonstrated the PDE identification from finite observations but failed to maintain satisfying results against noisy data, partly owing to suboptimal estimated derivatives and found PDE coefficients. We address the issues by introducing a noise-aware physics-informed machine learning (nPIML) framework to discover the governing PDE from data following arbitrary distributions. We propose training a couple of neural networks, namely solver and preselector, in a multi-task learning paradigm, which yields important scores of basis candidates that constitute the hidden physical constraint. After they are jointly trained, the solver network estimates potential candidates, e.g., partial derivatives, for the sparse regression algorithm to initially unveil the most likely parsimonious PDE, decided according to the information criterion. We also propose the denoising physics-informed neural networks (dPINNs), based on Discrete Fourier Transform (DFT), to deliver a set of the optimal finetuned PDE coefficients respecting the noise-reduced variables. The denoising PINNs are structured into forefront projection networks and a PINN, by which the formerly learned solver initializes. Our extensive experiments on five canonical PDEs affirm that the proposed framework presents a robust and interpretable approach for PDE discovery, applicable to a wide range of systems, possibly complicated by noise.
    Reliability analysis of discrete-state performance functions via adaptive sequential sampling with detection of failure surfaces. (arXiv:2208.02475v1 [cs.CE])
    The paper presents a new efficient and robust method for rare event probability estimation for computational models of an engineering product or a process returning categorical information only, for example, either success or failure. For such models, most of the methods designed for the estimation of failure probability, which use the numerical value of the outcome to compute gradients or to estimate the proximity to the failure surface, cannot be applied. Even if the performance function provides more than just binary output, the state of the system may be a non-smooth or even a discontinuous function defined in the domain of continuous input variables. In these cases, the classical gradient-based methods usually fail. We propose a simple yet efficient algorithm, which performs a sequential adaptive selection of points from the input domain of random variables to extend and refine a simple distance-based surrogate model. Two different tasks can be accomplished at any stage of sequential sampling: (i) estimation of the failure probability, and (ii) selection of the best possible candidate for the subsequent model evaluation if further improvement is necessary. The proposed criterion for selecting the next point for model evaluation maximizes the expected probability classified by using the candidate. Therefore, the perfect balance between global exploration and local exploitation is maintained automatically. The method can estimate the probabilities of multiple failure types. Moreover, when the numerical value of model evaluation can be used to build a smooth surrogate, the algorithm can accommodate this information to increase the accuracy of the estimated probabilities. Lastly, we define a new simple yet general geometrical measure of the global sensitivity of the rare-event probability to individual variables, which is obtained as a by-product of the proposed algorithm.
    Improving Personalised Physical Activity Recommendation on the mHealth Information Service Using Deep Reinforcement Learning. (arXiv:2204.00961v2 [cs.LG] UPDATED)
    Recent years have seen growth in the use of mobile health (mHealth) information services, which have rich guides on improving physical activity. These rich guides evolved from the consideration of various personal behavioural factors, which often deviate from the user's health conditions. The behavioural factors include changing fitness preferences, adherence issues, and uncertainty about future fitness outcomes, which may all lead to a decline in the quality of the mHealth information services. Many of these mHealth information services provide limited fitness guidance owing to the dynamics of the user's health conditions. This paper seeks an adaptive method using deep reinforcement learning to make personalised physical activity recommendations, which is learnt from retrospective physical activity data and can simulate realistic behaviour trajectories. We construct a real-time interaction model for the mHealth information service system based on scientific knowledge about physical activity to evaluate its exercise performance. The physical activity performance evaluation model is used to find the optimal exercise intensity considering the fitness and fatigue effects to avoid the lack of exercise or overload. The short-term activity plans are made using deep reinforcement learning and personal health conditions that change over time. Using this method, we can dynamically update the physical activity recommendation policy in accordance with the real implementation behaviour. Our DRL-based recommender policy was validated by comparison to other benchmark policies. Experimental results show that this adaptive learning algorithm can improve recommendation performance by over 4.13 percent.
    Edge-centric Optimization of Multi-modal ML-driven eHealth Applications. (arXiv:2208.02597v1 [cs.LG])
    Smart eHealth applications deliver personalized and preventive digital healthcare services to clients through remote sensing, continuous monitoring, and data analytics. Smart eHealth applications sense input data from multiple modalities, transmit the data to edge and/or cloud nodes, and process the data with compute intensive machine learning (ML) algorithms. Run-time variations with continuous stream of noisy input data, unreliable network connection, computational requirements of ML algorithms, and choice of compute placement among sensor-edge-cloud layers affect the efficiency of ML-driven eHealth applications. In this chapter, we present edge-centric techniques for optimized compute placement, exploration of accuracy-performance trade-offs, and cross-layered sense-compute co-optimization for ML-driven eHealth applications. We demonstrate the practical use cases of smart eHealth applications in everyday settings, through a sensor-edge-cloud framework for an objective pain assessment case study.
    On Gap-dependent Bounds for Offline Reinforcement Learning. (arXiv:2206.00177v2 [cs.LG] UPDATED)
    This paper presents a systematic study on gap-dependent sample complexity in offline reinforcement learning. Prior work showed when the density ratio between an optimal policy and the behavior policy is upper bounded (the optimal policy coverage assumption), then the agent can achieve an $O\left(\frac{1}{\epsilon^2}\right)$ rate, which is also minimax optimal. We show under the optimal policy coverage assumption, the rate can be improved to $O\left(\frac{1}{\epsilon}\right)$ when there is a positive sub-optimality gap in the optimal $Q$-function. Furthermore, we show when the visitation probabilities of the behavior policy are uniformly lower bounded for states where an optimal policy's visitation probabilities are positive (the uniform optimal policy coverage assumption), the sample complexity of identifying an optimal policy is independent of $\frac{1}{\epsilon}$. Lastly, we present nearly-matching lower bounds to complement our gap-dependent upper bounds.
    Risk-Aware Linear Bandits: Theory and Applications in Smart Order Routing. (arXiv:2208.02389v1 [cs.LG])
    Motivated by practical considerations in machine learning for financial decision-making, such as risk-aversion and large action space, we initiate the study of risk-aware linear bandits. Specifically, we consider regret minimization under the mean-variance measure when facing a set of actions whose rewards can be expressed as linear functions of (initially) unknown parameters. Driven by the variance-minimizing G-optimal design, we propose the Risk-Aware Explore-then-Commit (RISE) algorithm and the Risk-Aware Successive Elimination (RISE++) algorithm. Then, we rigorously analyze their regret upper bounds to show that, by leveraging the linear structure, the algorithms can dramatically reduce the regret when compared to existing methods. Finally, we demonstrate the performance of the algorithms by conducting extensive numerical experiments in a synthetic smart order routing setup. Our results show that both RISE and RISE++ can outperform the competing methods, especially in complex decision-making scenarios.
    Implicit Neural Representations for Image Compression. (arXiv:2112.04267v2 [eess.IV] UPDATED)
    Recently Implicit Neural Representations (INRs) gained attention as a novel and effective representation for various data types. Thus far, prior work mostly focused on optimizing their reconstruction performance. This work investigates INRs from a novel perspective, i.e., as a tool for image compression. To this end, we propose the first comprehensive compression pipeline based on INRs including quantization, quantization-aware retraining and entropy coding. Encoding with INRs, i.e. overfitting to a data sample, is typically orders of magnitude slower. To mitigate this drawback, we leverage meta-learned initializations based on MAML to reach the encoding in fewer gradient updates which also generally improves rate-distortion performance of INRs. We find that our approach to source compression with INRs vastly outperforms similar prior work, is competitive with common compression algorithms designed specifically for images and closes the gap to state-of-the-art learned approaches based on Rate-Distortion Autoencoders. Moreover, we provide an extensive ablation study on the importance of individual components of our method which we hope facilitates future research on this novel approach to image compression.
    Word-Level Fine-Grained Story Visualization. (arXiv:2208.02341v1 [cs.CV])
    Story visualization aims to generate a sequence of images to narrate each sentence in a multi-sentence story with a global consistency across dynamic scenes and characters. Current works still struggle with output images' quality and consistency, and rely on additional semantic information or auxiliary captioning networks. To address these challenges, we first introduce a new sentence representation, which incorporates word information from all story sentences to mitigate the inconsistency problem. Then, we propose a new discriminator with fusion features and further extend the spatial attention to improve image quality and story consistency. Extensive experiments on different datasets and human evaluation demonstrate the superior performance of our approach, compared to state-of-the-art methods, neither using segmentation masks nor auxiliary captioning networks.
    Agnostic Learning of General ReLU Activation Using Gradient Descent. (arXiv:2208.02711v1 [cs.LG])
    We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. Unlike prior work that studies the setting of zero bias, we consider the more challenging scenario when the bias of the ReLU function is non-zero. Our main result establishes that starting from random initialization, in a polynomial number of iterations gradient descent outputs, with high probability, a ReLU function that achieves a competitive error guarantee when compared to the error of the best ReLU function. We also provide finite sample guarantees, and these techniques generalize to a broader class of marginal distributions beyond Gaussians.
    Customs Import Declaration Datasets. (arXiv:2208.02484v1 [cs.LG])
    Given the huge volume of cross-border flows, effective and efficient control of trades becomes more crucial in protecting people and society from illicit trades while facilitating legitimate trades. However, limited accessibility of the transaction-level trade datasets hinders the progress of open research, and lots of customs administrations have not benefited from the recent progress in data-based risk management. In this paper, we introduce an import declarations dataset to facilitate the collaboration between the domain experts in customs administrations and data science researchers. The dataset contains 54,000 artificially generated trades with 22 key attributes, and it is synthesized with CTGAN while maintaining correlated features. Synthetic data has several advantages. First, releasing the dataset is free from restrictions that do not allow disclosing the original import data. Second, the fabrication step minimizes the possible identity risk which may exist in trade statistics. Lastly, the published data follow a similar distribution to the source data so that it can be used in various downstream tasks. With the provision of data and its generation process, we open baseline codes for fraud detection tasks, as we empirically show that more advanced algorithms can better detect frauds.
    Constructing Balance from Imbalance for Long-tailed Image Recognition. (arXiv:2208.02567v1 [cs.CV])
    Long-tailed image recognition presents massive challenges to deep learning systems since the imbalance between majority (head) classes and minority (tail) classes severely skews the data-driven deep neural networks. Previous methods tackle data imbalance from the viewpoints of data distribution, feature space, model design, etc. In this work, instead of directly learning a recognition model, we suggest confronting the bottleneck of head-to-tail bias before classifier learning, from the previously omitted perspective of balancing label space. To alleviate the head-to-tail bias, we propose a concise paradigm by progressively adjusting label space and dividing the head classes and tail classes, dynamically constructing balance from imbalance to facilitate the classification. With flexible data filtering and label space mapping, we can easily embed our approach to most classification models, especially the decoupled training methods. Besides, we find the separability of head-tail classes varies among different features with different inductive biases. Hence, our proposed model also provides a feature evaluation method and paves the way for long-tailed feature learning. Extensive experiments show that our method can boost the performance of state-of-the-art methods of different types on widely-used benchmarks. Code is available at https://github.com/silicx/DLSA.
    A New Kind of Adversarial Example. (arXiv:2208.02430v1 [cs.CV])
    Almost all adversarial attacks are formulated to add an imperceptible perturbation to an image in order to fool a model. Here, we consider the opposite which is adversarial examples that can fool a human but not a model. A large enough and perceptible perturbation is added to an image such that a model maintains its original decision, whereas a human will most likely make a mistake if forced to decide (or opt not to decide at all). Existing targeted attacks can be reformulated to synthesize such adversarial examples. Our proposed attack, dubbed NKE, is similar in essence to the fooling images, but is more efficient since it uses gradient descent instead of evolutionary algorithms. It also offers a new and unified perspective into the problem of adversarial vulnerability. Experimental results over MNIST and CIFAR-10 datasets show that our attack is quite efficient in fooling deep neural networks. Code is available at https://github.com/aliborji/NKE.
    Neural network accelerator for quantum control. (arXiv:2208.02645v1 [quant-ph])
    Efficient quantum control is necessary for practical quantum computing implementations with current technologies. Conventional algorithms for determining optimal control parameters are computationally expensive, largely excluding them from use outside of the simulation. Existing hardware solutions structured as lookup tables are imprecise and costly. By designing a machine learning model to approximate the results of traditional tools, a more efficient method can be produced. Such a model can then be synthesized into a hardware accelerator for use in quantum systems. In this study, we demonstrate a machine learning algorithm for predicting optimal pulse parameters. This algorithm is lightweight enough to fit on a low-resource FPGA and perform inference with a latency of 175 ns and pipeline interval of 5 ns with $>0.99$ gate fidelity. In the long term, such an accelerator could be used near quantum computing hardware where traditional computers cannot operate, enabling quantum control at a reasonable cost at low latencies without incurring large data bandwidths outside of the cryogenic environment.
    Visual Analysis and Detection of Contrails in Aircraft Engine Simulations. (arXiv:2208.02321v1 [cs.HC])
    Contrails are condensation trails generated from emitted particles by aircraft engines, which perturb Earth's radiation budget. Simulation modeling is used to interpret the formation and development of contrails. These simulations are computationally intensive and rely on high-performance computing solutions, and the contrail structures are not well defined. We propose a visual computing system to assist in defining contrails and their characteristics, as well as in the analysis of parameters for computer-generated aircraft engine simulations. The back-end of our system leverages a contrail-formation criterion and clustering methods to detect contrails' shape and evolution and identify similar simulation runs. The front-end system helps analyze contrails and their parameters across multiple simulation runs. The evaluation with domain experts shows this approach successfully aids in contrail data investigation.
    Disentangled Representation Learning for RF Fingerprint Extraction under Unknown Channel Statistics. (arXiv:2208.02724v1 [eess.SP])
    Deep learning (DL) applied to a device's radio-frequency fingerprint (RFF) has attracted significant attention in physical-layer authentications due to its extraordinary classification performance. Conventional DL-RFF techniques, trained by adopting maximum likelihood estimation (MLE), tend to overfit the channel statistics embedded in the training dataset. This restricts their practical applications as it is challenging to collect sufficient training data capturing the characteristics of all possible wireless channel environments. To address this challenge, we propose a DL framework of disentangled representation learning (DRL) that first learns to factor the input signals into a device-relevant component and a device-irrelevant component via adversarial learning. Then, it synthesizes a set of augmented signals by shuffling these two parts within a given training dataset for training of the subsequent RFF extractor. The implicit data augmentation in the proposed framework imposes a regularization on the RFF extractor to avoid the possible overfitting of device-irrelevant channel statistics, without collecting additional data from unknown channels. Experiments validate that the proposed approach, referred to as DR-RFF, outperforms conventional methods in terms of generalizability to unknown complicated propagation environments, e.g., dispersive multipath fading channels, even though all the training data are collected in a simple environment with dominant direct line-of-sight (LoS) propagation paths.
    Serving and Optimizing Machine Learning Workflows on Heterogeneous Infrastructures. (arXiv:2205.04713v2 [cs.LG] UPDATED)
    With the advent of ubiquitous deployment of smart devices and the Internet of Things, data sources for machine learning inference have increasingly moved to the edge of the network. Existing machine learning inference platforms typically assume a homogeneous infrastructure and do not take into account the more complex and tiered computing infrastructure that includes edge devices, local hubs, edge datacenters, and cloud datacenters. On the other hand, recent AutoML efforts have provided viable solutions for model compression, pruning and quantization for heterogeneous environments; for a machine learning model, now we may easily find or even generate a series of models with different tradeoffs between accuracy and efficiency. We design and implement JellyBean, a system for serving and optimizing machine learning inference workflows on heterogeneous infrastructures. Given service-level objectives (e.g., throughput, accuracy), JellyBean picks the most cost-efficient models that meet the accuracy target and decides how to deploy them across different tiers of infrastructures. Evaluations show that JellyBean reduces the total serving cost of visual question answering by up to 58%, and vehicle tracking from the NVIDIA AI City Challenge by up to 36% compared with state-of-the-art model selection and worker assignment solutions. JellyBean also outperforms prior ML serving systems (e.g., Spark on the cloud) up to 5x in serving costs.
    On-Demand Resource Management for 6G Wireless Networks Using Knowledge-Assisted Dynamic Neural Networks. (arXiv:2208.01785v1 [eess.SY] CROSS LISTED)
    On-demand service provisioning is a critical yet challenging issue in 6G wireless communication networks, since emerging services have significantly diverse requirements and the network resources become increasingly heterogeneous and dynamic. In this paper, we study the on-demand wireless resource orchestration problem with the focus on the computing delay in orchestration decision-making process. Specifically, we take the decision-making delay into the optimization problem. Then, a dynamic neural network (DyNN)-based method is proposed, where the model complexity can be adjusted according to the service requirements. We further build a knowledge base representing the relationship among the service requirements, available computing resources, and the resource allocation performance. By exploiting the knowledge, the width of DyNN can be selected in a timely manner, further improving the performance of orchestration. Simulation results show that the proposed scheme significantly outperforms the traditional static neural network, and also shows sufficient flexibility in on-demand service provisioning.
    Privacy-Preserving Chaotic Extreme Learning Machine with Fully Homomorphic Encryption. (arXiv:2208.02587v1 [cs.LG])
    Machine learning and deep learning models require a lot of data for the training process, and in some scenarios this includes sensitive data, such as customer information, which organizations might be hesitant to outsource for model building. Privacy-preserving techniques such as Differential Privacy, Homomorphic Encryption, and Secure Multi-Party Computation can be integrated with different machine learning and deep learning algorithms to provide security to the data as well as the model. In this paper, we propose a Chaotic Extreme Learning Machine and its encrypted form using Fully Homomorphic Encryption, where the weights and biases are generated using a logistic map instead of a uniform distribution. Our proposed method performs better than or similarly to the traditional Extreme Learning Machine on most of the datasets.
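To illustrate the idea of drawing hidden-layer weights from a logistic map rather than a uniform distribution, here is a minimal sketch. The map parameter `r = 4.0`, the seed `x0 = 0.3`, the rescaling to [-1, 1], and the function names are assumptions made for illustration; the abstract does not state the paper's actual settings.

```python
def logistic_map_sequence(n, x0=0.3, r=4.0):
    """Return n iterates of the logistic map x <- r * x * (1 - x), starting after x0."""
    values = []
    x = x0
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_weights(n_inputs, n_hidden, x0=0.3, r=4.0):
    """Build an n_inputs x n_hidden weight matrix from logistic-map iterates.

    Iterates lie in [0, 1] for r = 4; they are rescaled to [-1, 1] here
    (the rescaling is an assumption of this sketch).
    """
    seq = logistic_map_sequence(n_inputs * n_hidden, x0, r)
    scaled = [2.0 * v - 1.0 for v in seq]
    return [scaled[i * n_hidden:(i + 1) * n_hidden] for i in range(n_inputs)]


W = chaotic_weights(3, 4)  # deterministic: the same seed always gives the same weights
```

Because the sequence is fully determined by `x0` and `r`, the weights can be regenerated from two numbers instead of being stored, which is one commonly cited appeal of chaotic initialization.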
    FedDRL: Deep Reinforcement Learning-based Adaptive Aggregation for Non-IID Data in Federated Learning. (arXiv:2208.02442v1 [cs.LG])
    The uneven distribution of local data across different edge devices (clients) results in slow model training and accuracy reduction in federated learning. The naive federated learning (FL) strategy and most alternative solutions attempt to achieve more fairness through weighted aggregation of deep learning models across clients. This work introduces a novel non-IID type encountered in real-world datasets, namely cluster-skew, in which groups of clients have local data with similar distributions, causing the global model to converge to an over-fitted solution. To deal with non-IID data, particularly the cluster-skewed data, we propose FedDRL, a novel FL model that employs deep reinforcement learning to adaptively determine each client's impact factor (which will be used as the weights in the aggregation process). Extensive experiments on a suite of federated datasets confirm that the proposed FedDRL improves on the FedAvg and FedProx methods, e.g., by up to 4.05% and 2.17% on average for the CIFAR-100 dataset, respectively.
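The aggregation step that uses per-client impact factors as weights can be sketched as follows. This shows only a generic weighted average of flattened client parameters; how FedDRL's reinforcement learning agent produces the impact factors is not shown, and the function name and data layout are hypothetical.

```python
def aggregate(client_weights, impact_factors):
    """Weighted average of client parameter vectors.

    client_weights: list of equal-length lists of floats, one per client.
    impact_factors: one non-negative weight per client (here supplied by
    the caller; in FedDRL these would come from the learned policy).
    """
    total = sum(impact_factors)
    if total <= 0:
        raise ValueError("impact factors must sum to a positive value")
    norm = [f / total for f in impact_factors]  # normalise so the weights sum to 1
    n_params = len(client_weights[0])
    return [
        sum(a * w[i] for a, w in zip(norm, client_weights))
        for i in range(n_params)
    ]
```

Setting every impact factor to the client's local sample count would give FedAvg-style weighting; an adaptive scheme replaces those fixed counts with learned values.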
    Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme. (arXiv:2109.13821v2 [cs.SD] UPDATED)
    Voice conversion is a common speech synthesis task which can be solved in different ways depending on a particular real-world scenario. The most challenging one, often referred to as one-shot many-to-many voice conversion, consists in copying the target voice from only one reference utterance in the most general case, when both source and target speakers do not belong to the training dataset. We present a scalable high-quality solution based on diffusion probabilistic modeling and demonstrate its superior quality compared to state-of-the-art one-shot voice conversion approaches. Moreover, focusing on real-time applications, we investigate general principles which can make diffusion models faster while keeping synthesis quality at a high level. As a result, we develop a novel Stochastic Differential Equations solver suitable for various diffusion model types and generative tasks, as shown through empirical studies, and justify it by theoretical analysis.
    A Class of Dimension-free Metrics for the Convergence of Empirical Measures. (arXiv:2104.12036v3 [math.PR] UPDATED)
    This paper concerns the convergence of empirical measures in high dimensions. We propose a new class of metrics and show that under such metrics, the convergence is free of the curse of dimensionality (CoD). Such a feature is critical for high-dimensional analysis and stands in contrast to classical metrics (e.g., the Wasserstein distance). The proposed metrics originate from the maximum mean discrepancy, which we generalize by proposing specific criteria for selecting test function spaces to guarantee the property of being free of CoD. Therefore, we call this class of metrics the generalized maximum mean discrepancy (GMMD). Examples of the selected test function spaces include the reproducing kernel Hilbert space, Barron space, and flow-induced function spaces. Three applications of the proposed metrics are presented: 1. The convergence of empirical measure in the case of random variables; 2. The convergence of $n$-particle system to the solution to McKean-Vlasov stochastic differential equation; 3. The construction of an $\varepsilon$-Nash equilibrium for a homogeneous $n$-player game by its mean-field limit. As a byproduct, we prove that, given a distribution close to the target distribution measured by GMMD and a certain representation of the target distribution, we can generate a distribution close to the target one in terms of the Wasserstein distance and relative entropy. Overall, we show that the proposed class of metrics is a powerful tool to analyze the convergence of empirical measures in high dimensions without CoD.
    Fully Automated 2D and 3D Convolutional Neural Networks Pipeline for Video Segmentation and Myocardial Infarction Detection in Echocardiography. (arXiv:2103.14734v2 [eess.IV] UPDATED)
    Cardiac imaging known as echocardiography is a non-invasive tool utilized to produce data including images and videos, which cardiologists use to diagnose cardiac abnormalities in general and myocardial infarction (MI) in particular. Echocardiography machines can deliver abundant amounts of data that need to be quickly analyzed by cardiologists to help them make a diagnosis and treat cardiac conditions. However, the acquired data quality varies depending on the acquisition conditions and the patient's responsiveness to the setup instructions. These constraints are challenging to doctors, especially when patients are facing MI and their lives are at stake. In this paper, we propose an innovative real-time end-to-end fully automated model based on convolutional neural networks (CNN) to detect MI depending on regional wall motion abnormalities (RWMA) of the left ventricle (LV) from videos produced by echocardiography. Our model is implemented as a pipeline consisting of a 2D CNN that performs data preprocessing by segmenting the LV chamber from the apical four-chamber (A4C) view, followed by a 3D CNN that performs a binary classification to detect if the segmented echocardiography shows signs of MI. We trained both CNNs on a dataset composed of 165 echocardiography videos, each acquired from a distinct patient. The 2D CNN achieved an accuracy of 97.18% on data segmentation, while the 3D CNN achieved 90.9% accuracy, 100% precision, and 95% recall on MI detection. Our results demonstrate that creating a fully automated system for MI detection is feasible and propitious.
    On the Learnability of Physical Concepts: Can a Neural Network Understand What's Real?. (arXiv:2207.12186v2 [cs.LG] UPDATED)
    We revisit the classic signal-to-symbol barrier in light of the remarkable ability of deep neural networks to generate realistic synthetic data. DeepFakes and spoofing highlight the feebleness of the link between physical reality and its abstract representation, whether learned by a digital computer or a biological agent. Starting from a widely applicable definition of abstract concept, we show that standard feed-forward architectures cannot capture but trivial concepts, regardless of the number of weights and the amount of training data, despite being extremely effective classifiers. On the other hand, architectures that incorporate recursion can represent a significantly larger class of concepts, but may still be unable to learn them from a finite dataset. We qualitatively describe the class of concepts that can be "understood" by modern architectures trained with variants of stochastic gradient descent, using a (free energy) Lagrangian to measure information complexity. Even if a concept has been understood, however, a network has no means of communicating its understanding to an external agent, except through continuous interaction and validation. We then characterize physical objects as abstract concepts and use the previous analysis to show that physical objects can be encoded by finite architectures. However, to understand physical concepts, sensors must provide persistently exciting observations, for which the ability to control the data acquisition process is essential (active perception). The importance of control depends on the modality, benefiting visual more than acoustic or chemical perception. Finally, we conclude that binding physical entities to digital identities is possible in finite time with finite resources, solving in principle the signal-to-symbol barrier problem, but we highlight the need for continuous validation.
    A Robust graph attention network with dynamic adjusted Graph. (arXiv:2009.13038v3 [cs.LG] UPDATED)
    Graph Attention Networks (GATs) are useful deep learning models for dealing with graph data. However, recent works show that the classical GAT is vulnerable to adversarial attacks: its performance degrades dramatically under slight perturbations. Therefore, how to enhance the robustness of GAT is a critical problem. Robust GAT (RoGAT) is proposed in this paper to improve the robustness of GAT based on a revision of the attention mechanism. Different from the original GAT, which uses the attention mechanism for different edges but is still sensitive to perturbation, RoGAT progressively adds an extra dynamic attention score and improves the robustness. Firstly, RoGAT revises the edge weights based on the smoothness assumption, which is quite common for ordinary graphs. Secondly, RoGAT further revises the features to suppress feature noise. Then, an extra attention score is generated from the dynamic edge weights and can be used to reduce the impact of adversarial attacks. Different experiments against targeted and untargeted attacks on citation data demonstrate that RoGAT outperforms most of the recent defensive methods.
    Differentiable Predictive Control with Safety Guarantees: A Control Barrier Function Approach. (arXiv:2208.02319v1 [eess.SY])
    We develop a novel form of differentiable predictive control (DPC) with safety and robustness guarantees based on control barrier functions. DPC is an unsupervised learning-based method for obtaining approximate solutions to explicit model predictive control (MPC) problems. In DPC, the predictive control policy parametrized by a neural network is optimized offline via direct policy gradients obtained by automatic differentiation of the MPC problem. The proposed approach exploits a new form of sampled-data barrier function to enforce offline and online safety requirements in DPC settings while only interrupting the neural network-based controller near the boundary of the safe set. The effectiveness of the proposed approach is demonstrated in simulation.
    Reinforcement Learning for Joint V2I Network Selection and Autonomous Driving Policies. (arXiv:2208.02249v1 [cs.LG])
    Vehicle-to-Infrastructure (V2I) communication is becoming critical for the enhanced reliability of autonomous vehicles (AVs). However, the uncertainties in the road-traffic and AVs' wireless connections can severely impair timely decision-making. It is thus critical to simultaneously optimize the AVs' network selection and driving policies in order to minimize road collisions while maximizing the communication data rates. In this paper, we develop a reinforcement learning (RL) framework to characterize efficient network selection and autonomous driving policies in a multi-band vehicular network (VNet) operating on conventional sub-6GHz spectrum and Terahertz (THz) frequencies. The proposed framework is designed to (i) maximize the traffic flow and minimize collisions by controlling the vehicle's motion dynamics (i.e., speed and acceleration) from autonomous driving perspective, and (ii) maximize the data rates and minimize handoffs by jointly controlling the vehicle's motion dynamics and network selection from telecommunication perspective. We cast this problem as a Markov Decision Process (MDP) and develop a deep Q-learning based solution to optimize the actions such as acceleration, deceleration, lane-changes, and AV-base station assignments for a given AV's state. The AV's state is defined based on the velocities and communication channel states of AVs. Numerical results demonstrate interesting insights related to the inter-dependency of vehicle's motion dynamics, handoffs, and the communication data rate. The proposed policies enable AVs to adopt safe driving behaviors with improved connectivity.
    Graph Neural Networks Extract High-Resolution Cultivated Land Maps from Sentinel-2 Image Series. (arXiv:2208.02349v1 [cs.CV])
    Maintaining farm sustainability through optimizing agricultural management practices helps build a more planet-friendly environment. The emerging satellite missions can acquire multi- and hyperspectral imagery which captures more detailed spectral information concerning the scanned area, and hence allows us to benefit from subtle spectral features during the analysis process in agricultural applications. We introduce an approach for extracting 2.5 m cultivated land maps from 10 m Sentinel-2 multispectral image series which benefits from a compact graph convolutional neural network. The experiments indicate that our models not only outperform classical and deep machine learning techniques through delivering higher-quality segmentation maps, but also dramatically reduce the memory footprint when compared to U-Nets (almost 8k trainable parameters of our models, with up to 31M parameters of U-Nets). Such memory frugality is pivotal in the missions which allow us to uplink a model to the AI-powered satellite once it is in orbit, as sending large nets is impossible due to the time constraints.
    Risk-sensitive Reinforcement Learning via Distortion Risk Measures. (arXiv:2107.04422v5 [cs.LG] UPDATED)
    We address the problem of control in a risk-sensitive reinforcement learning (RL) context via distortion risk measures (DRM). We propose policy gradient algorithms, which maximize the DRM of the cumulative reward in an episodic Markov decision process in on-policy as well as off-policy RL settings. We employ two different approaches in devising the policy gradient algorithms. In the first approach, we derive a variant of the policy gradient theorem that caters to the DRM objective, and use this theorem in conjunction with a likelihood ratio-based gradient estimation scheme. In the second approach, we estimate the DRM from the empirical distribution of cumulative rewards, and use this estimation scheme along with a smoothed functional-based gradient estimation scheme. For policy gradient algorithms using either approach, we derive non-asymptotic bounds that establish the convergence to an approximate stationary point of the DRM objective.
    Hydra: A System for Large Multi-Model Deep Learning. (arXiv:2110.08633v7 [cs.DC] UPDATED)
    Scaling up model depth and size is now a common approach to raise accuracy in many deep learning (DL) applications, as evidenced by the widespread success of multi-billion or even trillion parameter models in natural language processing (NLP) research. Despite success in DL research and at major technology companies, broader practical adoption of such large models among domain scientists and businesses is still bottlenecked by GPU memory limits, high training costs, and low GPU availability, even on public clouds. Model selection needs further compound these resource challenges: users often need to compare dozens of models with different hyper-parameters or neural architectures to suit their specific task and dataset. In this paper, we present Hydra, a system designed to tackle such challenges by enabling out-of-the-box scaling for multi-large-model DL workloads on even commodity GPUs in a resource-efficient manner. Hydra is the first approach to holistically optimize the execution of multi-model workloads for large DL models. We do this by adapting prior "model-parallel" execution schemes to work with scalable parameter offloading across the memory hierarchy and further hybridizing this approach with task-parallel job scheduling techniques. Hydra decouples scalability of model parameters from parallelism of execution, thus enabling DL users to train even a 6-billion parameter model on a single commodity GPU. It also fully exploits the speedup potential of task parallelism in multi-GPU setups, yielding near-linear strong scaling and making rigorous model selection perhaps more practical for such models. We evaluate end-to-end performance by fine-tuning GPT-2 for language modeling. We find that Hydra offers between 50% and 100% higher training throughput than even the best settings of state-of-the-art industrial frameworks such as DeepSpeed and GPipe for multi-large-model training.
    Topological Signal Processing using the Weighted Ordinal Partition Network. (arXiv:2205.08349v2 [stat.ML] UPDATED)
    One of the most important problems arising in time series analysis is that of bifurcation, or change point detection. That is, given a collection of time series over a varying parameter, when has the structure of the underlying dynamical system changed? For this task, we turn to the field of topological data analysis (TDA), which encodes information about the shape and structure of data. The idea of utilizing tools from TDA for signal processing tasks, known as topological signal processing (TSP), has gained much attention in recent years, largely through a standard pipeline that computes the persistent homology of the point cloud generated by the Takens' embedding. However, this procedure is limited by computation time since the simplicial complex generated in this case is large, but also has a great deal of redundant data. For this reason, we turn to a more recent method for encoding the structure of the attractor, which constructs an ordinal partition network (OPN) representing information about when the dynamical system has passed between certain regions of state space. The result is a weighted graph whose structure encodes information about the underlying attractor. Our previous work began to find ways to package the information of the OPN in a manner that is amenable to TDA; however, that work only used the network structure and did nothing to encode the additional weighting information. In this paper, we take the next step: building a pipeline to analyze the weighted OPN with TDA and showing that this framework provides more resilience to noise or perturbations in the system and improves the accuracy of the dynamic state detection.
    QC-ODKLA: Quantized and Communication-Censored Online Decentralized Kernel Learning via Linearized ADMM. (arXiv:2208.02777v1 [cs.LG])
    This paper focuses on online kernel learning over a decentralized network. Each agent in the network receives continuous streaming data locally and works collaboratively to learn a nonlinear prediction function that is globally optimal in the reproducing kernel Hilbert space with respect to the total instantaneous costs of all agents. In order to circumvent the curse of dimensionality issue in traditional online kernel learning, we utilize random feature (RF) mapping to convert the non-parametric kernel learning problem into a fixed-length parametric one in the RF space. We then propose a novel learning framework named Online Decentralized Kernel learning via Linearized ADMM (ODKLA) to efficiently solve the online decentralized kernel learning problem. To further improve the communication efficiency, we add the quantization and censoring strategies in the communication stage and develop the Quantized and Communication-censored ODKLA (QC-ODKLA) algorithm. We theoretically prove that both ODKLA and QC-ODKLA can achieve the optimal sublinear regret $\mathcal{O}(\sqrt{T})$ over $T$ time slots. Through numerical experiments, we evaluate the learning effectiveness, communication, and computation efficiencies of the proposed methods.
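To make the random feature (RF) step in this abstract concrete, here is a minimal sketch of one common RF construction, random Fourier features for a Gaussian kernel. The abstract does not say which kernel or RF variant the paper uses, so this choice is an assumption, as are the names `make_rff` and `sigma` and the feature count. The point it shows is that a kernel value k(x, y) is approximated by an inner product of fixed-length feature vectors, which turns the non-parametric problem into a parametric one.

```python
import numpy as np

def make_rff(n_features, n_components, sigma=1.0, seed=0):
    """Build a map z(.) with z(x) @ z(y) ~= exp(-||x - y||^2 / (2 * sigma^2))."""
    rng = np.random.default_rng(seed)
    # Frequencies drawn from the Fourier transform of the Gaussian kernel.
    W = rng.normal(0.0, 1.0 / sigma, size=(n_features, n_components))
    # Random phase offsets, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)

    def transform(X):
        # X: (n_samples, n_features) -> (n_samples, n_components)
        return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)

    return transform

# Compare the exact kernel with its random-feature approximation.
x = np.array([[0.2, -0.1, 0.4]])
y = np.array([[0.0, 0.3, 0.1]])
z = make_rff(n_features=3, n_components=5000)
approx = (z(x) @ z(y).T).item()
exact = float(np.exp(-np.sum((x - y) ** 2) / 2.0))
```

In a decentralized setting, agents that share the same seed obtain the same map, so they can exchange fixed-length parameter vectors in the RF space rather than raw data.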
    NoiLIn: Improving Adversarial Training and Correcting Stereotype of Noisy Labels. (arXiv:2105.14676v2 [cs.LG] UPDATED)
    Adversarial training (AT) formulated as the minimax optimization problem can effectively enhance the model's robustness against adversarial attacks. The existing AT methods mainly focused on manipulating the inner maximization for generating quality adversarial variants or manipulating the outer minimization for designing effective learning objectives. However, empirical results of AT always exhibit the robustness at odds with accuracy and the existence of the cross-over mixture problem, which motivates us to study some label randomness for benefiting the AT. First, we thoroughly investigate noisy labels (NLs) injection into AT's inner maximization and outer minimization, respectively and obtain the observations on when NL injection benefits AT. Second, based on the observations, we propose a simple but effective method -- NoiLIn that randomly injects NLs into training data at each training epoch and dynamically increases the NL injection rate once robust overfitting occurs. Empirically, NoiLIn can significantly mitigate the AT's undesirable issue of robust overfitting and even further improve the generalization of the state-of-the-art AT methods. Philosophically, NoiLIn sheds light on a new perspective of learning with NLs: NLs should not always be deemed detrimental, and even in the absence of NLs in the training set, we may consider injecting them deliberately. Codes are available in https://github.com/zjfheart/NoiLIn.  ( 3 min )
    Membership Inference Attacks Against Self-supervised Speech Models. (arXiv:2111.05113v3 [cs.CR] UPDATED)
    Recently, adapting the idea of self-supervised learning (SSL) on continuous speech has started gaining attention. SSL models pre-trained on a huge amount of unlabeled audio can generate general-purpose representations that benefit a wide variety of speech processing tasks. Despite their ubiquitous deployment, however, the potential privacy risks of these models have not been well investigated. In this paper, we present the first privacy analysis on several SSL speech models using Membership Inference Attacks (MIA) under black-box access. The experiment results show that these pre-trained models are vulnerable to MIA and prone to membership information leakage with high Area Under the Curve (AUC) in both utterance-level and speaker-level. Furthermore, we also conduct several ablation studies to understand the factors that contribute to the success of MIA.  ( 2 min )
    Improving Meta-Learning Generalization with Activation-Based Early-Stopping. (arXiv:2208.02377v1 [cs.LG])
    Meta-Learning algorithms for few-shot learning aim to train neural networks capable of generalizing to novel tasks using only a few examples. Early-stopping is critical for performance, halting model training when it reaches optimal generalization to the new task distribution. Early-stopping mechanisms in Meta-Learning typically rely on measuring the model performance on labeled examples from a meta-validation set drawn from the training (source) dataset. This is problematic in few-shot transfer learning settings, where the meta-test set comes from a different target dataset (OOD) and can potentially have a large distributional shift with the meta-validation set. In this work, we propose Activation Based Early-stopping (ABE), an alternative to using validation-based early-stopping for meta-learning. Specifically, we analyze the evolution, during meta-training, of the neural activations at each hidden layer, on a small set of unlabelled support examples from a single task of the target tasks distribution, as this constitutes a minimal and justifiably accessible information from the target problem. Our experiments show that simple, label agnostic statistics on the activations offer an effective way to estimate how the target generalization evolves over time. At each hidden layer, we characterize the activation distributions, from their first and second order moments, then further summarized along the feature dimensions, resulting in a compact yet intuitive characterization in a four-dimensional space. Detecting when, throughout training time, and at which layer, the target activation trajectory diverges from the activation trajectory of the source data, allows us to perform early-stopping and improve generalization in a large array of few-shot transfer learning settings, across different algorithms, source and target datasets.  ( 3 min )
    Degenerate Gaussian factors for probabilistic inference. (arXiv:2104.15010v2 [cs.LG] UPDATED)
    In this paper, we propose a parametrised factor that enables inference on Gaussian networks where linear dependencies exist among the random variables. Our factor representation is effectively a generalisation of traditional Gaussian parametrisations where the positive-definite constraint of the covariance matrix has been relaxed. For this purpose, we derive various statistical operations and results (such as marginalisation, multiplication and affine transformations of random variables) that extend the capabilities of Gaussian factors to these degenerate settings. By using this principled factor definition, degeneracies can be accommodated accurately and automatically at little additional computational cost. As illustration, we apply our methodology to a representative example involving recursive state estimation of cooperative mobile robots.  ( 2 min )
    A Theoretical Framework for Inference and Learning in Predictive Coding Networks. (arXiv:2207.12316v2 [cs.NE] UPDATED)
    Predictive coding (PC) is an influential theory in computational neuroscience, which argues that the cortex forms unsupervised world models by implementing a hierarchical process of prediction error minimization. PC networks (PCNs) are trained in two phases. First, neural activities are updated to optimize the network's response to external stimuli. Second, synaptic weights are updated to consolidate this change in activity, an algorithm called prospective configuration. While previous work has shown how, in various limits, PCNs can be found to approximate backpropagation (BP), recent work has demonstrated that PCNs operating in this standard regime, which does not approximate BP, nevertheless obtain competitive training and generalization performance to BP-trained networks while outperforming them on tasks such as online, few-shot, and continual learning, where brains are known to excel. Despite this promising empirical performance, little is understood theoretically about the properties and dynamics of PCNs in this regime. In this paper, we provide a comprehensive theoretical analysis of the properties of PCNs trained with prospective configuration. We first derive analytical results concerning the inference equilibrium for PCNs and a previously unknown close connection to target propagation (TP). Secondly, we provide a theoretical analysis of learning in PCNs as a variant of generalized expectation-maximization and use that to prove the convergence of PCNs to critical points of the BP loss function, thus showing that deep PCNs can, in theory, achieve the same generalization performance as BP, while maintaining their unique advantages.  ( 3 min )
    Neural-network preconditioners for solving the Dirac equation in lattice gauge theory. (arXiv:2208.02728v1 [hep-lat])
    This work develops neural-network-based preconditioners to accelerate solution of the Wilson-Dirac normal equation in lattice quantum field theories. The approach is implemented for the two-flavor lattice Schwinger model near the critical point. In this system, neural-network preconditioners are found to accelerate the convergence of the conjugate gradient solver compared with the solution of unpreconditioned systems or those preconditioned with conventional approaches based on even-odd or incomplete Cholesky decompositions, as measured by reductions in the number of iterations and/or complex operations required for convergence. It is also shown that a preconditioner trained on ensembles with small lattice volumes can be used to construct preconditioners for ensembles with many times larger lattice volumes, with minimal degradation of performance. This volume-transferring technique amortizes the training cost and presents a pathway towards scaling such preconditioners to lattice field theory calculations with larger lattice volumes and in four dimensions.  ( 2 min )
    A Lightweight, Efficient and Explainable-by-Design Convolutional Neural Network for Internet Traffic Classification. (arXiv:2202.05535v2 [cs.LG] UPDATED)
    Traffic classification, i.e., the identification of the type of applications flowing in a network, is a strategic task for numerous activities (e.g., intrusion detection, routing). This task faces some critical challenges that current deep learning approaches do not address. The design of current approaches does not take into consideration the fact that networking hardware (e.g., routers) often runs with limited computational resources. Further, they do not meet the need for faithful explainability highlighted by regulatory bodies. Finally, these traffic classifiers are evaluated on small datasets which fail to reflect the diversity of applications in real-world settings. Therefore, this paper introduces a Lightweight, Efficient and eXplainable-by-design convolutional neural network (LEXNet) for Internet traffic classification, which relies on a new residual block (for lightweight and efficiency purposes) and prototype layer (for explainability). Based on a commercial-grade dataset, our evaluation shows that LEXNet succeeds in maintaining the same accuracy as the best performing state-of-the-art neural network, while providing the additional features previously mentioned. Moreover, we illustrate the explainability feature of our approach, which stems from the communication of detected application prototypes to the end-user, and we highlight the faithfulness of LEXNet explanations through a comparison with post hoc methods.  ( 3 min )
    A Benchmark and Empirical Analysis for Replay Strategies in Continual Learning. (arXiv:2208.02660v1 [cs.LG])
    With the capacity of continual learning, humans can continuously acquire knowledge throughout their lifespan. However, computational systems are not, in general, capable of learning tasks sequentially. This long-standing challenge for deep neural networks (DNNs) is called catastrophic forgetting. Multiple solutions have been proposed to overcome this limitation. This paper makes an in-depth evaluation of the memory replay methods, exploring the efficiency, performance, and scalability of various sampling strategies when selecting replay data. All experiments are conducted on multiple datasets under various domains. Finally, a practical solution for selecting replay methods for various data distributions is provided.  ( 2 min )
    Neural Network Optimal Feedback Control with Guaranteed Local Stability. (arXiv:2205.00394v2 [math.OC] UPDATED)
    Recent research shows that supervised learning can be an effective tool for designing optimal feedback controllers for high-dimensional nonlinear dynamic systems. But the behavior of neural network controllers is still not well understood. In particular, some neural networks with high test accuracy can fail to even locally stabilize the dynamic system. To address this challenge we propose several novel neural network architectures, which we show guarantee local asymptotic stability while retaining the approximation capacity to learn the optimal feedback policy semi-globally. The proposed architectures are compared against standard neural network feedback controllers through numerical simulations of two high-dimensional nonlinear optimal control problems: stabilization of an unstable Burgers-type partial differential equation, and altitude and course tracking for an unmanned aerial vehicle. The simulations demonstrate that standard neural networks can fail to stabilize the dynamics even when trained well, while the proposed architectures are always at least locally stabilizing. Moreover, the proposed controllers are found to be nearly optimal in testing.  ( 2 min )
    Open-world Contrastive Learning. (arXiv:2208.02764v1 [cs.LG])
    Recent advances in contrastive learning have shown remarkable performance. However, the vast majority of approaches are limited to the closed-world setting. In this paper, we enrich the landscape of representation learning by tapping into an open-world setting, where unlabeled samples from novel classes can naturally emerge in the wild. To bridge the gap, we introduce a new learning framework, open-world contrastive learning (OpenCon). OpenCon tackles the challenges of learning compact representations for both known and novel classes, and facilitates novelty discovery along the way. We demonstrate the effectiveness of OpenCon on challenging benchmark datasets and establish competitive performance. On the ImageNet dataset, OpenCon significantly outperforms the current best method by 11.9% and 7.4% on novel and overall classification accuracy, respectively. We hope that our work will open up new doors for future work to tackle this important problem.  ( 2 min )
    Explaining Classifiers Trained on Raw Hierarchical Multiple-Instance Data. (arXiv:2208.02694v1 [stat.ML])
    Learning from raw data input, thus limiting the need for feature engineering, is a component of many successful applications of machine learning methods in various domains. While many problems naturally translate into a vector representation directly usable in standard classifiers, a number of data sources have the natural form of structured data interchange formats (e.g., security logs in JSON/XML format). Existing methods, such as Hierarchical Multiple Instance Learning (HMIL), allow learning from such data in their raw form. However, the explanation of the classifiers trained on raw structured data remains largely unexplored. By treating the explanation of these models as a subset selection problem, we demonstrate how interpretable explanations, with favourable properties, can be generated using computationally efficient algorithms. We compare to an explanation technique adopted from graph neural networks, showing an order of magnitude speed-up and higher-quality explanations.  ( 2 min )
    Cluster-to-adapt: Few Shot Domain Adaptation for Semantic Segmentation across Disjoint Labels. (arXiv:2208.02804v1 [cs.CV])
    Domain adaptation for semantic segmentation across datasets consisting of the same categories has seen several recent successes. However, a more general scenario is when the source and target datasets correspond to non-overlapping label spaces. For example, categories in segmentation datasets change vastly depending on the type of environment or application, yet share many valuable semantic relations. Existing approaches based on feature alignment or discrepancy minimization do not take such category shift into account. In this work, we present Cluster-to-Adapt (C2A), a computationally efficient clustering-based approach for domain adaptation across segmentation datasets with completely different, but possibly related categories. We show that such a clustering objective enforced in a transformed feature space serves to automatically select categories across source and target domains that can be aligned for improving the target performance, while preventing negative transfer for unrelated categories. We demonstrate the effectiveness of our approach through experiments on the challenging problem of outdoor to indoor adaptation for semantic segmentation in few-shot as well as zero-shot settings, with consistent improvements in performance over existing approaches and baselines in all cases.  ( 2 min )
    DDOS: A MOS Prediction Framework utilizing Domain Adaptive Pre-training and Distribution of Opinion Scores. (arXiv:2204.03219v2 [eess.AS] UPDATED)
    Mean opinion score (MOS) is a typical subjective evaluation metric for speech synthesis systems. Since collecting MOS is time-consuming, accurate MOS prediction models for automatic evaluation are desirable. In this work, we propose DDOS, a novel MOS prediction model. DDOS utilizes domain adaptive pre-training to further pre-train self-supervised learning models on synthetic speech. In addition, a proposed module models the opinion score distribution of each utterance. With the proposed components, DDOS outperforms previous works on the BVCC dataset, and the zero-shot transfer result on the BC2019 dataset is significantly improved. DDOS also won second place in the Interspeech 2022 VoiceMOS challenge in terms of system-level score.  ( 2 min )
    Development and Validation of ML-DQA -- a Machine Learning Data Quality Assurance Framework for Healthcare. (arXiv:2208.02670v1 [stat.ML])
    The approaches by which the machine learning and clinical research communities utilize real world data (RWD), including data captured in the electronic health record (EHR), vary dramatically. While clinical researchers cautiously use RWD for clinical investigations, ML for healthcare teams consume public datasets with minimal scrutiny to develop new algorithms. This study bridges this gap by developing and validating ML-DQA, a data quality assurance framework grounded in RWD best practices. The ML-DQA framework is applied to five ML projects across two geographies, different medical conditions, and different cohorts. A total of 2,999 quality checks and 24 quality reports were generated on RWD gathered on 247,536 patients across the five projects. Five generalizable practices emerge: all projects used a similar method to group redundant data element representations; all projects used automated utilities to build diagnosis and medication data elements; all projects used a common library of rules-based transformations; all projects used a unified approach to assign data quality checks to data elements; and all projects used a similar approach to clinical adjudication. An average of 5.8 individuals, including clinicians, data scientists, and trainees, were involved in implementing ML-DQA for each project and an average of 23.4 data elements per project were either transformed or removed in response to ML-DQA. This study demonstrates the important role of ML-DQA in healthcare projects and provides teams with a framework to conduct these essential activities.  ( 3 min )
    A similarity-based Bayesian mixture-of-experts model. (arXiv:2012.02130v4 [stat.ML] UPDATED)
    We present a new nonparametric mixture-of-experts model for multivariate regression problems, inspired by the probabilistic k-nearest neighbors algorithm. Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point, yielding predictive distributions represented by Gaussian mixtures. Posterior inference is performed on the parameters of the mixture components as well as the distance metric using a mean-field variational Bayes algorithm accompanied with a stochastic gradient-based optimization procedure. The proposed method is especially advantageous in settings where inputs are of relatively high dimension in comparison to the data size, where input-output relationships are complex, and where predictive distributions may be skewed or multimodal. Computational studies on five datasets, of which two are synthetically generated, illustrate clear advantages of our mixture-of-experts method for high-dimensional inputs, outperforming competitor models both in terms of validation metrics and visual inspection.  ( 2 min )
    HiCu: Leveraging Hierarchy for Curriculum Learning in Automated ICD Coding. (arXiv:2208.02301v1 [cs.LG])
    There are several opportunities for automation in healthcare that can improve clinician throughput. One such example is assistive tools to document diagnosis codes when clinicians write notes. We study the automation of medical code prediction using curriculum learning, which is a training strategy for machine learning models that gradually increases the hardness of the learning tasks from easy to difficult. One of the challenges in curriculum learning is the design of curricula -- i.e., in the sequential design of tasks that gradually increase in difficulty. We propose Hierarchical Curriculum Learning (HiCu), an algorithm that uses graph structure in the space of outputs to design curricula for multi-label classification. We create curricula for multi-label classification models that predict ICD diagnosis and procedure codes from natural language descriptions of patients. By leveraging the hierarchy of ICD codes, which groups diagnosis codes based on various organ systems in the human body, we find that our proposed curricula improve the generalization of neural network-based predictive models across recurrent, convolutional, and transformer-based architectures. Our code is available at https://github.com/wren93/HiCu-ICD.  ( 2 min )
    A new class of generative classifiers based on staged tree models. (arXiv:2012.13798v2 [cs.AI] UPDATED)
    Generative models for classification use the joint probability distribution of the class variable and the features to construct a decision rule. Among generative models, Bayesian networks and naive Bayes classifiers are the most commonly used and provide a clear graphical representation of the relationship among all variables. However, these have the disadvantage of highly restricting the type of relationships that could exist, by not allowing for context-specific independences. Here we introduce a new class of generative classifiers, called staged tree classifiers, which formally account for context-specific independence. They are constructed by a partitioning of the vertices of an event tree from which conditional independence can be formally read. The naive staged tree classifier is also defined, which extends the classic naive Bayes classifier whilst retaining the same complexity. An extensive simulation study shows that the classification accuracy of staged tree classifiers is competitive with that of state-of-the-art classifiers and an example showcases their use in practice.  ( 2 min )
    Communication Beyond Transmitting Bits: Semantics-Guided Source and Channel Coding. (arXiv:2208.02481v1 [cs.IT])
    Classical communication paradigms focus on accurately transmitting bits over a noisy channel, and Shannon theory provides a fundamental theoretical limit on the rate of reliable communications. In this approach, bits are treated equally, and the communication system is oblivious to what meaning these bits convey or how they would be used. Future communications oriented towards intelligence and conciseness are expected to play a dominant role, and the proliferation of connected intelligent agents requires a radical rethinking of the coded transmission paradigm to support the new communication morphology on the horizon. The recent concept of "semantic communications" offers a promising research direction. Injecting semantic guidance into the coded transmission design to achieve semantics-aware communications shows great potential for further breakthroughs in effectiveness and reliability. This article sheds light on semantics-guided source and channel coding as a transmission paradigm of semantic communications, which exploits both data semantics diversity and wireless channel diversity together to boost the whole system performance. We present the general system architecture and key techniques, and indicate some open issues on this topic.  ( 2 min )
    PyDTS: A Python Package for Discrete-Time Survival (Regularized) Regression with Competing Risks. (arXiv:2204.05731v3 [stat.ML] UPDATED)
    Time-to-event analysis (survival analysis) is used when the outcome or the response of interest is the time until a pre-specified event occurs. Time-to-event data are sometimes discrete either because time itself is discrete or due to grouping of failure times into intervals or rounding off measurements. In addition, the failure of an individual could be one of several distinct failure types; known as competing risks (events). This work focuses on discrete-time regression with competing events. We emphasize the main difference between the continuous and discrete settings with competing events, develop a faster estimation algorithm, and present PyDTS, an open source Python package which implements our procedure and other tools for discrete-time-survival analysis with competing risks.  ( 2 min )
    Learning Interaction Variables and Kernels from Observations of Agent-Based Systems. (arXiv:2208.02758v1 [cs.LG])
    Dynamical systems across many disciplines are modeled as interacting particles or agents, with interaction rules that depend on a very small number of variables (e.g. pairwise distances, pairwise differences of phases, etc.), functions of the state of pairs of agents. Yet, these interaction rules can generate self-organized dynamics, with complex emergent behaviors (clustering, flocking, swarming, etc.). We propose a learning technique that, given observations of states and velocities along trajectories of the agents, yields both the variables upon which the interaction kernel depends and the interaction kernel itself, in a nonparametric fashion. This yields an effective dimension reduction which avoids the curse of dimensionality from the high-dimensional observation data (states and velocities of all the agents). We demonstrate the learning capability of our method on a variety of first-order interacting systems.  ( 2 min )
    P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting. (arXiv:2208.02812v1 [cs.CV])
    Nowadays, pre-training big models on large-scale datasets has become a crucial topic in deep learning. Pre-trained models with high representation ability and transferability have achieved great success and dominate many downstream tasks in natural language processing and 2D vision. However, it is non-trivial to extend such a pretraining-tuning paradigm to 3D vision, given the limited training data that are relatively inconvenient to collect. In this paper, we provide a new perspective of leveraging pre-trained 2D knowledge in the 3D domain to tackle this problem, tuning pre-trained image models with the novel Point-to-Pixel prompting for point cloud analysis at a minor parameter cost. Following the principle of prompt engineering, we transform point clouds into colorful images with geometry-preserved projection and geometry-aware coloring to adapt to pre-trained image models, whose weights are kept frozen during the end-to-end optimization of point cloud analysis tasks. We conduct extensive experiments to demonstrate that, combined with our proposed Point-to-Pixel Prompting, a better pre-trained image model leads to consistently better performance in 3D vision. Benefiting from the rapid development of the image pre-training field, our method attains 89.3% accuracy on the hardest setting of ScanObjectNN, surpassing conventional point cloud models with far fewer trainable parameters. Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part Segmentation. Code is available at https://github.com/wangzy22/P2P
    Max-Affine Spline Insights Into Deep Network Pruning. (arXiv:2101.02338v3 [cs.LG] UPDATED)
    In this paper, we study the importance of pruning in Deep Networks (DNs) and the yin & yang relationship between (1) pruning highly overparametrized DNs that have been trained from random initialization and (2) training small DNs that have been "cleverly" initialized. As in most cases practitioners can only resort to random initialization, there is a strong need to develop a grounded understanding of DN pruning. Current literature remains largely empirical, lacking a theoretical understanding of how pruning affects DNs' decision boundary, how to interpret pruning, and how to design corresponding principled pruning techniques. To tackle those questions, we propose to employ recent advances in the theoretical analysis of Continuous Piecewise Affine (CPA) DNs. From this perspective, we will be able to detect the early-bird (EB) ticket phenomenon, provide interpretability into current pruning techniques, and develop a principled pruning strategy. In each step of our study, we conduct extensive experiments supporting our claims and results; while our main goal is to enhance the current understanding of DN pruning rather than to develop a new pruning method, our spline pruning criteria for layerwise and global pruning are on par with or even outperform state-of-the-art pruning methods.
    Transferable Multi-Agent Reinforcement Learning with Dynamic Participating Agents. (arXiv:2208.02424v1 [cs.LG])
    We study multi-agent reinforcement learning (MARL) with centralized training and decentralized execution. During the training, new agents may join, and existing agents may unexpectedly leave the training. In such situations, a standard deep MARL model must be trained again from scratch, which is very time-consuming. To tackle this problem, we propose a special network architecture with a few-shot learning algorithm that allows the number of agents to vary during centralized training. In particular, when a new agent joins the centralized training, our few-shot learning algorithm trains its policy network and value network using a small number of samples; when an agent leaves the training, the training process of the remaining agents is not affected. Our experiments show that using the proposed network architecture and algorithm, model adaptation when new agents join can be 100+ times faster than the baseline. Our work is applicable to any setting, including cooperative, competitive, and mixed.  ( 2 min )
    Tokyo Kion-On: Query-Based Generative Sonification of Atmospheric Data. (arXiv:2208.02494v1 [cs.SD])
    Amid growing environmental concerns, interactive displays of data constitute an important tool for exploring and understanding the impact of climate change on the planet's ecosystemic integrity. This paper presents Tokyo kion-on, a query-based sonification model of Tokyo's air temperature from 1876 to 2021. The system uses a recurrent neural network architecture known as LSTM with attention trained on a small dataset of Japanese melodies and conditioned upon said atmospheric data. After describing the model's implementation, a brief comparative illustration of the musical results is presented, along with a discussion on how the exposed hyper-parameters can promote active and non-linear exploration of the data.  ( 2 min )
    AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning. (arXiv:2208.02376v1 [cs.LG])
    Reinforcement Learning (RL) techniques have drawn great attention in many challenging tasks, but their performance deteriorates dramatically when applied to real-world problems. Various methods, such as domain randomization, have been proposed to deal with such situations by training agents under different environmental setups, and therefore they can be generalized to different environments during deployment. However, they usually do not incorporate the underlying environmental factor information that the agents interact with properly and thus can be overly conservative when facing changes in the surroundings. In this paper, we first formalize the task of adapting to changing environmental dynamics in RL as a generalization problem using Contextual Markov Decision Processes (CMDPs). We then propose the Asymmetric Actor-Critic in Contextual RL (AACC) as an end-to-end actor-critic method to deal with such generalization tasks. We demonstrate the essential improvements in the performance of AACC over existing baselines experimentally in a range of simulated environments.  ( 2 min )
    Node Copying: A Random Graph Model for Effective Graph Sampling. (arXiv:2208.02435v1 [stat.ML])
    There has been an increased interest in applying machine learning techniques to relational structured data based on an observed graph. Often, this graph is not fully representative of the true relationship amongst nodes. In these settings, building a generative model conditioned on the observed graph allows the graph uncertainty to be taken into account. Various existing techniques either rely on restrictive assumptions, fail to preserve topological properties within the samples or are prohibitively expensive for larger graphs. In this work, we introduce the node copying model for constructing a distribution over graphs. Sampling of a random graph is carried out by replacing each node's neighbors by those of a randomly sampled similar node. The sampled graphs preserve key characteristics of the graph structure without explicitly targeting them. Additionally, sampling from this model is extremely simple and scales linearly with the number of nodes. We show the usefulness of the copying model in three tasks. First, in node classification, a Bayesian formulation based on node copying achieves higher accuracy in sparse data settings. Second, we employ our proposed model to mitigate the effect of adversarial attacks on the graph topology. Last, incorporation of the model in a recommendation system setting improves recall over state-of-the-art methods.  ( 3 min )
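    The sampling step described in this abstract (replace each node's neighbors by those of a randomly sampled similar node) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_node_copying`, the adjacency-dict representation, and the choice to treat "similar" as "shares a group label" are all assumptions made for the example.

```python
import random


def sample_node_copying(adj, groups, rng):
    """Draw one graph sample by copying neighbor sets between similar nodes.

    adj:    dict mapping node -> set of neighbor nodes (observed graph)
    groups: dict mapping node -> group label; nodes with the same label are
            treated as "similar" (an assumption made for this sketch)
    rng:    random.Random instance, passed in for reproducibility
    """
    # Index the nodes by group so a donor can be drawn in constant time.
    by_group = {}
    for node, label in groups.items():
        by_group.setdefault(label, []).append(node)

    sampled = {}
    for node in adj:
        # One random draw per node, so the cost is linear in the node count.
        donor = rng.choice(by_group[groups[node]])
        sampled[node] = set(adj[donor])
    return sampled


observed = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
labels = {0: "a", 1: "a", 2: "b", 3: "b"}
sample = sample_node_copying(observed, labels, random.Random(0))
```

    Each node keeps its identity but inherits the neighbor set of a donor from its own group, so the sampled neighbor lists are in general directed even when the observed graph is not.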
    Conformal Risk Control. (arXiv:2208.02814v1 [stat.ME])
    We extend conformal prediction to control the expected value of any monotone loss function. The algorithm generalizes split conformal prediction together with its coverage guarantee. Like conformal prediction, the conformal risk control procedure is tight up to an $\mathcal{O}(1/n)$ factor. Worked examples from computer vision and natural language processing demonstrate the usage of our algorithm to bound the false negative rate, graph distance, and token-level F1-score.  ( 2 min )
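    A minimal sketch of the calibration step, assuming the threshold rule commonly stated for conformal risk control: for per-example losses that are bounded by B and non-increasing in a parameter lambda, pick the smallest lambda with (sum of calibration losses + B) / (n + 1) <= alpha. The abstract does not spell this rule out, so the rule itself, the grid search, and the names `calibrate_lambda` and `miss_loss` are assumptions of this example.

```python
def calibrate_lambda(cal_examples, loss_fn, lambdas, alpha, loss_bound=1.0):
    """Return the smallest grid value whose adjusted empirical risk is <= alpha.

    loss_fn(example, lam) must be non-increasing in lam and at most loss_bound.
    Returns None if no grid value meets the target.
    """
    n = len(cal_examples)
    for lam in sorted(lambdas):
        total = sum(loss_fn(x, lam) for x in cal_examples)
        # Equivalent to (n / (n + 1)) * empirical_risk + loss_bound / (n + 1).
        if (total + loss_bound) / (n + 1) <= alpha:
            return lam
    return None


def miss_loss(true_label_score, lam):
    """0/1 loss: the true label is missed when its score falls below 1 - lam."""
    return 1.0 if true_label_score < 1.0 - lam else 0.0


# Hypothetical calibration scores assigned to the true label of each example.
cal_scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15]
grid = [i / 10 for i in range(11)]
lam_hat = calibrate_lambda(cal_scores, miss_loss, grid, alpha=0.35)
```

    With a 0/1 miss loss as above, the rule reduces to choosing a score quantile, which is the sense in which the procedure generalizes split conformal prediction.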
  • Open

    Feature selection with gradient descent on two-layer networks in low-rotation regimes. (arXiv:2208.02789v1 [cs.LG])
    This work establishes low test error of gradient flow (GF) and stochastic gradient descent (SGD) on two-layer ReLU networks with standard initialization, in three regimes where key sets of weights rotate little (either naturally due to GF and SGD, or due to an artificial constraint), and making use of margins as the core analytic technique. The first regime is near initialization, specifically until the weights have moved by $\mathcal{O}(\sqrt m)$, where $m$ denotes the network width, which is in sharp contrast to the $\mathcal{O}(1)$ weight motion allowed by the Neural Tangent Kernel (NTK); here it is shown that GF and SGD only need a network width and number of samples inversely proportional to the NTK margin, and moreover that GF attains at least the NTK margin itself, which suffices to establish escape from bad KKT points of the margin objective, whereas prior work could only establish nondecreasing but arbitrarily small margins. The second regime is the Neural Collapse (NC) setting, where data lies in extremely-well-separated groups, and the sample complexity scales with the number of groups; here the contribution over prior work is an analysis of the entire GF trajectory from initialization. Lastly, if the inner layer weights are constrained to change in norm only and can not rotate, then GF with large widths achieves globally maximal margins, and its sample complexity scales with their inverse; this is in contrast to prior work, which required infinite width and a tricky dual convergence assumption. As purely technical contributions, this work develops a variety of potential functions and other tools which will hopefully aid future work.
    Sparse Continuous Distributions and Fenchel-Young Losses. (arXiv:2108.01988v2 [cs.LG] UPDATED)
    Exponential families are widely used in machine learning, including many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation). Distributions in each of these families have fixed support. In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax), has led to distributions with varying support. This paper develops sparse alternatives to continuous distributions, based on several technical contributions: First, we define $\Omega$-regularized prediction maps and Fenchel-Young losses for arbitrary domains (possibly countably infinite or continuous). For linearly parametrized families, we show that minimization of Fenchel-Young losses is equivalent to moment matching of the statistics, generalizing a fundamental property of exponential families. When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain ``deformed exponential families,'' which include $\alpha$-entmax and sparsemax ($\alpha=2$) as particular cases. For quadratic energy functions, the resulting densities are $\beta$-Gaussians, an instance of elliptical distributions that contain as particular cases the Gaussian, biweight, triweight, and Epanechnikov densities, and for which we derive closed-form expressions for the variance, Tsallis entropy, and Fenchel-Young loss. When $\Omega$ is a total variation or Sobolev regularizer, we obtain a continuous version of the fusedmax. Finally, we introduce continuous-domain attention mechanisms, deriving efficient gradient backpropagation algorithms for $\alpha \in \{1, 4/3, 3/2, 2\}$. Using these algorithms, we demonstrate our sparse continuous distributions for attention-based audio classification and visual question answering, showing that they allow attending to time intervals and compact regions.
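    To illustrate what "varying support" means in the finite-domain case that this abstract builds on, here is a minimal sketch of sparsemax, the Euclidean projection of a score vector onto the probability simplex. It shows the prior finite-domain transformation only, not the continuous-domain construction developed in the paper; the function name and plain-list interface are assumptions of this example.

```python
def sparsemax(scores):
    """Project a list of scores onto the probability simplex (sparsemax)."""
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        # The support condition holds for a prefix of the sorted scores;
        # the last j that satisfies it fixes the threshold tau.
        if 1.0 + j * zj > cumsum:
            tau = (cumsum - 1.0) / j
    # Scores at or below the threshold get exactly zero probability.
    return [max(s - tau, 0.0) for s in scores]


probs = sparsemax([0.1, 1.1, 0.2])
```

    Unlike softmax, which assigns positive probability to every entry, the output here has an exact zero for the lowest score, so the support depends on the input.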
    A Hybrid Framework for Sequential Data Prediction with End-to-End Optimization. (arXiv:2203.13787v2 [stat.ML] UPDATED)
    We investigate nonlinear prediction in an online setting and introduce a hybrid model that effectively mitigates, via an end-to-end architecture, the need for hand-designed features and manual model selection issues of conventional nonlinear prediction/regression methods. In particular, we use recursive structures to extract features from sequential signals, while preserving the state information, i.e., the history, and boosted decision trees to produce the final output. The two components are connected in an end-to-end fashion and we jointly optimize the whole architecture using stochastic gradient descent, for which we also provide the backward pass update equations. Specifically, we employ a recurrent neural network (LSTM) for adaptive feature extraction from sequential data and a gradient boosting machinery (soft GBDT) for effective supervised regression. Our framework is generic so that one can use other deep learning architectures for feature extraction (such as RNNs and GRUs) and machine learning algorithms for decision making as long as they are differentiable. We demonstrate the learning behavior of our algorithm on synthetic data and the significant performance improvements over conventional methods on various real-life datasets. Furthermore, we openly share the source code of the proposed method to facilitate further research.  ( 3 min )
    Modeling Cell Populations Measured By Flow Cytometry With Covariates Using Sparse Mixture of Regressions. (arXiv:2008.11251v2 [stat.AP] UPDATED)
    The ocean is filled with microscopic microalgae called phytoplankton, which together are responsible for as much photosynthesis as all plants on land combined. Our ability to predict their response to the warming ocean relies on understanding how the dynamics of phytoplankton populations is influenced by changes in environmental conditions. One powerful technique to study the dynamics of phytoplankton is flow cytometry, which measures the optical properties of thousands of individual cells per second. Today, oceanographers are able to collect flow cytometry data in real-time onboard a moving ship, providing them with fine-scale resolution of the distribution of phytoplankton across thousands of kilometers. One of the current challenges is to understand how these small and large scale variations relate to environmental conditions, such as nutrient availability, temperature, light and ocean currents. In this paper, we propose a novel sparse mixture of multivariate regressions model to estimate the time-varying phytoplankton subpopulations while simultaneously identifying the specific environmental covariates that are predictive of the observed changes to these subpopulations. We demonstrate the usefulness and interpretability of the approach using both synthetic data and real observations collected on an oceanographic cruise conducted in the north-east Pacific in the spring of 2017.  ( 3 min )
    Bayesian Optimization with Informative Covariance. (arXiv:2208.02704v1 [cs.LG])
    Bayesian Optimization is a methodology for global optimization of unknown and expensive objectives. It combines a surrogate Bayesian regression model with an acquisition function to decide where to evaluate the objective. Typical regression models are Gaussian processes with stationary covariance functions, which, however, are unable to express prior input-dependent information, in particular information about possible locations of the optimum. The ubiquity of stationary models has led to the common practice of exploiting prior information via informative mean functions. In this paper, we highlight that these models can lead to poor performance, especially in high dimensions. We propose novel informative covariance functions that leverage nonstationarity to encode preferences for certain regions of the search space and adaptively promote local exploration during the optimization. We demonstrate that they can increase the sample efficiency of the optimization in high dimensions, even under weak prior information.  ( 2 min )
    Membership Inference Attacks and Defenses in Neural Network Pruning. (arXiv:2202.03335v2 [cs.CR] UPDATED)
    Neural network pruning has been an essential technique to reduce the computation and memory requirements for using deep neural networks on resource-constrained devices. Most existing research focuses primarily on balancing the sparsity and accuracy of a pruned neural network by strategically removing insignificant parameters and retraining the pruned model. Such efforts on reusing training samples pose serious privacy risks due to increased memorization, which, however, has not been investigated yet. In this paper, we conduct the first analysis of privacy risks in neural network pruning. Specifically, we investigate the impacts of neural network pruning on training data privacy, i.e., membership inference attacks. We first explore the impact of neural network pruning on prediction divergence, where the pruning process disproportionately affects the pruned model's behavior for members and non-members. Meanwhile, the influence of divergence even varies among different classes in a fine-grained manner. Enlightened by such divergence, we propose a self-attention membership inference attack against the pruned neural networks. Extensive experiments are conducted to rigorously evaluate the privacy impacts of different pruning approaches, sparsity levels, and adversary knowledge. The proposed attack shows higher attack performance on the pruned models when compared with eight existing membership inference attacks. In addition, we propose a new defense mechanism to protect the pruning process by mitigating the prediction divergence based on KL-divergence distance, which is experimentally shown to mitigate the privacy risks while maintaining the sparsity and accuracy of the pruned models.  ( 3 min )
    DoubleML -- An Object-Oriented Implementation of Double Machine Learning in R. (arXiv:2103.09603v3 [stat.ML] UPDATED)
    The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov et al. (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods that are available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables high flexibility for the model specification and makes it easily extendable. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.  ( 2 min )
    Using Mixed-Effects Models to Learn Bayesian Networks from Related Data Sets. (arXiv:2206.03743v2 [stat.ML] UPDATED)
    We commonly assume that data are a homogeneous set of observations when learning the structure of Bayesian networks. However, they often comprise different data sets that are related but not homogeneous because they have been collected in different ways or from different populations. In our previous work (Azzimonti, Corani and Scutari, 2021), we proposed a closed-form Bayesian Hierarchical Dirichlet score for discrete data that pools information across related data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. In this paper, we provide an analogous solution for learning a Bayesian network from continuous data using mixed-effects models to pool information across the related data sets. We study its structural, parametric, predictive and classification accuracy and we show that it outperforms both conditional Gaussian Bayesian networks (that do not perform any pooling) and classical Gaussian Bayesian networks (that disregard the heterogeneous nature of the data). The improvement is marked for low sample sizes and for unbalanced data sets.  ( 2 min )
    A similarity-based Bayesian mixture-of-experts model. (arXiv:2012.02130v4 [stat.ML] UPDATED)
    We present a new nonparametric mixture-of-experts model for multivariate regression problems, inspired by the probabilistic k-nearest neighbors algorithm. Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point, yielding predictive distributions represented by Gaussian mixtures. Posterior inference is performed on the parameters of the mixture components as well as the distance metric using a mean-field variational Bayes algorithm accompanied with a stochastic gradient-based optimization procedure. The proposed method is especially advantageous in settings where inputs are of relatively high dimension in comparison to the data size, where input-output relationships are complex, and where predictive distributions may be skewed or multimodal. Computational studies on five datasets, of which two are synthetically generated, illustrate clear advantages of our mixture-of-experts method for high-dimensional inputs, outperforming competitor models both in terms of validation metrics and visual inspection.  ( 2 min )
    A new class of generative classifiers based on staged tree models. (arXiv:2012.13798v2 [cs.AI] UPDATED)
    Generative models for classification use the joint probability distribution of the class variable and the features to construct a decision rule. Among generative models, Bayesian networks and naive Bayes classifiers are the most commonly used and provide a clear graphical representation of the relationship among all variables. However, these have the disadvantage of highly restricting the type of relationships that could exist, by not allowing for context-specific independences. Here we introduce a new class of generative classifiers, called staged tree classifiers, which formally account for context-specific independence. They are constructed by a partitioning of the vertices of an event tree from which conditional independence can be formally read. The naive staged tree classifier is also defined, which extends the classic naive Bayes classifier whilst retaining the same complexity. An extensive simulation study shows that the classification accuracy of staged tree classifiers is competitive with that of state-of-the-art classifiers and an example showcases their use in practice.  ( 2 min )
    PyDTS: A Python Package for Discrete-Time Survival (Regularized) Regression with Competing Risks. (arXiv:2204.05731v3 [stat.ML] UPDATED)
    Time-to-event analysis (survival analysis) is used when the outcome or the response of interest is the time until a pre-specified event occurs. Time-to-event data are sometimes discrete, either because time itself is discrete or due to grouping of failure times into intervals or rounding off measurements. In addition, the failure of an individual could be one of several distinct failure types, known as competing risks (events). This work focuses on discrete-time regression with competing events. We emphasize the main difference between the continuous and discrete settings with competing events, develop a faster estimation algorithm, and present PyDTS, an open source Python package which implements our procedure and other tools for discrete-time-survival analysis with competing risks.  ( 2 min )
    A Robust graph attention network with dynamic adjusted Graph. (arXiv:2009.13038v3 [cs.LG] UPDATED)
    Graph Attention Networks (GATs) are useful deep learning models for graph data. However, recent works show that the classical GAT is vulnerable to adversarial attacks: its performance degrades dramatically under slight perturbations. Therefore, how to enhance the robustness of GAT is a critical problem. Robust GAT (RoGAT) is proposed in this paper to improve the robustness of GAT based on a revision of the attention mechanism. Unlike the original GAT, which uses the attention mechanism for different edges but is still sensitive to perturbation, RoGAT progressively adds an extra dynamic attention score and improves the robustness. First, RoGAT revises the edge weights based on the smoothness assumption, which is quite common for ordinary graphs. Second, RoGAT further revises the features to suppress feature noise. Then, an extra attention score is generated from the dynamic edge weights and can be used to reduce the impact of adversarial attacks. Experiments against targeted and untargeted attacks on citation data demonstrate that RoGAT outperforms most of the recent defensive methods.  ( 2 min )
    Node Copying: A Random Graph Model for Effective Graph Sampling. (arXiv:2208.02435v1 [stat.ML])
    There has been an increased interest in applying machine learning techniques on relational structured-data based on an observed graph. Often, this graph is not fully representative of the true relationship amongst nodes. In these settings, building a generative model conditioned on the observed graph allows the graph uncertainty to be taken into account. Various existing techniques either rely on restrictive assumptions, fail to preserve topological properties within the samples or are prohibitively expensive for larger graphs. In this work, we introduce the node copying model for constructing a distribution over graphs. Sampling of a random graph is carried out by replacing each node's neighbors by those of a randomly sampled similar node. The sampled graphs preserve key characteristics of the graph structure without explicitly targeting them. Additionally, sampling from this model is extremely simple and scales linearly with the nodes. We show the usefulness of the copying model in three tasks. First, in node classification, a Bayesian formulation based on node copying achieves higher accuracy in sparse data settings. Second, we employ our proposed model to mitigate the effect of adversarial attacks on the graph topology. Last, incorporation of the model in a recommendation system setting improves recall over state-of-the-art methods.  ( 3 min )
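    The sampling step described in the abstract (replace each node's neighbors by those of a randomly sampled similar node) might look roughly like the sketch below. The abstract does not say how similarity is defined, so the sketch assumes that "similar" means "shares a class label"; the function name, the toy graph, and the labels are hypothetical.

```python
import random


def sample_node_copying_graph(adjacency, labels, rng):
    """Draw one graph: every node copies the neighbor set of a random same-label node."""
    # Group nodes by label. Treating same-label nodes as "similar" is an assumption.
    by_label = {}
    for node, label in labels.items():
        by_label.setdefault(label, []).append(node)

    sampled = {}
    for node in adjacency:
        donor = rng.choice(by_label[labels[node]])  # the donor may be the node itself
        sampled[node] = set(adjacency[donor])       # copy the donor's neighbors
    return sampled


# Toy observed graph (hypothetical) given as node -> set of neighbors.
adjacency = {
    "a": {"b", "c"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"c"},
}
labels = {"a": 0, "b": 0, "c": 1, "d": 1}
graph = sample_node_copying_graph(adjacency, labels, random.Random(0))
```

    The loop visits each node once, which matches the abstract's remark that sampling scales linearly with the number of nodes. In this simplified version the result is a directed neighbor mapping and may contain self-loops; how the paper handles those cases is not stated in the abstract.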
    Interpolating Log-Determinant and Trace of the Powers of Matrix $\mathbf{A} + t \mathbf{B}$. (arXiv:2009.07385v3 [math.NA] UPDATED)
    We develop heuristic interpolation methods for the functions $t \mapsto \log \det \left( \mathbf{A} + t \mathbf{B} \right)$ and $t \mapsto \operatorname{trace}\left( (\mathbf{A} + t \mathbf{B})^{p} \right)$ where the matrices $\mathbf{A}$ and $\mathbf{B}$ are Hermitian and positive (semi) definite and $p$ and $t$ are real variables. These functions are featured in many applications in statistics, machine learning, and computational physics. The presented interpolation functions are based on the modification of sharp bounds for these functions. We demonstrate the accuracy and performance of the proposed method with numerical examples, namely, the marginal maximum likelihood estimation for Gaussian process regression and the estimation of the regularization parameter of ridge regression with the generalized cross-validation method.  ( 2 min )
    Improving Meta-Learning Generalization with Activation-Based Early-Stopping. (arXiv:2208.02377v1 [cs.LG])
    Meta-Learning algorithms for few-shot learning aim to train neural networks capable of generalizing to novel tasks using only a few examples. Early-stopping is critical for performance, halting model training when it reaches optimal generalization to the new task distribution. Early-stopping mechanisms in Meta-Learning typically rely on measuring the model performance on labeled examples from a meta-validation set drawn from the training (source) dataset. This is problematic in few-shot transfer learning settings, where the meta-test set comes from a different target dataset (OOD) and can potentially have a large distributional shift with the meta-validation set. In this work, we propose Activation Based Early-stopping (ABE), an alternative to using validation-based early-stopping for meta-learning. Specifically, we analyze the evolution, during meta-training, of the neural activations at each hidden layer, on a small set of unlabelled support examples from a single task of the target tasks distribution, as this constitutes a minimal and justifiably accessible information from the target problem. Our experiments show that simple, label agnostic statistics on the activations offer an effective way to estimate how the target generalization evolves over time. At each hidden layer, we characterize the activation distributions, from their first and second order moments, then further summarized along the feature dimensions, resulting in a compact yet intuitive characterization in a four-dimensional space. Detecting when, throughout training time, and at which layer, the target activation trajectory diverges from the activation trajectory of the source data, allows us to perform early-stopping and improve generalization in a large array of few-shot transfer learning settings, across different algorithms, source and target datasets.  ( 3 min )
    AACC: Asymmetric Actor-Critic in Contextual Reinforcement Learning. (arXiv:2208.02376v1 [cs.LG])
    Reinforcement Learning (RL) techniques have drawn great attention in many challenging tasks, but their performance deteriorates dramatically when applied to real-world problems. Various methods, such as domain randomization, have been proposed to deal with such situations by training agents under different environmental setups, and therefore they can be generalized to different environments during deployment. However, they usually do not incorporate the underlying environmental factor information that the agents interact with properly and thus can be overly conservative when facing changes in the surroundings. In this paper, we first formalize the task of adapting to changing environmental dynamics in RL as a generalization problem using Contextual Markov Decision Processes (CMDPs). We then propose the Asymmetric Actor-Critic in Contextual RL (AACC) as an end-to-end actor-critic method to deal with such generalization tasks. We demonstrate the essential improvements in the performance of AACC over existing baselines experimentally in a range of simulated environments.  ( 2 min )
    Degenerate Gaussian factors for probabilistic inference. (arXiv:2104.15010v2 [cs.LG] UPDATED)
    In this paper, we propose a parametrised factor that enables inference on Gaussian networks where linear dependencies exist among the random variables. Our factor representation is effectively a generalisation of traditional Gaussian parametrisations where the positive-definite constraint of the covariance matrix has been relaxed. For this purpose, we derive various statistical operations and results (such as marginalisation, multiplication and affine transformations of random variables) that extend the capabilities of Gaussian factors to these degenerate settings. By using this principled factor definition, degeneracies can be accommodated accurately and automatically at little additional computational cost. As illustration, we apply our methodology to a representative example involving recursive state estimation of cooperative mobile robots.  ( 2 min )
    Local versions of sum-of-norms clustering. (arXiv:2109.09589v3 [cs.LG] UPDATED)
    Sum-of-norms clustering is a convex optimization problem whose solution can be used for the clustering of multivariate data. We propose and study a localized version of this method, and show in particular that it can separate arbitrarily close balls in the stochastic ball model. More precisely, we prove a quantitative bound on the error incurred in the clustering of disjoint connected sets. Our bound is expressed in terms of the number of datapoints and the localization length of the functional.  ( 2 min )
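    For reference, the standard (non-localized) sum-of-norms clustering problem for data points a_1, ..., a_n is usually written as below, with a tuning parameter lambda > 0; the symbols are notation chosen here, not taken from the abstract. Points whose minimizers x_i coincide are assigned to the same cluster. The localized version studied in the paper modifies the pairwise term, and its exact form is not given in the abstract.

```latex
\min_{x_1, \dots, x_n \in \mathbb{R}^d} \;
  \frac{1}{2} \sum_{i=1}^{n} \lVert x_i - a_i \rVert^2
  \; + \; \lambda \sum_{1 \le i < j \le n} \lVert x_i - x_j \rVert
```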
    Diffusion-Based Voice Conversion with Fast Maximum Likelihood Sampling Scheme. (arXiv:2109.13821v2 [cs.SD] UPDATED)
    Voice conversion is a common speech synthesis task which can be solved in different ways depending on a particular real-world scenario. The most challenging one, often referred to as one-shot many-to-many voice conversion, consists in copying the target voice from only one reference utterance in the most general case when both source and target speakers do not belong to the training dataset. We present a scalable high-quality solution based on diffusion probabilistic modeling and demonstrate its superior quality compared to state-of-the-art one-shot voice conversion approaches. Moreover, focusing on real-time applications, we investigate general principles which can make diffusion models faster while keeping synthesis quality at a high level. As a result, we develop a novel Stochastic Differential Equations solver suitable for various diffusion model types and generative tasks as shown through empirical studies and justify it by theoretical analysis.  ( 2 min )
    A Class of Dimension-free Metrics for the Convergence of Empirical Measures. (arXiv:2104.12036v3 [math.PR] UPDATED)
    This paper concerns the convergence of empirical measures in high dimensions. We propose a new class of metrics and show that under such metrics, the convergence is free of the curse of dimensionality (CoD). Such a feature is critical for high-dimensional analysis and stands in contrast to classical metrics (e.g., the Wasserstein distance). The proposed metrics originate from the maximum mean discrepancy, which we generalize by proposing specific criteria for selecting test function spaces to guarantee the property of being free of CoD. Therefore, we call this class of metrics the generalized maximum mean discrepancy (GMMD). Examples of the selected test function spaces include the reproducing kernel Hilbert space, Barron space, and flow-induced function spaces. Three applications of the proposed metrics are presented: 1. The convergence of empirical measure in the case of random variables; 2. The convergence of $n$-particle system to the solution to McKean-Vlasov stochastic differential equation; 3. The construction of an $\varepsilon$-Nash equilibrium for a homogeneous $n$-player game by its mean-field limit. As a byproduct, we prove that, given a distribution close to the target distribution measured by GMMD and a certain representation of the target distribution, we can generate a distribution close to the target one in terms of the Wasserstein distance and relative entropy. Overall, we show that the proposed class of metrics is a powerful tool to analyze the convergence of empirical measures in high dimensions without CoD.  ( 3 min )
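    As background for the abstract above, the maximum mean discrepancy between two probability measures is commonly written as a supremum over the unit ball of a test function space. The notation below is chosen here for illustration; the paper's own selection criteria for the function space are not spelled out in the abstract.

```latex
d_{\mathcal{F}}(\mu, \nu)
  = \sup_{f \in \mathcal{F},\; \lVert f \rVert_{\mathcal{F}} \le 1}
    \left| \int f \, \mathrm{d}\mu - \int f \, \mathrm{d}\nu \right|
```

    Taking the function space to be a reproducing kernel Hilbert space gives the classical MMD; the abstract's generalization amounts to allowing other spaces, such as the Barron space.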
    Explaining Classifiers Trained on Raw Hierarchical Multiple-Instance Data. (arXiv:2208.02694v1 [stat.ML])
    Learning from raw data input, thus limiting the need for feature engineering, is a component of many successful applications of machine learning methods in various domains. While many problems naturally translate into a vector representation directly usable in standard classifiers, a number of data sources have the natural form of structured data interchange formats (e.g., security logs in JSON/XML format). Existing methods, such as in Hierarchical Multiple Instance Learning (HMIL), allow learning from such data in their raw form. However, the explanation of the classifiers trained on raw structured data remains largely unexplored. By treating these models as subset selection problems, we demonstrate how interpretable explanations, with favourable properties, can be generated using computationally efficient algorithms. We compare to an explanation technique adopted from graph neural networks showing an order of magnitude speed-up and higher-quality explanations.  ( 2 min )
    Topological Signal Processing using the Weighted Ordinal Partition Network. (arXiv:2205.08349v2 [stat.ML] UPDATED)
    One of the most important problems arising in time series analysis is that of bifurcation, or change point detection. That is, given a collection of time series over a varying parameter, when has the structure of the underlying dynamical system changed? For this task, we turn to the field of topological data analysis (TDA), which encodes information about the shape and structure of data. The idea of utilizing tools from TDA for signal processing tasks, known as topological signal processing (TSP), has gained much attention in recent years, largely through a standard pipeline that computes the persistent homology of the point cloud generated by the Takens' embedding. However, this procedure is limited by computation time since the simplicial complex generated in this case is large, but also has a great deal of redundant data. For this reason, we turn to a more recent method for encoding the structure of the attractor, which constructs an ordinal partition network (OPN) representing information about when the dynamical system has passed between certain regions of state space. The result is a weighted graph whose structure encodes information about the underlying attractor. Our previous work began to find ways to package the information of the OPN in a manner that is amenable to TDA; however, that work only used the network structure and did nothing to encode the additional weighting information. In this paper, we take the next step: building a pipeline to analyze the weighted OPN with TDA and showing that this framework provides more resilience to noise or perturbations in the system and improves the accuracy of the dynamic state detection.  ( 3 min )
    Bayesian regularization of empirical MDPs. (arXiv:2208.02362v1 [cs.LG])
    In most applications of model-based Markov decision processes, the parameters for the unknown underlying model are often estimated from the empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.  ( 2 min )
    Pareto Smoothed Importance Sampling. (arXiv:1507.02646v8 [stat.CO] UPDATED)
    Importance weighting is a general way to adjust Monte Carlo integration to account for draws from the wrong distribution, but the resulting estimate can be highly variable when the importance ratios have a heavy right tail. This routinely occurs when there are aspects of the target distribution that are not well captured by the approximating distribution, in which case more stable estimates can be obtained by modifying extreme importance ratios. We present a new method for stabilizing importance weights using a generalized Pareto distribution fit to the upper tail of the distribution of the simulated importance ratios. The method, which empirically performs better than existing methods for stabilizing importance sampling estimates, includes stabilized effective sample size estimates, Monte Carlo error estimates, and convergence diagnostics. The presented Pareto $\hat{k}$ finite sample convergence rate diagnostic is useful for any Monte Carlo estimator.  ( 3 min )
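    To make the starting point of the abstract concrete, here is a minimal sketch of plain self-normalized importance sampling together with the usual effective sample size estimate. It deliberately does not include the Pareto smoothing step that the paper introduces; the target, the proposal, and all names are assumptions chosen for illustration.

```python
import math
import random


def normal_pdf(x, sigma):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def self_normalized_is(draws, ratios, h):
    """Return the weighted estimate of E[h(x)] and the effective sample size."""
    total = sum(ratios)
    estimate = sum(r * h(x) for x, r in zip(draws, ratios)) / total
    # Kish effective sample size: (sum of weights)^2 / (sum of squared weights).
    ess = total ** 2 / sum(r * r for r in ratios)
    return estimate, ess


# Target: standard normal. Proposal: normal with standard deviation 2.
rng = random.Random(0)
n = 5000
draws = [rng.gauss(0.0, 2.0) for _ in range(n)]
ratios = [normal_pdf(x, 1.0) / normal_pdf(x, 2.0) for x in draws]

# Estimate E[x^2] under the target, whose true value is 1.
estimate, ess = self_normalized_is(draws, ratios, lambda x: x * x)
```

    Here the proposal is wider than the target, so the ratios are bounded and the estimate is stable. The heavy right tail the abstract describes arises in the opposite situation, when the proposal is too narrow, and that is the case in which stabilizing the largest ratios becomes relevant.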
    An Optimal Likelihood Free Method for Biological Model Selection. (arXiv:2208.02344v1 [q-bio.QM])
    Systems biology seeks to create math models of biological systems to reduce inherent biological complexity and provide predictions for applications such as therapeutic development. However, it remains a challenge to determine which math model is correct and how to arrive optimally at the answer. We present an algorithm for automated biological model selection using mathematical models of systems biology and likelihood free inference methods. Our algorithm shows improved performance in arriving at correct models without a priori information over conventional heuristics used in experimental biology and random search. This method shows promise to accelerate biological basic science and drug discovery.  ( 2 min )
    Agnostic Learning of General ReLU Activation Using Gradient Descent. (arXiv:2208.02711v1 [cs.LG])
    We provide a convergence analysis of gradient descent for the problem of agnostically learning a single ReLU function under Gaussian distributions. Unlike prior work that studies the setting of zero bias, we consider the more challenging scenario when the bias of the ReLU function is non-zero. Our main result establishes that starting from random initialization, in a polynomial number of iterations gradient descent outputs, with high probability, a ReLU function that achieves a competitive error guarantee when compared to the error of the best ReLU function. We also provide finite sample guarantees, and these techniques generalize to a broader class of marginal distributions beyond Gaussians.  ( 2 min )
    Towards Understanding Mixture of Experts in Deep Learning. (arXiv:2208.02813v1 [cs.LG])
    The Mixture-of-Experts (MoE) layer, a sparsely-activated model controlled by a router, has achieved great success in deep learning. However, the understanding of such architecture remains elusive. In this paper, we formally study how the MoE layer improves the performance of neural network learning and why the mixture model will not collapse into a single model. Our empirical results suggest that the cluster structure of the underlying problem and the non-linearity of the expert are pivotal to the success of MoE. To further understand this, we consider a challenging classification problem with intrinsic cluster structures, which is hard to learn using a single expert. Yet with the MoE layer, by choosing the experts as two-layer nonlinear convolutional neural networks (CNNs), we show that the problem can be learned successfully. Furthermore, our theory shows that the router can learn the cluster-center features, which helps divide the input complex problem into simpler linear classification sub-problems that individual experts can conquer. To our knowledge, this is the first result towards formally understanding the mechanism of the MoE layer for deep learning.  ( 2 min )
  • Open

    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated) -
    submitted by /u/Lakshmireddys [link] [comments]  ( 85 min )
  • Open

    6 Best Artificial Intelligence courses for Healthcare You should learn 2022 -
    submitted by /u/Lakshmireddys [link] [comments]  ( 86 min )
    interesting problems
    What are some Interesting problems you have solved using AI ? submitted by /u/Weary_Word_5262 [link] [comments]  ( 86 min )
    Gothic Manor by Midjourney
    submitted by /u/WonderingWhyWeExist [link] [comments]  ( 85 min )

  • Open

    [D] Working in the industry and coder recommendation (lucidrains, crowsonkb etc)
    ever since i started to work after phd, i'm noticing more and more that engineering customized systems is crucial (minor details like initializations, learning rates, schedulers etc can save or waste hundreds of hours), and writing bad, nonmodular code is one of the worst offenders in killing productivity. also, i work in generative modeling and noticed the whole community relies on a handful of people's code, passed over again and again in hundreds of papers (diffusion, stylegan based work, a lot of gan implementations, transformers etc). i'm not saying every new work should rewrite their codebase from scratch, but sometimes i try to test out code and modify it, and it would actually have been easier if i had just written the whole thing (or the parts of it i need to have control over) from scratch. also, i don't believe you actually rewrite everything from scratch, but bring together lego blocks and expand (a good example is how this gentleman implemented tons of gans, it's essentially compounding of knowledge where each new paper is usually only slightly different from the previous ones: https://github.com/eriklindernoren) recently, i started studying lucidrains' (https://github.com/lucidrains) and crowsonkb's (https://github.com/crowsonkb) code. i sit down, put the paper pdf on one side and the code on the other, act like it's flashcards, hide the code, and try to rewrite the correct function. maybe it's a terrible way of learning (i already know the method described in the paper, but cannot implement it at this point), but it seems to help (open to suggestions!). the people i mentioned above code like poetry (there are always some errors and it's not always faithful to the original implementations, but that's ok). i'm wondering do you know anyone like these guys i can just absorb information from? can be any kind of machine learning. i use only python, and like pytorch, tf 2 (hate tf 1), and started dipping my toe in jax. submitted by /u/onzanzo [link] [comments]  ( 89 min )
    [D] Have you responded to your NeurIPS22 rebuttals?
    If you are reviewing for NeurIPS this year, have you already read & responded to the rebuttals posted by the authors? submitted by /u/OpeningVariable [link] [comments]  ( 113 min )
    [D] What is the current SOTA in multi object 3d bounding box detection that is not self-driving based
    The only work that I have seen is Objectron, and that is definitely not open source. I am simply not able to find a generic paper that does 3d bounding box regression for a multi object scene submitted by /u/soulslicer0 [link] [comments]  ( 107 min )
    [D] Why is ML research so experimental?
    I'm still a bit of an ML noob, so this might be my inexperience talking, but why is so much research in ML experimental? My understanding is that areas such as physics have a strong experimental branch because they study already existing systems, but this doesn't seem to be the case with ML. I mean, we study mathematical objects, so it seems to me that we should be trying to understand them as such. Like, if someone wants to propose a shortest path algorithm, they report its time complexity, not that it took 1min on average to run it, right? submitted by /u/apple_tau [link] [comments]  ( 94 min )
    What do you think is the place of Google's Carbon in ML? [D]
    Is there any place at all, taking C and C++ into consideration submitted by /u/ZuleZI [link] [comments]  ( 110 min )
    [D] Concept of collaborative open-source books for AI/DL
    Most of the knowledge base of modern AI/DL lives in papers, not in traditional books, unlike many other fields. This is mostly because AI/DL is so fast-paced that it is nearly impossible to write an up-to-date book and keep it up-to-date. Publishing books the formal way requires tremendous effort by authors and publishing agencies. Some have done it, like Goodfellow, Courville, Bengio and Kevin Murphy. But they are also likely to be outdated within a few years as new algorithms emerge. Papers are difficult to read for non-experts/moderately-skilled workforces. Sometimes, different papers have very different notation and writing style, which is confusing. Different authors have different "mental models" of the same concepts. So they aren't really unified. Is it possible to have open-source collaborative books (maybe a latex project hosted on Github, for example) where people (original authors or others) can submit new algorithms or changes as they appear in conferences and a group of "book maintainers" merge them depending on whether their notations/interpretations are compatible with the rest of the book? It's like Wikipedia, but much more curated and geared toward specific topic(s). Q: Are there any successful projects like this, specifically for AI/DL? submitted by /u/dasayan05 [link] [comments]  ( 88 min )
    [D] Lessons From Deploying Deep Learning To Production
    I used to think that machine learning was about the models. Actually, machine learning in production is about pipelines. One of the best predictors of success is the ability to effectively iterate on your model pipeline. That doesn't just mean iterating quickly, but also iterating intelligently. The second part is crucial, otherwise you end up with a pipeline that produces bad models very quickly. https://thegradient.pub/lessons-from-deploying-deep-learning-to-production/ submitted by /u/pgao_aquarium [link] [comments]  ( 87 min )
    [D] Journal taking long time to review. Editor/staff does not communicate to authors. What should we do?
    A ML/DL journal (keeping it anonymous for now) is taking a very long time to review a submitted paper. The journal's speed metrics page mentions that average time of review is 6-7 weeks. Our paper has been in review for more than 21 weeks. We have tried communicating with the editor three weeks ago, and also emailed the staff two weeks ago. No one has replied yet. We also spoke with a support official via chat and he/she mentions, "Rest assured that the Editors are doing their best to expedite the process. Once the Editors have completed and evaluated all the reviewer recommendation, they will provide the decision in due course." It has been one more week since this conversation and we have not received any communication yet. Yesterday we sent out another mail (as a reply to the previous email thread), but no one has replied yet. No one seems to be very open to communication. Can anyone please tell us what to do? submitted by /u/FastestLearner [link] [comments]  ( 112 min )
    [P] New Search Engine for Python ML Docs
    So I’ve been getting tired of googling and getting stackoverflow when I already know what library I want, and not being able to search those libraries' docs because of their rudimentary keyword based searches. Thus, I decided to make a search tool for open-source python libraries (with a focus on ML libraries, since that's mostly what I work on) that's curated for actual developers and permits natural language queries. I’m gonna keep this free as long as I can, so it'd be wonderful to get feedback from anybody who'd be up to give it a try. Check it out at https://www.pysearch.com and please feel free to share with anybody else you know who might benefit from this! submitted by /u/oodmb [link] [comments]  ( 89 min )
    [D] Learning path for Machine Learning.
    Hi! I've decided to enter the black hole known as "machine learning" and after scouring the subreddit I came across this lovely post: https://www.reddit.com/r/MachineLearning/comments/5z8110/d_a_super_harsh_guide_to_machine_learning/?ref=share&ref_source=link I noticed some people suggest that a better way to get started would be reading "Introduction to Statistical Learning" instead. I was wondering which chapters of Introduction to Statistical Learning are a must-read before starting The Elements of Statistical Learning? I was also curious as to whether there were other learning paths you all would suggest as alternatives to the post I shared. I have pre-req math up until calculus 3 (vector calc) and linear algebra knowledge; I have also been coding for roughly 6 months in python. Thank you for your help. Have a good day! submitted by /u/h3cker999 [link] [comments]  ( 88 min )
    [D] Accessing/watching recorded ICML 2022 paper presentations?
    Hello, I would like to watch the talks/videos for accepted ICML 2022 papers. In the past, these used to be available for free at https://slideslive.com/library. Now, however, the oral presentations (https://icml.cc/virtual/2022/events/oral), for example, cannot be accessed without registration, and with the conference being over, registration is already closed. Any ideas and tips on how to watch the videos would be much appreciated. Thanks! submitted by /u/solingermuc [link] [comments]  ( 88 min )
    [R] Questions About ACL Rolling Review Experience
    Hi all, I recently had some bad experiences with the ACL Rolling Review (ARR) and I wanted to know if my experience was typical and if there is anything I can do: - I've emailed ARR multiple times and I've never gotten a response, whether it was their support, tech, or editors email. These emails have included a request for tech support (I couldn't attach software to my submission) and a request for the status of reviews. - I received a meta-review (2) that gave a much lower score than any of the review scores I received (3.5, 3.5, 4, 4) which all had medium to high confidence (3, 5, 4, 4). The weaknesses and strengths given in the meta-review were different than those in the other reviews, which leads me to believe the meta-review was written like an independent review. The weaknesses given also did not seem to justify my low meta-review score. Has anyone else had similar experiences with ARR and does anyone have any advice about what to do? submitted by /u/Chrysomallo [link] [comments]  ( 90 min )
    [D] Cheap production-grade GPU in cloud
    We’re currently using AWS EKS with GPU-enabled VMs to train our models and host the service that uses them to serve inferences, but the costs are killing us, so recently I’ve been looking for alternatives. Most of the solutions I’ve found are either not that different from AWS in terms of pricing, or new, and I’m anxious about migrating our setup to something that could one day teleport our work to the trash can because they’ve run out of investor money, or tell me that they can’t provision me a GPU because their data center doesn’t have any left. Do you guys have any recommendations for a cloud GPU provider that’s cheaper than AWS, but proven and reliable? submitted by /u/rj00na [link] [comments]  ( 89 min )
    [D] What are your sources of information to stay updated on the latest ML tools?
    Hello everyone, I am trying to assess what would be the best sources of information to remain updated on the latest ML tools / frameworks. Could you share what is your favorite media category? If one option is not present, it would be great if you could write it down :) View Poll submitted by /u/Separate-Still3770 [link] [comments]  ( 115 min )
    [Project] Face Recognition for 520 people
    I want to create a face detection network for a dataset of around 520 people. I have the code ready for the face detection and all the data loaders but I am struggling with which model/approach to go for. I have roughly about 25-30 pictures per person so what would be the most accurate way to go about this? submitted by /u/Normal_Gift927 [link] [comments]  ( 92 min )
    [Project] Project ideas for Web + AI (ML/Deep Learning)
    I have to make my final year project. I am proficient in full-stack web development and I need to make a project which also uses AI (ML or Deep Learning). Can you all suggest a good and useful project? Thanks in advance. submitted by /u/piyush_saha [link] [comments]  ( 111 min )
    [P] New book: Understanding Deep Learning
    Hi all, I've been writing a new textbook. It's titled "Understanding Deep Learning" and will be published by MIT press. A partial draft is now available at: https://udlbook.github.io/udlbook/ It's not the most applied book (it has no code) and it's not the most theoretical book (it has no proofs). The goal is exactly as the title suggests -- to allow the reader to understand the core ideas underpinning modern deep learning techniques in the simplest way. To this end, I've drawn a lot of new figures, and tried to come up with new and clearer explanations rather than rehash existing descriptions. I would love feedback from: Students. Which parts did you find confusing or ambiguous? Instructors. Will this book help your teaching? If not, then how could it be improved? Experts. Are there any glaring absences or mistakes? Please feel free to share and redistribute this link as you see fit. The more people that read this draft, the better the final product will be. submitted by /u/SimonJDPrince [link] [comments]  ( 90 min )
    [D] VQ-VAE with PixelCNN prior ?
    What does it mean to combine PixelCNN with the VQ-VAE model (and how do you do it)? submitted by /u/rishok [link] [comments]  ( 105 min )
    [D] Book Recommendation
    Can you recommend the best book for learning information theory? I am a psych grad student, doing a lot of work with machine learning, and while I think I have educated myself in linear algebra, I have heard that it is also useful to learn information theory if one is to work with deep learning and related topics. Can you recommend some resources to learn information theory? submitted by /u/Hub_Pli [link] [comments]  ( 87 min )
    [D] How are Chinese universities like Tsinghua and PKU for ML PhD
    An offshoot from the thread discussing Canadian and European unis for ML phds. Lots of papers come from Chinese Universities, even smaller ones like Xiamen U, but then again churning out papers en masse isn’t a metric we should value too much. How is the international recognition of a degree from these places? submitted by /u/SocialEngineeeing [link] [comments]  ( 88 min )
    [P]Nash Finder - find Nash equilibrium for all games
    https://github.com/lansiz/nash-finder This program helps find Nash equilibria (NE) for any type of game. It is especially useful for games with more than two players, which oftentimes are unsolvable. Example 1: find NE for two-person games. Payoff bimatrix of two-person game import grm game = grm.Game() # two players, and each player uses THREE pure strategies game.player_join(grm.Player(3)) game.player_join(grm.Player(3)) game.player_init_mixed_strategies() # assign the payoff (define the payoff function) # player 1 game.player_assign_payoff(1, "11", -231) game.player_assign_payoff(1, "12", -505) game.player_assign_payoff(1, "13", 525) game.player_assign_payoff(1, "21", -552) game.player_assign_payoff(1, "22", 831) game.player_assign_payoff(1, "23", -928) game.player_a…  ( 90 min )
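As a point of reference for the two-person case, here is a minimal sketch of what it means for a strategy pair to be a Nash equilibrium of a payoff bimatrix. It does not use the `grm` package from the post and is not the author's algorithm: it only enumerates pure-strategy equilibria with NumPy, and the function name `pure_nash_equilibria` and the example payoffs are assumptions made for illustration. Mixed-strategy equilibria need a different method.

```python
import numpy as np

def pure_nash_equilibria(A, B):
    """Return all pure-strategy Nash equilibria (i, j) of a bimatrix game.

    A[i, j] is the row player's payoff and B[i, j] the column player's payoff
    when the row player picks i and the column player picks j.
    """
    A = np.asarray(A)
    B = np.asarray(B)
    # Row i is a best response to column j if A[i, j] is the maximum of column j.
    row_best = A == A.max(axis=0, keepdims=True)
    # Column j is a best response to row i if B[i, j] is the maximum of row i.
    col_best = B == B.max(axis=1, keepdims=True)
    # A cell is an equilibrium when both players are best-responding at once.
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(row_best & col_best))]

# Prisoner's dilemma (strategy 0 = cooperate, 1 = defect); illustrative payoffs.
A = [[-1, -3],
     [0, -2]]
B = [[-1, 0],
     [-3, -2]]
print(pure_nash_equilibria(A, B))  # [(1, 1)]: mutual defection
```

A game such as matching pennies has no cell where both players best-respond simultaneously, so the function returns an empty list there.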
    [P] Open sourcing my Kaggle Pipeline
    I am open sourcing my Kaggle Pipeline for Tabular Data Competitions. It is the result of hundreds of hours I have spent working through various competitions. This project should fast-forward the journey of a Kaggle newbie by several months. github: https://github.com/arnabbiswas1/kaggle_pipeline_tps_aug_22 Kaggle Discussion: https://www.kaggle.com/competitions/tabular-playground-series-aug-2022/discussion/341120 submitted by /u/abiswa [link] [comments]  ( 87 min )
    [D] Free cloud GPU options in 2022?
    We all know Colab, Gradient, Kaggle, etc. Any obscure/new free cloud GPU providers that are not talked about enough? Even if they're not ultra powerful. submitted by /u/No_Application_5581 [link] [comments]  ( 87 min )
    [D] The theory of everything
    Please critique my theory of everything. Looking to explore any logic I may be missing. https://docs.google.com/document/d/1lbrExCLuLh9yWvUPG_gx9l2bTEay-7naT2hgBJUU5zU/edit submitted by /u/averythomas [link] [comments]  ( 87 min )
    [D] Beginner in machine learning and feeling lost
    I am a beginner with little experience in machine learning and I'm thinking of starting a project with my beginners mates (object detection project). Although I have a background in deep learning, and computer vision (I took Kaggle's courses), I have never applied what I learned and have no idea what I should do next, so I would appreciate any suggestions, advice, or mentorship you could provide. submitted by /u/this-is-the-admin [link] [comments]  ( 87 min )
  • Open

    Trouble installing Arcade Learning Environment (atari library) on a remote machine
    I used `pip install gym[atari]` to install the ALE on a machine on Paperspace. However, I am unable to run my code using the Atari library; this is the error message I get: `File "/home/paperspace/.local/lib/python3.8/site-packages/gym/envs/atari/environment.py", line 196, in seed self.ale.loadROM(getattr(roms, self._game)) RuntimeError: Failed to initialize SDL` I had no issues installing and running ALE on my local machine but somehow it doesn't work on the remote machine. Could use a helping hand please, let me know if you've ever had this issue or if you know how to solve it. Thanks in advance. submitted by /u/youneskamel2 [link] [comments]  ( 87 min )
    Contextual Bandit Math
    Is there an easy-to-digest resource for understanding the math behind contextual bandits and how they work? I understand UCB. Also, I am following this good text https://arxiv.org/abs/1904.07272, but it's thick and it takes time to develop an intuition for the algorithms. And almost all videos and blogs are a tease! Thanks in advance. submitted by /u/sap2022 [link] [comments]  ( 86 min )
    Best model-based method for robotics environment?
    I am looking to solve the dm-control manipulator environment and have been struggling when using SAC or PPO: after a billion time steps the agent still isn't learning. So I was going to try a model-based method such as MPPI, but since I'm not as familiar with model-based methods I wanted to know what the state of the art is. Preferably something well documented too would be helpful :) submitted by /u/SuperDuperDooken [link] [comments]  ( 86 min )
    How do parallel environments work?
    Hi, I'm trying to understand how running multiple environments in parallel works. Say you have 12 environments. As far as I understood, you pass a 12 x n vector as actions in the step function. Then, the step function gives you back a 12 x n vector as observations, rewards, etc. Is this correct? submitted by /u/No_Possibility_7588 [link] [comments]  ( 86 min )
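The batching described in the question can be sketched with a toy wrapper. This is a minimal illustration under stated assumptions, not the API of any particular library: `ToyEnv` and `SyncVectorEnv` are hypothetical classes written for this example, and the wrapper steps its environments one after another in a single thread (real implementations may use subprocesses or threads, but the array shapes are the point here). Note that with this layout the rewards and done flags come back with one scalar per environment, so they have shape (12,) rather than 12 x n.

```python
import numpy as np

class ToyEnv:
    """Hypothetical 1-D environment: the observation is a position on a line."""

    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)
        self.pos = 0.0

    def reset(self):
        self.pos = float(self.rng.uniform(-1.0, 1.0))
        return np.array([self.pos], dtype=np.float32)

    def step(self, action):
        # Discrete action: 1 moves right, anything else moves left.
        self.pos += 0.1 if action == 1 else -0.1
        reward = -abs(self.pos)       # closer to the origin is better
        done = abs(self.pos) > 2.0    # episode ends when the agent drifts too far
        return np.array([self.pos], dtype=np.float32), reward, done


class SyncVectorEnv:
    """Steps a list of environments in sequence and stacks the results."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return np.stack([env.reset() for env in self.envs])

    def step(self, actions):
        obs, rewards, dones = [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d = env.step(action)
            if d:
                o = env.reset()  # auto-reset so the batch always stays full
            obs.append(o)
            rewards.append(r)
            dones.append(d)
        return np.stack(obs), np.array(rewards, dtype=np.float32), np.array(dones)


venv = SyncVectorEnv([ToyEnv(seed=i) for i in range(12)])
obs = venv.reset()                       # shape (12, 1): one observation per env
actions = np.ones(12, dtype=int)         # shape (12,): one discrete action per env
obs, rewards, dones = venv.step(actions)
print(obs.shape, rewards.shape, dones.shape)  # (12, 1) (12,) (12,)
```

For continuous control the actions would instead have shape (12, action_dim), which is the "12 x n" case from the question.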
    Why is my DQN cartpole not learning?
    I coded in a DQN (without any target network). For some reason, the algorithm fails to learn any meaningful policy. Here's my code. I will highly appreciate any and all suggestions and criticisms :) ​ #!/usr/bin/env python # coding: utf-8 # In[66]: # Here we import all libraries import numpy as np import gym import matplotlib.pyplot as plt import os import torch import random from torch import nn from torch.utils.data import DataLoader from torchvision import datasets, transforms from collections import deque import sys env = gym.make("CartPole-v0") # In[67]: #Hyperparameters episodes = 20000 eps = 1.0 learning_rate = 0.001 tot_rewards = [] tot_loss = [] decay_val = 0.0001 mem_size = 5000 batch_size = 100 gamma = 0.99 max_steps = 200 # In[68]: class NeuralNetwork(nn.Module): def __init__…  ( 88 min )
  • Open

    Nothing going on here, nobody is becoming conscious...
    submitted by /u/TheExtimate [link] [comments]  ( 85 min )
    The face of grief as it seen by ruDALL-E Kandinsky
    submitted by /u/knight_hildebrandt [link] [comments]  ( 85 min )
    AI tool to write and explain Excel formulas (www.tersho.com)
    submitted by /u/apugoneappu [link] [comments]  ( 86 min )
    Website to generate Code Snippets, Regexes, Linux & Git & SQL Commands, HTML and CSS from a written description. Furthermore translate code snippets to many languages and get a regex explained in plain english. Moreover you can fix broken code snippets. All with the help of AI 🤖
    https://preview.redd.it/cla8bb3lqqf91.jpg?width=1256&format=pjpg&auto=webp&s=e277f9013fff22c6e2e12128c46058d0a81c1974 Programming: Function from Description; Code to Explanation; Fix invalid Code; Translate Languages; Class from Description; Get Language from Code; Function from Docstring. Helpers: Regex from Description; Regex to Explanation; Linux Command; Get time complexity; Git Command from Description. Database: Text Description to SQL Command. Web: Generate HTML from Description; CSS from Description; Meta Tags from Description. I think this could be helpful to a lot of people (especially beginner programmers). You can check out all the functionality for yourself here: programming-helper.com Have fun using the tool ❤️ submitted by /u/Capital_Revolution35 [link] [comments]  ( 86 min )
    Any good pixel art generators?
    I make Minecraft mods in my free time but am terrible at art, so I always pay an artist to make pixel art sprites for me. Is there an AI I can use or pay for that will generate 16x16 high quality pixel art? I saw that Dalle-2 was very impressive at this but obviously it is not available to me : submitted by /u/Swftness503 [link] [comments]  ( 86 min )
    Found a nice experiment on using sensor fusion and machine learning to detect smoke!
    Found a nice experiment on using sensor fusion and machine learning to detect smoke and get notified if the fire starts. Check this out: https://www.hackster.io/stefanblattmann/real-time-smoke-detection-with-ai-based-sensor-fusion-1086e6 submitted by /u/Potsieramirez [link] [comments]  ( 86 min )
    A Semantic Search Engine for Python ML Docs
    So I’ve been getting tired of googling and getting Stack Overflow when I already know what library I want, and not being able to search those libraries' docs because of their rudimentary keyword-based searches. Thus, I decided to make a search tool for open-source Python libraries (with a focus on ML libraries, since that’s mostly what I work on) that's curated for actual developers and permits natural language queries. I’m gonna keep this free as long as I can, so it'd be wonderful to get feedback from anybody who'd be up to give it a try. Check it out at https://www.pysearch.com and please feel free to share with anybody else you know who might benefit from this! submitted by /u/oodmb [link] [comments]  ( 86 min )
    high resolution AI Art Generator
    are there any AI-image generator platforms to produce higher resolution pictures? I'm talking around 3000x3000 that I could use for a commercial project. Willing to pay. How does the copyright / ownership of the project work? submitted by /u/hampark [link] [comments]  ( 86 min )
    When AI is the inventor who gets the patent?
    submitted by /u/originalmetaverse [link] [comments]  ( 86 min )
    Buddha praying surrounded by angels -. Midjourney
    submitted by /u/manomanolito [link] [comments]  ( 85 min )
    Cosmic Canal By BeyondImagination
    submitted by /u/widgia [link] [comments]  ( 91 min )
    AI Dream 69 - Short AI Animation Bubble Nebula
    submitted by /u/LordPewPew777 [link] [comments]  ( 90 min )
    looking for something to generate image from my own image library
    I've been looking around but haven't found anything that lets me input a library of my own images and then have a new image generated from the set. I'm not a coder, though I do have some rudimentary skills to be able to install and run something. I have delved into it and still haven't found something that does this. There are plenty of tools for text to image, or for generating images based on an already trained library, but what I really want is to throw a few hundred images into something and have it spit out a high-res image based on my own inputs. Any suggestions for what I can use for this? submitted by /u/IrikanjiToys [link] [comments]  ( 87 min )
    How Gran Turismo 7's 'Sophy' AI Actually Works
    submitted by /u/GET_TUDA_CHOPPA [link] [comments]  ( 90 min )
    Amazon’s 20B-Parameter Alexa Model Sets New Marks In Few-Shot Learning Along With Low Carbon Footprint During Training (One-Fifth of GPT-3’s)
    Some of the most significant developments in AI have come through supervised learning, in which models are trained on annotated data. However, reliance on data annotation is increasingly untenable as the size of commercial AI models grows. The new paradigm of generalizable intelligence, in which models can pick up new ideas and transfer knowledge from one language or task to another without much human input, is being investigated by researchers at Alexa AI. These models enable researchers to create new features and enhance Alexa across several languages quickly. As part of this change, Amazon has introduced Alexa Teacher Models (AlexaTM), which are massive transformer-based multilingual language models. Without additional human guidance, AlexaTM can learn a task in a new language from just a few examples. ✅ With an encoder-decoder architecture (rather than decoder-only), the Alexa Teacher Model outperforms other large language models on few-shot tasks such as summarization and machine translation. ✅ AlexaTM 20B also tops GPT-3 by being multilingual, supporting Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu. ✅ Its carbon footprint during training is only one-fifth of GPT-3’s. submitted by /u/ai-lover [link] [comments]  ( 91 min )
    Spiral Galaxy
    submitted by /u/nalr00n [link] [comments]  ( 85 min )
  • Open

    Dive Into AI, Avatars and the Metaverse With NVIDIA at SIGGRAPH
    Innovative technologies in AI, virtual worlds and digital humans are shaping the future of design and content creation across every industry. Experience the latest advances from NVIDIA in all these areas at SIGGRAPH, the world’s largest gathering of computer graphics experts, running Aug. 8-11. At the conference, creators, developers, engineers, researchers and students will see Read article > The post Dive Into AI, Avatars and the Metaverse With NVIDIA at SIGGRAPH appeared first on NVIDIA Blog.  ( 6 min )
    What Is Direct and Indirect Lighting?
    Imagine hiking to a lake on a summer day — sitting under a shady tree and watching the water gleam under the sun. In this scene, the differences between light and shadow are examples of direct and indirect lighting. The sun shines onto the lake and the trees, making the water look like it’s shimmering Read article > The post What Is Direct and Indirect Lighting? appeared first on NVIDIA Blog.  ( 8 min )
    Pinterest Boosts Home Feed Engagement 16% With Switch to GPU Acceleration of Recommenders
    Pinterest has engineered a way to serve its photo-sharing community more of the images they love. The social-image service, with more than 400 million monthly active users, has trained bigger recommender models for improved accuracy at predicting people’s interests. Pinterest handles hundreds of millions of user requests an hour on any given day. And it Read article > The post Pinterest Boosts Home Feed Engagement 16% With Switch to GPU Acceleration of Recommenders appeared first on NVIDIA Blog.  ( 6 min )
    Rush Into August This GFN Thursday With 38 New Games on GeForce NOW
    It’s the first GFN Thursday of the month and you know the drill — GeForce NOW is bringing a big batch of games to the cloud. Get ready for 38 exciting titles like Saints Row and Rumbleverse arriving on the GeForce NOW library in August. Members can kick off the month streaming 13 new games Read article > The post Rush Into August This GFN Thursday With 38 New Games on GeForce NOW appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    Introducing the Google Universal Image Embedding Challenge
    Posted by Bingyi Cao, Software Engineer, Google Research, and Mário Lipovský, Software Engineer, Google Lens. Computer vision models see daily application for a wide variety of tasks, ranging from object recognition to image-based 3D object reconstruction. One challenging type of computer vision problem is instance-level recognition (ILR): given an image of an object, the task is to not only determine the generic category of an object (e.g., an arch), but also the specific instance of the object (“Arc de Triomphe de l'Étoile, Paris, France”). Previously, ILR was tackled using deep learning approaches. First, a large set of images was collected. Then a deep model was trained to embed each image into a high-dimensional space where similar images have similar representations. Finally, the …  ( 25 min )
  • Open

    Optimal pricing for maximum profit using Amazon SageMaker
    This is a guest post by Viktor Enrico Jeney, Senior Machine Learning Engineer at Adspert. Adspert is a Berlin-based ISV that developed a bid management tool designed to automatically optimize performance marketing and advertising campaigns. The company’s core principle is to automate maximization of profit of ecommerce advertising with the help of artificial intelligence. The […]  ( 11 min )
  • Open

    New Search Engine for Python ML Docs
    submitted by /u/oodmb [link] [comments]  ( 85 min )
  • Open

    How to disappear a platypus
    I was testing DALL-E 2 to see if it would be subject to some common incorrect assumptions about the sizes of things. For example if you asked people what size a kiwi bird is, they tend to assume it's a smallish bird, maybe around the size of a  ( 4 min )
    Bonus: There can be only one
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Practical Intro to Docker for Data Scientists
    If you can build a Machine Learning model — you should be able to deploy it  ( 13 min )
    The Role of Artificial Intelligence in The Packaging Industry
    Artificial intelligence (AI) is a technology that can be used in many different industries to help businesses achieve their goals. For…  ( 12 min )
  • Open

    Combinatorial Causal Bandits. (arXiv:2206.01995v2 [cs.LG] UPDATED)
    In combinatorial causal bandits (CCB), the learning agent chooses at most $K$ variables in each round to intervene, collects feedback from the observed variables, with the goal of minimizing expected regret on the target variable $Y$. Different from all prior studies on causal bandits, CCB needs to deal with exponentially large action space. We study under the context of binary generalized linear models (BGLMs) with a succinct parametric representation of the causal models. We present the algorithm BGLM-OFU for Markovian BGLMs (i.e. no hidden variables) based on the maximum likelihood estimation method, and show that it achieves $O(\sqrt{T}\log T)$ regret, where $T$ is the time horizon. For the special case of linear models with hidden variables, we apply causal inference techniques such as the do-calculus to convert the original model into a Markovian model, and then show that our BGLM-OFU algorithm and another algorithm based on the linear regression both solve such linear models with hidden variables. Our novelty includes (a) considering the combinatorial intervention action space, (b) considering general causal models including ones with hidden variables, (c) integrating and adapting techniques from diverse studies such as generalized linear bandits and online influence maximization, and (d) not relying on unrealistic assumptions such as knowing the joint distribution of the parents of $Y$ under all interventions used in some prior studies.
    GraphFramEx: Towards Systematic Evaluation of Explainability Methods for Graph Neural Networks. (arXiv:2206.09677v3 [cs.LG] UPDATED)
    As one of the most popular machine learning models today, graph neural networks (GNNs) have attracted intense interest recently, and so has their explainability. Users are increasingly interested in a better understanding of GNN models and their outcomes. Unfortunately, today's evaluation frameworks for GNN explainability often rely on synthetic datasets, leading to conclusions of limited scope due to a lack of complexity in the problem instances. As GNN models are deployed to more mission-critical applications, we are in dire need of a common evaluation protocol for explainability methods of GNNs. In this paper, we propose, to the best of our knowledge, the first systematic evaluation framework for GNN explainability, considering explainability on three different "user needs:" explanation focus, mask nature, and mask transformation. We propose a unique metric that combines the fidelity measures and classifies explanations based on their quality of being sufficient or necessary. We scope ourselves to node classification tasks and compare the most representative techniques in the field of input-level explainability for GNNs. For the widely used synthetic benchmarks, surprisingly, shallow techniques such as personalized PageRank have the best performance for a minimum computation time. But when the graph structure is more complex and nodes have meaningful features, gradient-based methods, in particular Saliency, are the best according to our evaluation criteria. However, none dominates the others on all evaluation dimensions and there is always a trade-off. We further apply our evaluation protocol in a case study on eBay graphs to reflect the production environment.
    Beyond neural scaling laws: beating power law scaling via data pruning. (arXiv:2206.14486v2 [cs.LG] UPDATED)
    Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how both in theory and practice we can break beyond power law scaling and reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this new exponential scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling performance on ResNets trained on CIFAR-10, SVHN, and ImageNet. Given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
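To make the contrast between the two scaling regimes concrete, here is a short worked comparison. The notation is an assumption introduced for illustration and is not taken from the paper: $P$ is the training set size, $\varepsilon$ the test error, and $a$, $b$, $\nu$ are positive fit constants.

```latex
% Assumed notation (not from the paper): P = number of training examples,
% \varepsilon = test error, a, b, \nu > 0 are fit constants.
\begin{align*}
\text{power law:} \quad & \varepsilon(P) \approx a\,P^{-\nu}
  &&\Rightarrow\quad \varepsilon(P') = \tfrac{1}{2}\,\varepsilon(P)
  \ \text{ requires } \ P' = 2^{1/\nu}\,P, \\
\text{exponential:} \quad & \varepsilon(P) \approx a\,e^{-bP}
  &&\Rightarrow\quad \varepsilon(P') = \tfrac{1}{2}\,\varepsilon(P)
  \ \text{ requires } \ P' = P + \tfrac{\ln 2}{b}.
\end{align*}
```

Under a power law, halving the error multiplies the required data by a constant factor (four times the data when $\nu = 0.5$), while under exponential scaling each halving costs only a fixed additive number of examples.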
    Deep Learning-Enabled Semantic Communication Systems with Task-Unaware Transmitter and Dynamic Data. (arXiv:2205.00271v2 [cs.IT] UPDATED)
    Existing deep learning-enabled semantic communication systems often rely on shared background knowledge between the transmitter and receiver that includes empirical data and their associated semantic information. In practice, the semantic information is defined by the pragmatic task of the receiver and cannot be known to the transmitter. The actual observable data at the transmitter can also have a different distribution from the empirical data in the shared background knowledge library. To address these practical issues, this paper proposes a new neural network-based semantic communication system for image transmission, where the transmitter is unaware of the task and the data environment is dynamic. The system consists of two main parts, namely the semantic coding (SC) network and the data adaptation (DA) network. The SC network learns how to extract and transmit the semantic information using a receiver-leading training process. By using the domain adaptation technique from transfer learning, the DA network learns how to convert the observed data into a form similar to the empirical data, which the SC network can process without retraining. Numerical experiments show that the proposed method can adapt to observable datasets while keeping high performance in terms of both data recovery and task execution.
    Naive Few-Shot Learning: Sequence Consistency Evaluation. (arXiv:2205.12013v2 [cs.AI] UPDATED)
    Cognitive psychologists often use the term $\textit{fluid intelligence}$ to describe the ability of humans to solve novel tasks without any prior training. In contrast to humans, deep neural networks can perform cognitive tasks only after extensive (pre-)training with a large number of relevant examples. Motivated by fluid intelligence research in the cognitive sciences, we built a benchmark task which we call sequence consistency evaluation (SCE) that can be used to address this gap. Solving the SCE task requires the ability to extract simple rules from sequences, a basic computation that in humans, is required for solving various intelligence tests. We tested $\textit{untrained}$ (naive) deep learning models in the SCE task. Specifically, we tested two networks that can learn latent relations, Relation Networks (RN) and Contrastive Predictive Coding (CPC). We found that the latter, which imposes a causal structure on the latent relations performs better. We then show that naive few-shot learning of sequences can be successfully used for anomaly detection in two different tasks, visual and auditory, without any prior training.
    Free Energy Evaluation Using Marginalized Annealed Importance Sampling. (arXiv:2204.03784v2 [stat.ML] UPDATED)
    The evaluation of the free energy of a stochastic model is considered a significant issue in various fields of physics and machine learning. However, exact free energy evaluation is computationally infeasible because the free energy expression includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method that is similar to simulated annealing and can effectively approximate the free energy. This study proposes an AIS-based approach, which is referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail from theoretical and numerical perspectives. Based on this investigation, it is proved that mAIS is more effective than AIS under a certain condition.
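For readers unfamiliar with the baseline, the following is a minimal sketch of plain AIS (not the marginalized variant proposed in the paper) on a toy 1-D problem where the answer is known in closed form. The target, the geometric annealing path, the random-walk Metropolis kernel, and all function and parameter names are assumptions chosen for illustration. With unnormalized densities $f_0(x) = e^{-x^2/2}$ and $f_1(x) = e^{-x^2/(2\sigma^2)}$, the exact ratio of partition functions is $Z_1/Z_0 = \sigma$, and the free energy is taken as $F = -\log Z$.

```python
import numpy as np

SIGMA = 0.5  # width of the toy target; an assumption made for this example

def log_f0(x):
    """Unnormalized log-density of the starting distribution, N(0, 1)."""
    return -0.5 * x**2

def log_f1(x):
    """Unnormalized log-density of the target distribution, N(0, SIGMA^2)."""
    return -0.5 * (x / SIGMA) ** 2

def log_f_beta(x, beta):
    """Geometric path between f0 (beta = 0) and f1 (beta = 1)."""
    return (1.0 - beta) * log_f0(x) + beta * log_f1(x)

def ais_log_z_ratio(n_chains=2000, n_temps=200, step=0.5, seed=0):
    """Estimate log(Z1 / Z0) with annealed importance sampling."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = rng.standard_normal(n_chains)  # exact samples from f0 / Z0
    log_w = np.zeros(n_chains)
    for beta_prev, beta in zip(betas[:-1], betas[1:]):
        # Importance-weight update, evaluated at the current state.
        log_w += log_f_beta(x, beta) - log_f_beta(x, beta_prev)
        # One random-walk Metropolis step that leaves f_beta invariant.
        proposal = x + step * rng.standard_normal(n_chains)
        log_accept = log_f_beta(proposal, beta) - log_f_beta(x, beta)
        accept = np.log(rng.random(n_chains)) < log_accept
        x = np.where(accept, proposal, x)
    # log of the mean weight, computed stably (log-sum-exp trick).
    shift = log_w.max()
    return shift + np.log(np.mean(np.exp(log_w - shift)))

estimate = ais_log_z_ratio()
print(f"AIS estimate of log(Z1/Z0): {estimate:.3f}  (exact: {np.log(SIGMA):.3f})")
print(f"estimated free energy difference F1 - F0: {-estimate:.3f}")
```

The average of the importance weights is an unbiased estimate of $Z_1/Z_0$ even when the Metropolis kernel mixes imperfectly; more intermediate temperatures mainly reduce the variance of the weights.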
    Machine Learning Training on a Real Processing-in-Memory System. (arXiv:2206.06022v2 [cs.AR] UPDATED)
    Training machine learning algorithms is a computationally intensive process, which is frequently memory-bound due to repeatedly accessing large training datasets. As a result, processor-centric systems (e.g., CPU, GPU) suffer from costly data movement between memory units and processing units, which consumes large amounts of energy and execution cycles. Memory-centric computing systems, i.e., computing systems with processing-in-memory (PIM) capabilities, can alleviate this data movement bottleneck. Our goal is to understand the potential of modern general-purpose PIM architectures to accelerate machine learning training. To do so, we (1) implement several representative classic machine learning algorithms (namely, linear regression, logistic regression, decision tree, K-means clustering) on a real-world general-purpose PIM architecture, (2) characterize them in terms of accuracy, performance and scaling, and (3) compare to their counterpart implementations on CPU and GPU. Our experimental evaluation on a memory-centric computing system with more than 2500 PIM cores shows that general-purpose PIM architectures can greatly accelerate memory-bound machine learning workloads, when the necessary operations and datatypes are natively supported by PIM hardware. To our knowledge, our work is the first one to evaluate training of machine learning algorithms on a real-world general-purpose PIM architecture.
    Eliciting and Learning with Soft Labels from Every Annotator. (arXiv:2207.00810v2 [cs.LG] UPDATED)
    The labels used to train machine learning (ML) models are of paramount importance. Typically for ML classification tasks, datasets contain hard labels, yet learning using soft labels has been shown to yield benefits for model generalization, robustness, and calibration. Earlier work found success in forming soft labels from multiple annotators' hard labels; however, this approach may not converge to the best labels and necessitates many annotators, which can be expensive and inefficient. We focus on efficiently eliciting soft labels from individual annotators. We collect and release a dataset of soft labels for CIFAR-10 via a crowdsourcing study ($N=248$). We demonstrate that learning with our labels achieves comparable model performance to prior approaches while requiring far fewer annotators. Our elicitation methodology therefore shows promise towards enabling practitioners to enjoy the benefits of improved model performance and reliability with fewer annotators, and serves as a guide for future dataset curators on the benefits of leveraging richer information, such as categorical uncertainty, from individual annotators.
    TSEM: Temporally Weighted Spatiotemporal Explainable Neural Network for Multivariate Time Series. (arXiv:2205.13012v2 [cs.LG] UPDATED)
    Deep learning has become a one-size-fits-all solution for technical and business domains thanks to its flexibility and adaptability. It is implemented using opaque models, which unfortunately undermines the trustworthiness of the outcome. To better understand the behavior of a system, particularly one driven by time series, it is important to look inside a deep learning model using so-called post-hoc eXplainable Artificial Intelligence (XAI) approaches. There are two major types of XAI for time series data, namely model-agnostic and model-specific. A model-specific approach is considered in this work. While other approaches employ either Class Activation Mapping (CAM) or an attention mechanism, we merge the two strategies into a single system, simply called the Temporally Weighted Spatiotemporal Explainable Neural Network for Multivariate Time Series (TSEM). TSEM combines the capabilities of RNN and CNN models in such a way that RNN hidden units are employed as attention weights for the temporal axis of the CNN feature maps. The results show that TSEM outperforms XCM. It is similar to STAM in terms of accuracy, while also satisfying a number of interpretability criteria, including causality, fidelity, and spatiotemporality.
    Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding. (arXiv:2206.15427v2 [eess.AS] UPDATED)
    This paper studies a transferable phoneme embedding framework that aims to deal with the cross-lingual text-to-speech (TTS) problem under the few-shot setting. Transfer learning is a common approach when it comes to few-shot learning since training from scratch on few-shot training data is bound to overfit. Still, we find that the naive transfer learning approach fails to adapt to unseen languages under extremely few-shot settings, where less than 8 minutes of data is provided. We deal with the problem by proposing a framework that consists of a phoneme-based TTS model and a codebook module to project phonemes from different languages into a learned latent space. Furthermore, by utilizing phoneme-level averaged self-supervised learned features, we effectively improve the quality of synthesized speeches. Experiments show that using 4 utterances, which is about 30 seconds of data, is enough to synthesize intelligible speech when adapting to an unseen language using our framework.
    FlowNet-PET: Unsupervised Learning to Perform Respiratory Motion Correction in PET Imaging. (arXiv:2205.14147v3 [eess.IV] UPDATED)
    To correct for respiratory motion in PET imaging, an interpretable and unsupervised deep learning technique, FlowNet-PET, was constructed. The network was trained to predict the optical flow between two PET frames from different breathing amplitude ranges. The trained model aligns different retrospectively-gated PET images, providing a final image with similar counting statistics as a non-gated image, but without the blurring effects. FlowNet-PET was applied to anthropomorphic digital phantom data, which provided the possibility to design robust metrics to quantify the corrections. When comparing the predicted optical flows to the ground truths, the median absolute error was found to be smaller than the pixel and slice widths. The improvements were illustrated by comparing against images without motion and computing the intersection over union (IoU) of the tumors as well as the enclosed activity and coefficient of variation (CoV) within the no-motion tumor volume before and after the corrections were applied. The average relative improvements provided by the network were 64%, 89%, and 75% for the IoU, total activity, and CoV, respectively. FlowNet-PET achieved similar results as the conventional retrospective phase binning approach, but only required one sixth of the scan duration. The code and data have been made publicly available (https://github.com/teaghan/FlowNet_PET).
    Explainable Artificial Intelligence in Process Mining: Assessing the Explainability-Performance Trade-Off in Outcome-Oriented Predictive Process Monitoring. (arXiv:2203.16073v2 [cs.LG] UPDATED)
    Recently, a shift has been made in the field of Outcome-Oriented Predictive Process Monitoring (OOPPM) to use models from the eXplainable Artificial Intelligence paradigm; however, the evaluation still occurs mainly through performance-based metrics, not accounting for the implications and lack of actionability of the explanations. In this paper, we define explainability by the interpretability of the explanations (through the widely-used XAI properties parsimony and functional complexity) and the faithfulness of the explainability model (through monotonicity and level of disagreement). The introduced properties are analysed along the event, case, and control flow perspectives that are typical of a process-based analysis. This allows us to quantitatively compare, inter alia, inherently created explanations (e.g., logistic regression coefficients) with post-hoc explanations (e.g., Shapley values). Moreover, this paper contributes a guideline named X-MOP to practitioners to select the appropriate model based on the event log specifications and the task at hand, by providing insight into how the varying preprocessing, model complexity and post-hoc explainability techniques typical in OOPPM influence the explainability of the model. To this end, we benchmark seven classifiers on thirteen real-life event logs.
    AUC Maximization in the Era of Big Data and AI: A Survey. (arXiv:2203.15046v3 [cs.LG] UPDATED)
    Area under the ROC curve, a.k.a. AUC, is a measure of choice for assessing the performance of a classifier for imbalanced data. AUC maximization refers to a learning paradigm that learns a predictive model by directly maximizing its AUC score. It has been studied for more than two decades, dating back to the late 90s, and a huge amount of work has been devoted to AUC maximization since then. Recently, stochastic AUC maximization for big data and deep AUC maximization for deep learning have received increasing attention and yielded dramatic impact for solving real-world problems. However, to the best of our knowledge there is no comprehensive survey of related works for AUC maximization. This paper aims to address the gap by reviewing the literature in the past two decades. We not only give a holistic view of the literature but also present detailed explanations and comparisons of different papers from formulations to algorithms and theoretical guarantees. We also identify and discuss remaining and emerging issues for deep AUC maximization, and provide suggestions on topics for future work.
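    For readers unfamiliar with the quantity being maximized, AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The sketch below computes it by brute force over all positive-negative pairs; the function name and the 0/1 label convention are assumptions for illustration and are not taken from the survey.

```python
def auc_score(scores, labels):
    """AUC as the fraction of correctly ranked (positive, negative) pairs."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Three of the four positive-negative pairs are ranked correctly.
print(auc_score([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

    The pairwise indicator is not differentiable, which is why AUC maximization methods typically optimize a surrogate loss over pairs instead.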
    Policy Evaluation for Temporal and/or Spatial Dependent Experiments in Ride-sourcing Platforms. (arXiv:2202.10887v2 [stat.ME] UPDATED)
    Policy evaluation based on A/B testing has attracted considerable interest in digital marketing, but such evaluation in ride-sourcing platforms (e.g., Uber and Didi) is not well studied, primarily due to the complex structure of their temporal and/or spatial dependent experiments. Motivated by policy evaluation in ride-sourcing platforms, the aim of this paper is to establish the causal relationship between a platform's policies and outcomes of interest under a switchback design. We propose a novel potential outcome framework based on a temporal varying coefficient decision process (VCDP) model to capture the dynamic treatment effects in temporal dependent experiments. We further characterize the average treatment effect by decomposing it as the sum of direct effect (DE) and indirect effect (IE). We develop estimation and inference procedures for both DE and IE. Furthermore, we propose a spatio-temporal VCDP to deal with spatiotemporal dependent experiments. For both VCDP models, we establish the statistical properties (e.g., weak convergence and asymptotic power) of our estimation and inference procedures. We conduct extensive simulations to investigate the finite-sample performance of the proposed estimation and inference procedures. We examine how our VCDP models can help improve policy evaluation for various dispatching and dispositioning policies in Didi.
    Spatial Autoregressive Coding for Graph Neural Recommendation. (arXiv:2205.09489v2 [cs.IR] UPDATED)
    Graph embedding methods including traditional shallow models and deep Graph Neural Networks (GNNs) have led to promising applications in recommendation. Nevertheless, shallow models, especially random-walk-based algorithms, fail to adequately exploit neighbor proximity in sampled subgraphs or sequences due to their optimization paradigm. GNN-based algorithms suffer from the insufficient utilization of high-order information and easily cause over-smoothing problems when stacking too many layers, which may deteriorate the recommendations of low-degree (long-tail) items, limiting the expressiveness and scalability. In this paper, we propose a novel framework SAC, namely Spatial Autoregressive Coding, to solve the above problems in a unified way. To adequately leverage neighbor proximity and high-order information, we design a novel spatial autoregressive paradigm. Specifically, we first randomly mask multi-hop neighbors and embed the target node by integrating all other surrounding neighbors with an explicit multi-hop attention. Then we reinforce the model to learn a neighbor-predictive coding for the target node by contrasting the coding and the masked neighbors' embedding, equipped with a new hard negative sampling strategy. To learn the minimal sufficient representation for the target-to-neighbor prediction task and remove the redundancy of neighbors, we devise Neighbor Information Bottleneck by maximizing the mutual information between target predictive coding and the masked neighbors' embedding, and simultaneously constraining those between the coding and surrounding neighbors' embedding. Experimental results on both public recommendation datasets and a real scenario web-scale dataset Douyin-Friend-Recommendation demonstrate the superiority of SAC compared with state-of-the-art methods.
    STEADY: Simultaneous State Estimation and Dynamics Learning from Indirect Observations. (arXiv:2203.01299v2 [cs.RO] UPDATED)
    Accurate kinodynamic models play a crucial role in many robotics applications such as off-road navigation and high-speed driving. Many state-of-the-art approaches in learning stochastic kinodynamic models, however, require precise measurements of robot states as labeled input/output examples, which can be hard to obtain in outdoor settings due to limited sensor capabilities and the absence of ground truth. In this work, we propose a new technique for learning neural stochastic kinodynamic models from noisy and indirect observations by performing simultaneous state estimation and dynamics learning. The proposed technique iteratively improves the kinodynamic model in an expectation-maximization loop, where the E Step samples posterior state trajectories using particle filtering, and the M Step updates the dynamics to be more consistent with the sampled trajectories via stochastic gradient ascent. We evaluate our approach on both simulation and real-world benchmarks and compare it with several baseline techniques. Our approach not only achieves significantly higher accuracy but is also more robust to observation noise, thereby showing promise for boosting the performance of many other robotics applications.
    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data. (arXiv:2204.07276v4 [cs.LG] UPDATED)
    Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenotyping for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions.
    Robust Training under Label Noise by Over-parameterization. (arXiv:2202.14026v2 [cs.LG] UPDATED)
    Recently, over-parameterized deep networks, with increasingly more network parameters than training samples, have dominated the performances of modern machine learning. However, when the training data is corrupted, it is well known that over-parameterized networks tend to overfit and do not generalize. In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. The main idea is very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Specifically, we model the label noise via another sparse over-parameterization term, and exploit implicit algorithmic regularizations to recover and separate the underlying corruptions. Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets. Furthermore, our experimental results are corroborated by theory on simplified linear models, showing that exact separation between sparse noise and low-rank data can be achieved under incoherent conditions. The work opens many interesting directions for improving over-parameterized models by using sparse over-parameterization and implicit regularization.
    Physics Constrained Flow Neural Network for Short-Timescale Predictions in Data Communications Networks. (arXiv:2112.12321v2 [cs.LG] UPDATED)
    Machine learning is gaining growing momentum in various recent models for the dynamic analysis of information flows in data communications networks. These preliminary models often rely on off-the-shelf learning models to predict from historical statistics while disregarding the physics governing the generating behaviors of these flows. This paper instead introduces Flow Neural Network (FlowNN) to improve the feature representation with learned physical bias. This is implemented by an induction layer, working upon the embedding layer, to impose the physics-connected data correlations, and a self-supervised learning strategy with stop-gradient to make the learned physics universal. For the short-timescale network prediction tasks, FlowNN reduces the loss by 17% to 71% relative to the state-of-the-art baselines on both synthetic and real-world networking datasets, which shows the strength of this new approach. Code will be made available.
    Stochastic Gradient Line Bayesian Optimization for Efficient Noise-Robust Optimization of Parameterized Quantum Circuits. (arXiv:2111.07952v2 [quant-ph] UPDATED)
    Optimizing parameterized quantum circuits is a key routine in using near-term quantum devices. However, the existing algorithms for such optimization require an excessive number of quantum-measurement shots for estimating expectation values of observables and repeating many iterations, whose cost has been a critical obstacle for practical use. We develop an efficient alternative optimization algorithm, stochastic gradient line Bayesian optimization (SGLBO), to address this problem. SGLBO reduces the measurement-shot cost by estimating an appropriate direction of updating circuit parameters based on stochastic gradient descent (SGD) and further utilizing Bayesian optimization (BO) to estimate the optimal step size for each iteration in SGD. In addition, we formulate an adaptive measurement-shot strategy and introduce a technique of suffix averaging to reduce the effect of statistical and hardware noise. Our numerical simulation demonstrates that the SGLBO augmented with these techniques can drastically reduce the measurement-shot cost, improve the accuracy, and make the optimization noise-robust.
    Mapping Research Topics in Software Testing: A Bibliometric Analysis. (arXiv:2109.04086v3 [cs.DL] UPDATED)
    Background: The field of software testing is growing and rapidly-evolving. Aims: Based on keywords assigned to publications, we seek to identify predominant research topics and understand how they are connected and have evolved. Method: We apply co-word analysis to map the topology of testing research as a network where author-assigned keywords are connected by edges indicating co-occurrence in publications. Keywords are clustered based on edge density and frequency of connection. We examine the most popular keywords, summarize clusters into high-level research topics, examine how topics connect, and examine how the field is changing. Results: Testing research can be divided into 16 high-level topics and 18 subtopics. Creation guidance, automated test generation, evolution and maintenance, and test oracles have particularly strong connections to other topics, highlighting their multidisciplinary nature. Emerging keywords relate to web and mobile apps, machine learning, energy consumption, automated program repair and test generation, while emerging connections have formed between web apps, test oracles, and machine learning with many topics. Random and requirements-based testing show potential decline. Conclusions: Our observations, advice, and map data offer a deeper understanding of the field and inspiration regarding challenges and connections to explore.
    Laplacian Features for Learning with Hyperbolic Space. (arXiv:2202.06854v2 [cs.LG] UPDATED)
    Due to its geometric properties, hyperbolic space can support high-fidelity embeddings of tree- and graph-structured data. As a result, various hyperbolic networks have been developed which outperform Euclidean networks on many tasks: e.g. hyperbolic graph convolutional networks (GCN) can outperform vanilla GCN on some graph learning tasks. However, most existing hyperbolic networks are complicated, computationally expensive, and numerically unstable -- and they cannot scale to large graphs due to these shortcomings. With more and more hyperbolic networks proposed, it is becoming less and less clear what key component is necessary to make the model behave. In this paper, we propose HyLa, a simple and minimal approach to using hyperbolic space in networks: HyLa maps once from a hyperbolic-space embedding to Euclidean space via the eigenfunctions of the Laplacian operator in the hyperbolic space. We evaluate HyLa on graph learning tasks including node classification and text classification, where HyLa can be used together with any graph neural networks. When used with a linear model, HyLa shows significant improvements over hyperbolic networks and other baselines.
    Revisiting local branching with a machine learning lens. (arXiv:2112.02195v2 [math.OC] UPDATED)
    Finding high-quality solutions to mixed-integer linear programming problems (MILPs) is of great importance for many practical applications. In this respect, the refinement heuristic local branching (LB) has been proposed to produce improving solutions and has been highly influential for the development of local search methods in MILP. The algorithm iteratively explores a sequence of solution neighborhoods defined by the so-called local branching constraint, namely, a linear inequality limiting the distance from a reference solution. For an LB algorithm, the choice of the neighborhood size is critical to performance. In this work, we study the relation between the size of the search neighborhood and the behavior of the underlying LB algorithm, and we devise a learning-based framework for predicting the best size for the specific instance to be solved. Furthermore, we have also investigated the relation between the time limit for exploring the LB neighborhood and the actual performance of the LB scheme, and devised a strategy for adapting the time limit. We computationally show that the neighborhood size and time limit can indeed be learned, leading to improved performances and that the overall algorithm generalizes well both with respect to the instance size and, remarkably, across instances.
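    The local branching constraint described above can be made concrete for binary variables, where the distance from the reference solution is the Hamming distance. The sketch below builds that linear inequality as a coefficient vector and right-hand side; the function names and the restriction to purely binary variables are assumptions for illustration, and the neighborhood size k is the quantity the paper proposes to predict.

```python
def local_branching_constraint(reference, k):
    """Return (coeffs, rhs) such that sum(c * x) <= rhs encodes Hamming(x, reference) <= k.

    Derivation: Hamming(x, ref) = sum_{ref_j = 1} (1 - x_j) + sum_{ref_j = 0} x_j,
    so moving the constants to the right-hand side gives
    sum_{ref_j = 0} x_j - sum_{ref_j = 1} x_j <= k - |{j : ref_j = 1}|.
    """
    coeffs = [-1 if r == 1 else 1 for r in reference]
    rhs = k - sum(reference)
    return coeffs, rhs


def satisfies(coeffs, rhs, x):
    # Check whether a candidate binary vector lies inside the neighborhood.
    return sum(c * v for c, v in zip(coeffs, x)) <= rhs


coeffs, rhs = local_branching_constraint([1, 0, 1, 0], k=1)
print(satisfies(coeffs, rhs, [1, 1, 1, 0]))  # True: one flipped variable
print(satisfies(coeffs, rhs, [0, 1, 1, 0]))  # False: two flipped variables
```

    In a real solver the pair (coeffs, rhs) would be added as one extra row of the MILP before re-solving within the time limit.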
    Automatic Meta-Path Discovery for Effective Graph-Based Recommendation. (arXiv:2112.12845v3 [cs.IR] UPDATED)
    Heterogeneous Information Networks (HINs) are labeled graphs that depict relationships among different types of entities (e.g., users, movies and directors). For HINs, meta-path-based recommenders (MPRs) utilize meta-paths (i.e., abstract paths consisting of node and link types) to predict user preference, and have attracted a lot of attention due to their explainability and performance. We observe that the performance of MPRs is highly sensitive to the meta-paths they use, but existing works manually select the meta-paths from many possible ones. Thus, to discover effective meta-paths automatically, we propose the Reinforcement learning-based Meta-path Selection (RMS) framework. Specifically, we define a vector encoding for meta-paths and design a policy network to extend meta-paths. The policy network is trained based on the results of downstream recommendation tasks and an early stopping approximation strategy is proposed to speed up training. RMS is a general model, and it can work with all existing MPRs. We also propose a new MPR called RMS-HRec, which uses an attention mechanism to aggregate information from the meta-paths. We conduct extensive experiments on real datasets. Compared with the manually selected meta-paths, the meta-paths identified by RMS consistently improve recommendation quality. Moreover, RMS-HRec outperforms state-of-the-art recommender systems by an average of 7% in hit ratio. The codes and datasets are available on https://github.com/Stevenn9981/RMS-HRec.
    Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process. (arXiv:2202.10589v4 [stat.ML] UPDATED)
    This paper is concerned with constructing a confidence interval for a target policy's value offline based on pre-collected observational data in infinite horizon settings. Most of the existing works assume no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. In this paper, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process. Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification and provide rigorous uncertainty quantification. Our method is justified by theoretical results and by simulated and real datasets obtained from ridesharing companies. A Python implementation of the proposed procedure is available at https://github.com/Mamba413/cope.
    Spectral Propagation Graph Network for Few-shot Time Series Classification. (arXiv:2202.04769v2 [cs.LG] UPDATED)
    Few-shot Time Series Classification (few-shot TSC) is a challenging problem in time series analysis. It is more difficult to classify when time series of the same class are not completely consistent in the spectral domain or time series of different classes are partly consistent in the spectral domain. To address this problem, we propose a novel method named Spectral Propagation Graph Network (SPGN) to explicitly model and propagate the spectrum-wise relations between different time series with a graph network. To the best of our knowledge, SPGN is the first to utilize spectral comparisons in different intervals and involve spectral propagation across all time series with graph networks for few-shot TSC. SPGN first uses a bandpass filter to expand time series in the spectral domain for calculating spectrum-wise relations between time series. Equipped with graph networks, SPGN then integrates spectral relations with label information to perform spectral propagation. Further study reveals the bi-directional effect between spectral relations acquisition and spectral propagation. We conduct extensive experiments on few-shot TSC benchmarks. SPGN outperforms state-of-the-art results by a large margin of $4\% \sim 13\%$. Moreover, SPGN surpasses them by around $12\%$ and $9\%$ under cross-domain and cross-way settings respectively.
    Generalized Out-of-Distribution Detection: A Survey. (arXiv:2110.11334v2 [cs.CV] UPDATED)
    Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of machine learning systems. For instance, in autonomous driving, we would like the driving system to issue an alert and hand over the control to humans when it detects unusual scenes or objects that it has never seen during training time and cannot make a safe decision. The term, OOD detection, first emerged in 2017 and since then has received increasing attention from the research community, leading to a plethora of methods developed, ranging from classification-based to density-based to distance-based ones. Meanwhile, several other problems, including anomaly detection (AD), novelty detection (ND), open set recognition (OSR), and outlier detection (OD), are closely related to OOD detection in terms of motivation and methodology. Despite common goals, these topics develop in isolation, and their subtle differences in definition and problem setting often confuse readers and practitioners. In this survey, we first present a unified framework called generalized OOD detection, which encompasses the five aforementioned problems, i.e., AD, ND, OSR, OOD detection, and OD. Under our framework, these five problems can be seen as special cases or sub-tasks, and are easier to distinguish. We then review each of these five areas by summarizing their recent technical developments, with a special focus on OOD detection methodologies. We conclude this survey with open challenges and potential research directions.
    List Autoencoder: Towards Deep Learning Based Reliable Transmission Over Noisy Channels. (arXiv:2112.11920v2 [cs.IT] UPDATED)
    In this paper, we present list autoencoder (listAE) to mimic list decoding used in classical coding theory. With listAE, the decoder network outputs a list of decoded message word candidates. To train the listAE, a genie is assumed to be available at the output of the decoder. A specific loss function is proposed to optimize the performance of a genie-aided (GA) list decoding. The listAE is a general framework and can be used with any AE architecture. We propose a specific architecture, referred to as incremental-redundancy AE (IR-AE), which decodes the received word on a sequence of component codes with non-increasing rates. Then, the listAE is trained and evaluated with both IR-AE and Turbo-AE. Finally, we employ cyclic redundancy check (CRC) codes to replace the genie at the decoder output and obtain a CRC aided (CA) list decoder. Our simulation results show that the IR-AE under CA list decoding demonstrates meaningful coding gain over Turbo-AE and polar code in the low block error rate range.
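    The last step, replacing the genie with a CRC, amounts to picking the first candidate in the decoder's list whose checksum verifies. The sketch below shows that selection rule only; the use of CRC-32 from Python's zlib, the byte-level message format, and the function names are assumptions for illustration and do not reflect the CRC polynomial or the neural decoder used in the paper.

```python
import zlib


def attach_crc(message: bytes) -> bytes:
    # Append a 4-byte CRC-32 of the message before transmission.
    return message + zlib.crc32(message).to_bytes(4, "big")


def crc_ok(word: bytes) -> bool:
    # A candidate passes if its trailing 4 bytes match the CRC of its payload.
    if len(word) < 4:
        return False
    payload, tag = word[:-4], word[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tag


def select_from_list(candidates):
    """Return the payload of the first candidate that passes the CRC, else None."""
    for candidate in candidates:
        if crc_ok(candidate):
            return candidate[:-4]
    return None


sent = attach_crc(b"hello")
corrupted = bytes([sent[0] ^ 1]) + sent[1:]  # flip one bit of the first byte
print(select_from_list([corrupted, sent]))   # b'hello'
```

    A genie would always pick the correct candidate when it is in the list; the CRC check approximates this at the cost of a few redundant bits and a small probability of undetected error.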
    Bridging the Gap Between Object Detection and User Intent via Query-Modulation. (arXiv:2106.10258v2 [cs.CV] UPDATED)
    When interacting with objects through cameras, or pictures, users often have a specific intent. For example, they may want to perform a visual search. With most object detection models relying on image pixels as their sole input, undesired results are not uncommon. Most typically: lack of a high-confidence detection on the object of interest, or detection with a wrong class label. The issue is especially severe when operating capacity-constrained mobile object detectors on-device. In this paper we investigate techniques to modulate mobile detectors to explicitly account for the user intent, expressed as an embedding of a simple query. Compared to standard detectors, query-modulated detectors show superior performance at detecting objects for a given user query. Thanks to large-scale training data synthesized from standard object detection annotations, query-modulated detectors also outperform a specialized referring expression recognition system. Query-modulated detectors can also be trained to simultaneously solve for both localizing a user query and standard detection, even outperforming standard mobile detectors at the canonical COCO task.
    Scene Editing as Teleoperation: A Case Study in 6DoF Kit Assembly. (arXiv:2110.04450v3 [cs.RO] UPDATED)
    Studies in robot teleoperation have been centered around action specifications -- from continuous joint control to discrete end-effector pose control. However, these robot-centric interfaces often require skilled operators with extensive robotics expertise. To make teleoperation accessible to non-expert users, we propose the framework "Scene Editing as Teleoperation" (SEaT), where the key idea is to transform the traditional "robot-centric" interface into a "scene-centric" interface -- instead of controlling the robot, users focus on specifying the task's goal by manipulating digital twins of the real-world objects. As a result, a user can perform teleoperation without any expert knowledge of the robot hardware. To achieve this goal, we utilize a category-agnostic scene-completion algorithm that translates the real-world workspace (with unknown objects) into a manipulable virtual scene representation and an action-snapping algorithm that refines the user input before generating the robot's action plan. To train the algorithms, we procedurally generated a large-scale, diverse kit-assembly dataset that contains object-kit pairs that mimic real-world object-kitting tasks. Our experiments in simulation and on a real-world system demonstrate that our framework improves both the efficiency and success rate for 6DoF kit-assembly tasks. A user study demonstrates that SEaT framework participants achieve a higher task success rate and report a lower subjective workload compared to an alternative robot-centric interface. Video can be found at https://www.youtube.com/watch?v=-NdR3mkPbQQ .
    Efficiently Computing Nash Equilibria in Adversarial Team Markov Games. (arXiv:2208.02204v1 [cs.GT])
    Computing Nash equilibrium policies is a central problem in multi-agent reinforcement learning that has received extensive attention both in theory and in practice. However, provable guarantees have been thus far either limited to fully competitive or cooperative scenarios or impose strong assumptions that are difficult to meet in most practical applications. In this work, we depart from those prior results by investigating infinite-horizon adversarial team Markov games, a natural and well-motivated class of games in which a team of identically-interested players -- in the absence of any explicit coordination or communication -- is competing against an adversarial player. This setting allows for a unifying treatment of zero-sum Markov games and Markov potential games, and serves as a step to model more realistic strategic interactions that feature both competing and cooperative interests. Our main contribution is the first algorithm for computing stationary $\epsilon$-approximate Nash equilibria in adversarial team Markov games with computational complexity that is polynomial in all the natural parameters of the game, as well as $1/\epsilon$. The proposed algorithm is particularly natural and practical, and it is based on performing independent policy gradient steps for each player in the team, in tandem with best responses from the side of the adversary; in turn, the policy for the adversary is then obtained by solving a carefully constructed linear program. Our analysis leverages non-standard techniques to establish the KKT optimality conditions for a nonlinear program with nonconvex constraints, thereby leading to a natural interpretation of the induced Lagrange multipliers. Along the way, we significantly extend an important characterization of optimal policies in adversarial (normal-form) team games due to Von Stengel and Koller (GEB '97).
    Can you hear me $\textit{now}$? Sensitive comparisons of human and machine perception. (arXiv:2003.12362v2 [eess.AS] UPDATED)
    The rise of machine-learning systems that process sensory input has brought with it a rise in comparisons between human and machine perception. But such comparisons face a challenge: Whereas machine perception of some stimulus can often be probed through direct and explicit measures, much of human perceptual knowledge is latent, incomplete, or unavailable for explicit report. Here, we explore how this asymmetry can cause such comparisons to misestimate the overlap in human and machine perception. As a case study, we consider human perception of \textit{adversarial speech} -- synthetic audio commands that are recognized as valid messages by automated speech-recognition systems but that human listeners reportedly hear as meaningless noise. In five experiments, we adapt task designs from the human psychophysics literature to show that even when subjects cannot freely transcribe such speech commands (the previous benchmark for human understanding), they often can demonstrate other forms of understanding, including discriminating adversarial speech from closely matched non-speech (Experiments 1--2), finishing common phrases begun in adversarial speech (Experiments 3--4), and solving simple math problems posed in adversarial speech (Experiment 5) -- even for stimuli previously described as unintelligible to human listeners. We recommend the adoption of such "sensitive tests" when comparing human and machine perception, and we discuss the broader consequences of such approaches for assessing the overlap between systems.
    Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization. (arXiv:2107.12438v4 [math.OC] UPDATED)
    Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set, utilizes all data when training and, hence, is well-suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly-coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and the policy class's complexity. Our bounds show that under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.
    Unified Framework for Spectral Dimensionality Reduction, Maximum Variance Unfolding, and Kernel Learning By Semidefinite Programming: Tutorial and Survey. (arXiv:2106.15379v2 [stat.ML] UPDATED)
    This is a tutorial and survey paper on the unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analysis (PCA) with different kernels. This unification can be interpreted as eigenfunction learning or as a representation of the kernel in terms of the distance matrix. Then, since the spectral methods are unified as kernel PCA, we consider learning the best kernel for unfolding the manifold of data to its maximum variance. We first briefly introduce kernel learning by SDP for the transduction task. Then, we explain MVU in detail. Various versions of supervised MVU using nearest neighbors graph, by class-wise unfolding, by Fisher criterion, and by colored MVU are explained. We also explain the out-of-sample extension of MVU using eigenfunctions and kernel mapping. Finally, we introduce other variants of MVU including action respecting embedding, relaxed MVU, and landmark MVU for big data.
    RBNN: Memory-Efficient Reconfigurable Deep Binary Neural Network with IP Protection for Internet of Things. (arXiv:2105.03822v3 [cs.CR] UPDATED)
    Though deep neural network models exhibit outstanding performance for various applications, their large model size and extensive floating-point operations render deployment on mobile computing platforms a major challenge, and, in particular, on Internet of Things devices. One appealing solution is model quantization that reduces the model size and uses integer operations commonly supported by microcontrollers. To this end, a 1-bit quantized DNN model or deep binary neural network maximizes the memory efficiency, where each parameter in a BNN model has only 1 bit. In this paper, we propose a reconfigurable BNN (RBNN) to further amplify the memory efficiency for resource-constrained IoT devices. Generally, the RBNN can be reconfigured on demand to achieve any one of M (M>1) distinct tasks with the same parameter set, thus only a single task determines the memory requirements. In other words, the memory utilization is improved by a factor of M. Our extensive experiments corroborate that up to seven commonly used tasks can co-exist (the value of M can be larger). These tasks with a varying number of classes have no or negligible accuracy drop-off on three binarized popular DNN architectures including VGG, ResNet, and ReActNet. The tasks span across different domains, e.g., computer vision and audio domains validated herein, with the prerequisite that the model architecture can serve those cross-domain tasks. To protect the intellectual property of an RBNN model, the reconfiguration can be controlled by both a user key and a device-unique root key generated by the intrinsic hardware fingerprint. By doing so, an RBNN model can only be used per paid user per authorized device, thus benefiting both the user and the model provider.
    Stable and Interpretable Unrolled Dictionary Learning. (arXiv:2106.00058v5 [cs.LG] UPDATED)
    The dictionary learning problem, representing data as a combination of a few atoms, has long stood as a popular method for learning representations in statistics and signal processing. The most popular dictionary learning algorithm alternates between sparse coding and dictionary update steps, and a rich literature has studied its theoretical convergence. The success of dictionary learning relies on access to a "good" initial estimate of the dictionary and the ability of the sparse coding step to provide an unbiased estimate of the code. The growing popularity of unrolled sparse coding networks has led to the empirical finding that backpropagation through such networks performs dictionary learning. We offer the theoretical analysis of these empirical results through PUDLE, a Provable Unrolled Dictionary LEarning method. We provide conditions on the network initialization and data distribution sufficient to recover and preserve the support of the latent code. Additionally, we address two challenges; first, the vanilla unrolled sparse coding computes a biased code estimate, and second, gradients during backpropagated learning can become unstable. We show approaches to reduce the bias of the code estimate in the forward pass, and that of the dictionary estimate in the backward pass. We propose strategies to resolve the learning instability by tuning network parameters and modifying the loss function. Overall, we highlight the impact of loss, unrolling, and backpropagation on convergence. We complement our findings through synthetic and image denoising experiments. Finally, we demonstrate PUDLE's interpretability, a driving factor in designing deep networks based on iterative optimizations, by building a mathematical relation between network weights, its output, and the training set.
    A first look into the carbon footprint of federated learning. (arXiv:2102.07627v4 [cs.LG] UPDATED)
    Despite impressive results, deep learning-based technologies also raise severe privacy and environmental concerns induced by the training procedure often conducted in data centers. In response, alternatives to centralized training such as Federated Learning (FL) have emerged. Perhaps unexpectedly, FL is starting to be deployed at a global scale by companies that must adhere to new legal demands and policies originating from governments and social groups advocating for privacy protection. \textit{However, the potential environmental impact related to FL remains unclear and unexplored. This paper offers the first-ever systematic study of the carbon footprint of FL.} First, we propose a rigorous model to quantify the carbon footprint, hence facilitating the investigation of the relationship between FL design and carbon emissions. Then, we compare the carbon footprint of FL to traditional centralized learning. Our findings show that, depending on the configuration, FL can emit up to two orders of magnitude more carbon than centralized machine learning. However, in certain settings, it can be comparable to centralized learning due to the reduced energy consumption of embedded devices. We performed extensive experiments across different types of datasets, settings and various deep learning models with FL. Finally, we highlight and connect the reported results to the future challenges and trends in FL to reduce its environmental impact, including algorithmic efficiency, hardware capabilities, and stronger industry transparency.
    A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis. (arXiv:2208.02189v1 [eess.AS])
    In human speech, the attitude of a speaker cannot be fully expressed only by the textual content. It has to come along with the intonation. Declarative questions are commonly used in daily Cantonese conversations, and they are usually uttered with rising intonation. Vanilla neural text-to-speech (TTS) systems are not capable of synthesizing rising intonation for these sentences due to the loss of semantic information. Though it has become more common to complement the systems with extra language models, their performance in modeling rising intonation is not well studied. In this paper, we propose to complement the Cantonese TTS model with a BERT-based statement/question classifier. We design different training strategies and compare their performance. We conduct our experiments on a Cantonese corpus named CanTTS. Empirical results show that the separate training approach obtains the best generalization performance and feasibility.
    Quantized Convolutional Neural Networks Through the Lens of Partial Differential Equations. (arXiv:2109.00095v2 [cs.LG] UPDATED)
    Quantization of Convolutional Neural Networks (CNNs) is a common approach to ease the computational burden involved in the deployment of CNNs, especially on low-resource edge devices. However, fixed-point arithmetic is not natural to the type of computations involved in neural networks. In this work, we explore ways to improve quantized CNNs using a PDE-based perspective and analysis. First, we harness the total variation (TV) approach to apply edge-aware smoothing to the feature maps throughout the network. This aims to reduce outliers in the distribution of values and promote piece-wise constant maps, which are more suitable for quantization. Second, we consider symmetric and stable variants of common CNNs for image classification, and Graph Convolutional Networks (GCNs) for graph node-classification. We demonstrate through several experiments that the property of forward stability preserves the action of a network under different quantization rates. As a result, stable quantized networks behave similarly to their non-quantized counterparts even though they rely on fewer parameters. We also find that at times, stability even aids in improving accuracy. These properties are of particular interest for sensitive, resource-constrained, low-power or real-time applications like autonomous driving.
    Blockchain associated machine learning and IoT based hypoglycemia detection system with auto-injection feature. (arXiv:2208.02222v1 [cs.LG])
    Hypoglycemia is an unpleasant phenomenon caused by low blood glucose. The condition can lead to death or severe bodily damage. To avoid significant damage, patients need sugar. The research aims at implementing an automatic system to detect hypoglycemia and perform automatic sugar injections to save a life. Taking advantage of the Internet of Things (IoT), the sensor data was transferred using the Hypertext Transfer Protocol (HTTP). To ensure the safety of health-related data, blockchain technology was utilized. The glucose sensor and smartwatch data were processed via Fog and sent to the cloud. A Random Forest algorithm was proposed and utilized to detect hypoglycemic events. When a hypoglycemic event was detected, the system sent a notification to the mobile application and the auto-injection device to push the condensed sugar into the victim's body. XGBoost, k-nearest neighbors (KNN), support vector machine (SVM), and decision tree were implemented to compare against the proposed model's performance. The random forest achieved a testing accuracy of 0.942, better than the other models in detecting hypoglycemic events. The system's performance was measured in several conditions, and satisfactory results were achieved. The system can help hypoglycemia patients survive this condition.
    A Glimpse of Physical Layer Decision Mechanisms: Facts, Challenges, and Remedies. (arXiv:2102.07258v3 [cs.LG] UPDATED)
    Communications are realized as a result of successive decisions at the physical layer, from modulation selection to multi-antenna strategy, and each decision affects the performance of the communication systems. Future communication systems must include extensive capabilities as they will encompass a wide variety of devices and applications. Conventional physical layer decision mechanisms may not meet these requirements, as they are often based on impractical and oversimplifying assumptions that result in a trade-off between complexity and efficiency. By leveraging past experiences, learning-driven designs are promising solutions to present a resilient decision mechanism and enable rapid response even under exceptional circumstances. The corresponding design solutions should evolve following the lines of learning-driven paradigms that offer more autonomy and robustness. This evolution must take place by considering the facts of real-world systems and without restraining assumptions. In this paper, the common assumptions in the physical layer are presented to highlight their discrepancies with practical systems. As a solution, learning algorithms are examined by considering the implementation steps and challenges. Furthermore, these issues are discussed through a real-time case study using software-defined radio nodes to demonstrate the potential performance improvement. A cyber-physical framework is presented to incorporate future remedies.
    Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey. (arXiv:2009.10301v2 [stat.ML] UPDATED)
    Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach. In SNE, every point is considered to be the neighbor of all other points with some probability, and the method tries to preserve this probability in the embedding space. SNE considers the Gaussian distribution for the probability in both the input and embedding spaces. However, t-SNE uses the Gaussian and Student-t distributions in these spaces, respectively. In this tutorial and survey paper, we explain SNE, symmetric SNE, t-SNE (or Cauchy-SNE), and t-SNE with general degrees of freedom. We also cover the out-of-sample extension and acceleration for these methods.
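    The neighbor probabilities described above can be sketched in a few lines. This is a minimal illustration rather than the tutorial's own code: it assumes a single fixed bandwidth `sigma` for every point (practical t-SNE implementations tune it per point from a perplexity target), and the function names `gaussian_conditional` and `student_t_joint` are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def gaussian_conditional(points, sigma=1.0):
    """Input-space probabilities p(j|i) from a Gaussian kernel.

    Assumption: one shared sigma and points close enough that no row
    underflows to all zeros.
    """
    n = len(points)
    probs = []
    for i in range(n):
        weights = [
            0.0 if j == i
            else math.exp(-sq_dist(points[i], points[j]) / (2.0 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs

def student_t_joint(embedding):
    """Embedding-space joint probabilities q(i,j) from a Student-t kernel
    with one degree of freedom (the Cauchy kernel used by t-SNE)."""
    n = len(embedding)
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + sq_dist(embedding[i], embedding[j]))
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]

if __name__ == "__main__":
    high_dim = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)]
    low_dim = [(0.0, 0.0), (0.5, 0.0), (0.0, 1.5)]
    print(gaussian_conditional(high_dim))
    print(student_t_joint(low_dim))
```

    Each row of the Gaussian matrix sums to one (a distribution over the neighbors of point i), while the Student-t matrix is normalized over all pairs; the heavier tail of the Student-t kernel is what lets moderately distant points sit further apart in the embedding.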
    Sequence Model Imitation Learning with Unobserved Contexts. (arXiv:2208.02225v1 [cs.LG])
    We consider imitation learning problems where the expert has access to a per-episode context that is hidden from the learner, both in the demonstrations and at test-time. While the learner might not be able to accurately reproduce expert behavior early on in an episode, by considering the entire history of states and actions, they might be able to eventually identify the context and act as the expert would. We prove that on-policy imitation learning algorithms (with or without access to a queryable expert) are better equipped to handle these sorts of asymptotically realizable problems than off-policy methods and are able to avoid the latching behavior (naive repetition of past actions) that plagues the latter. We conduct experiments in a toy bandit domain that show that there exist sharp phase transitions of whether off-policy approaches are able to match expert performance asymptotically, in contrast to the uniformly good performance of on-policy approaches. We demonstrate that on several continuous control tasks, on-policy approaches are able to use history to identify the context while off-policy approaches actually perform worse when given access to history.
    Multimodal Controller for Generative Models. (arXiv:2002.02572v7 [cs.LG] UPDATED)
    Class-conditional generative models are crucial tools for data generation from user-specified class labels. Existing approaches for class-conditional generative models require nontrivial modifications of backbone generative architectures to model conditional information fed into the model. This paper introduces a plug-and-play module named "multimodal controller" to generate multimodal data without introducing additional learning parameters. In the absence of the controllers, our model reduces to non-conditional generative models. We test the efficacy of multimodal controllers on CIFAR10, COIL100, and Omniglot benchmark datasets. We demonstrate that multimodal controlled generative models (including VAE, PixelCNN, Glow, and GAN) can generate class-conditional images of significantly better quality when compared with conditional generative models. Moreover, we show that multimodal controlled models can also create novel modalities of images.
    Interpretable bilinear attention network with domain adaptation improves drug-target prediction. (arXiv:2208.02194v1 [cs.LG])
    Predicting drug-target interaction is key for drug discovery. Recent deep learning-based methods show promising performance but two challenges remain: (i) how to explicitly model and learn local interactions between drugs and targets for better prediction and interpretation; (ii) how to generalize prediction performance to novel drug-target pairs from a different distribution. In this work, we propose DrugBAN, a deep bilinear attention network (BAN) framework with domain adaptation to explicitly learn pair-wise local interactions between drugs and targets, and adapt to out-of-distribution data. DrugBAN works on drug molecular graphs and target protein sequences to perform prediction, with conditional domain adversarial learning to align learned interaction representations across different distributions for better generalization on novel drug-target pairs. Experiments on three benchmark datasets under both in-domain and cross-domain settings show that DrugBAN achieves the best overall performance against five state-of-the-art baselines. Moreover, visualizing the learned bilinear attention map provides interpretable insights from prediction results.
    Conv-NILM-Net, a causal and multi-appliance model for energy source separation. (arXiv:2208.02173v1 [eess.SP])
    Non-Intrusive Load Monitoring (NILM) seeks to save energy by estimating individual appliance power usage from a single aggregate measurement. Deep neural networks have become increasingly popular in attempting to solve NILM problems. However, most existing models address Load Identification rather than online Source Separation. Among source separation models, most use a single-task learning approach in which a neural network is trained exclusively for each appliance. This strategy is computationally expensive and ignores both the fact that multiple appliances can be active simultaneously and the dependencies between them. The remaining models are not causal, a property that is important for real-time application. Inspired by Conv-TasNet, a model for speech separation, we propose Conv-NILM-net, a fully convolutional framework for end-to-end NILM. Conv-NILM-net is a causal model for multi-appliance source separation. Our model is tested on two real datasets, REDD and UK-DALE, and clearly outperforms the state of the art while keeping a significantly smaller size than the competing models.
    Recovery of Future Data via Convolution Nuclear Norm Minimization. (arXiv:1909.03889v7 [cs.LG] UPDATED)
    This paper studies the problem of time series forecasting (TSF) from the perspective of compressed sensing. First of all, we convert TSF into a more inclusive problem called tensor completion with arbitrary sampling (TCAS), which is to restore a tensor from a subset of its entries sampled in an arbitrary manner. While it is known that, in the framework of Tucker low-rankness, it is theoretically impossible to identify the target tensor based on some arbitrarily selected entries, in this work we shall show that TCAS is indeed tackleable in the light of a new concept called convolutional low-rankness, which is a generalization of the well-known Fourier sparsity. Then we introduce a convex program termed Convolution Nuclear Norm Minimization (CNNM), and we prove that CNNM succeeds in solving TCAS as long as a sampling condition--which depends on the convolution rank of the target tensor--is obeyed. This theory provides a meaningful answer to the fundamental question of what is the minimum sampling size needed for making a given number of forecasts. Experiments on univariate time series, images and videos show encouraging results.
    Hierarchical Multiple-Instance Data Classification with Costly Features. (arXiv:1911.08756v5 [cs.LG] UPDATED)
    We motivate our research with a real-world problem of classifying malicious web domains using a remote service that provides various information. Crucially, some of the information can be further analyzed to a certain depth and this process sequentially creates a tree of hierarchically structured multiple-instance data. Each request sent to the remote service is associated with a cost (e.g., time or another cost per request) and the objective is to maximize the accuracy, constrained by a budget. We present a generic framework able to work with a class of similar problems. Our method is based on Classification with Costly Features (CwCF), Hierarchical Multiple-Instance Learning (HMIL) and hierarchical decomposition of the action space. It works with samples described as partially-observed trees of features of various types (similar to a JSON/XML file), which allows modeling data with complex structure. The process is modeled as a Markov Decision Process (MDP), where a state represents acquired features, and actions select yet unknown ones. The policy is trained with deep reinforcement learning and we demonstrate our method with both real-world and synthetic data.
    SGEM: stochastic gradient with energy and momentum. (arXiv:2208.02208v1 [cs.LG])
    In this paper, we propose SGEM, Stochastic Gradient with Energy and Momentum, to solve a large class of general non-convex stochastic optimization problems, based on the AEGD method that originated in the work [AEGD: Adaptive Gradient Descent with Energy. arXiv: 2010.05109]. SGEM incorporates both energy and momentum at the same time so as to inherit their dual advantages. We show that SGEM features an unconditional energy stability property, and derive energy-dependent convergence rates in the general nonconvex stochastic setting, as well as a regret bound in the online convex setting. A lower threshold for the energy variable is also provided. Our experimental results show that SGEM converges faster than AEGD and generalizes better or at least as well as SGDM in training some deep neural networks.
    SpanDrop: Simple and Effective Counterfactual Learning for Long Sequences. (arXiv:2208.02169v1 [cs.LG])
    Distilling supervision signal from a long sequence to make predictions is a challenging task in machine learning, especially when not all elements in the input sequence contribute equally to the desired output. In this paper, we propose SpanDrop, a simple and effective data augmentation technique that helps models identify the true supervision signal in a long sequence with very few examples. By directly manipulating the input sequence, SpanDrop randomly ablates parts of the sequence at a time and asks the model to perform the same task to emulate counterfactual learning and achieve input attribution. Based on theoretical analysis of its properties, we also propose a variant of SpanDrop based on the beta-Bernoulli distribution, which yields diverse augmented sequences while providing a learning objective that is more consistent with the original dataset. We demonstrate the effectiveness of SpanDrop on a set of carefully designed toy tasks, as well as various natural language processing tasks that require reasoning over long sequences to arrive at the correct answer, and show that it helps models improve performance both when data is scarce and abundant.
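    A rough sketch of the span-ablation idea described above follows. It is not the authors' implementation: the fixed span length, the rule of keeping at least one span, and drawing a single drop rate per sequence from a Beta distribution are all assumptions made for illustration, and the names `span_drop` and `beta_span_drop` are hypothetical.

```python
import random

def span_drop(tokens, span_len=2, drop_prob=0.3, rng=None):
    """Drop each fixed-length span independently with probability drop_prob.

    Assumptions (not from the abstract): spans are contiguous chunks of
    span_len tokens, and one span is always kept so the model never sees
    an empty input.
    """
    rng = rng or random.Random()
    spans = [tokens[i:i + span_len] for i in range(0, len(tokens), span_len)]
    kept = [span for span in spans if rng.random() >= drop_prob]
    if not kept and spans:
        kept = [rng.choice(spans)]
    # Flatten the surviving spans, preserving their original order.
    return [tok for span in kept for tok in span]

def beta_span_drop(tokens, span_len=2, alpha=1.0, beta=1.0, rng=None):
    """Beta-Bernoulli variant: sample the drop rate, then drop spans.

    Assumption: one drop rate per sequence, drawn from Beta(alpha, beta).
    """
    rng = rng or random.Random()
    drop_prob = rng.betavariate(alpha, beta)
    return span_drop(tokens, span_len, drop_prob, rng)

if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    print(span_drop(sentence, rng=random.Random(0)))
    print(beta_span_drop(sentence, rng=random.Random(0)))
```

    During training, each augmented sequence would keep the label of the original example, so the model is pushed to rely on whichever spans actually carry the supervision signal.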
    Optimised one-class classification performance. (arXiv:2102.02618v3 [cs.LG] UPDATED)
    We provide a thorough treatment of one-class classification with hyperparameter optimisation for five data descriptors: Support Vector Machine (SVM), Nearest Neighbour Distance (NND), Localised Nearest Neighbour Distance (LNND), Local Outlier Factor (LOF) and Average Localised Proximity (ALP). The hyperparameters of SVM and LOF have to be optimised through cross-validation, while NND, LNND and ALP allow an efficient form of leave-one-out validation and the reuse of a single nearest-neighbour query. We experimentally evaluate the effect of hyperparameter optimisation with 246 classification problems drawn from 50 datasets. From a selection of optimisation algorithms, the recent Malherbe-Powell proposal optimises the hyperparameters of all data descriptors most efficiently. We calculate the increase in test AUROC and the amount of overfitting as a function of the number of hyperparameter evaluations. After 50 evaluations, ALP and SVM significantly outperform LOF, NND and LNND, and LOF and NND outperform LNND. The performance of ALP and SVM is comparable, but ALP can be optimised more efficiently so constitutes a good default choice. Alternatively, using validation AUROC as a selection criterion between ALP or SVM gives the best overall result, and NND is the least computationally demanding option. We thus end up with a clear trade-off between three choices, allowing practitioners to make an informed decision.
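    To make the simplest of these data descriptors concrete, here is a minimal sketch of Nearest Neighbour Distance (NND) scoring with a leave-one-out pass over the training set. It assumes plain Euclidean distance and brute-force search, which are illustrative choices rather than the paper's setup; the function names are hypothetical.

```python
import math

def nnd_scores(train, queries):
    """Outlier score of each query: distance to its nearest training point.

    Larger scores mean the query looks less like the target class.
    """
    return [min(math.dist(q, t) for t in train) for q in queries]

def nnd_leave_one_out(train):
    """Score each training point against all other training points.

    This mimics leave-one-out validation without refitting anything:
    the point itself is simply excluded from its own neighbour search.
    """
    scores = []
    for i, point in enumerate(train):
        others = (t for j, t in enumerate(train) if j != i)
        scores.append(min(math.dist(point, t) for t in others))
    return scores

if __name__ == "__main__":
    target_class = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    print(nnd_scores(target_class, [(0.1, 0.1), (5.0, 5.0)]))
    print(nnd_leave_one_out(target_class))
```

    Because the leave-one-out scores come from the same pairwise distances as ordinary scoring, validation needs no separate model fit, which is the kind of efficiency the abstract attributes to NND, LNND and ALP.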
    Subject-Specific Lesion Generation and Pseudo-Healthy Synthesis for Multiple Sclerosis Brain Images. (arXiv:2208.02135v1 [eess.IV])
    Understanding the intensity characteristics of brain lesions is key for defining image-based biomarkers in neurological studies and for predicting disease burden and outcome. In this work, we present a novel foreground-based generative method for modelling the local lesion characteristics that can both generate synthetic lesions on healthy images and synthesize subject-specific pseudo-healthy images from pathological images. Furthermore, the proposed method can be used as a data augmentation module to generate synthetic images for training brain image segmentation networks. Experiments on multiple sclerosis (MS) brain images acquired on magnetic resonance imaging (MRI) demonstrate that the proposed method can generate highly realistic pseudo-healthy and pseudo-pathological brain images. Data augmentation using the synthetic images improves the brain image segmentation performance compared to traditional data augmentation methods as well as a recent lesion-aware data augmentation technique, CarveMix. The code will be released at https://github.com/dogabasaran/lesion-synthesis.
    One Node at a Time: Node-Level Network Classification. (arXiv:2208.02162v1 [cs.SI])
    Network classification aims to group networks (or graphs) into distinct categories based on their structure. We study the connection between classification of a network and of its constituent nodes, and whether nodes from networks in different groups are distinguishable based on structural node characteristics such as centrality and clustering coefficient. We demonstrate, using various network datasets and random network models, that a classifier can be trained to accurately predict the network category of a given node (without seeing the whole network), implying that complex networks display distinct structural patterns even at the node level. Finally, we discuss two applications of node-level network classification: (i) whole-network classification from small samples of nodes, and (ii) network bootstrapping.
    Unsupervised Discovery of Semantic Concepts in Satellite Imagery with Style-based Wavelet-driven Generative Models. (arXiv:2208.02089v1 [cs.CV])
    In recent years, considerable advancements have been made in the area of Generative Adversarial Networks (GANs), particularly with the advent of style-based architectures that address many key shortcomings - both in terms of modeling capabilities and network interpretability. Despite these improvements, the adoption of such approaches in the domain of satellite imagery is not straightforward. Typical vision datasets used in generative tasks are well-aligned and annotated, and exhibit limited variability. In contrast, satellite imagery exhibits great spatial and spectral variability, wide presence of fine, high-frequency details, while the tedious nature of annotating satellite imagery leads to annotation scarcity - further motivating developments in unsupervised learning. In this light, we present the first pre-trained style- and wavelet-based GAN model that can readily synthesize a wide gamut of realistic satellite images in a variety of settings and conditions - while also preserving high-frequency information. Furthermore, we show that by analyzing the intermediate activations of our network, one can discover a multitude of interpretable semantic directions that facilitate the guided synthesis of satellite images in terms of high-level concepts (e.g., urbanization) without using any form of supervision. Via a set of qualitative and quantitative experiments we demonstrate the efficacy of our framework, in terms of suitability for downstream tasks (e.g., data augmentation), quality of synthetic imagery, as well as generalization capabilities to unseen datasets.
    Noise tolerance of learning to rank under class-conditional label noise. (arXiv:2208.02126v1 [cs.IR])
    Often, the data used to train ranking models is subject to label noise. For example, in web-search, labels created from clickstream data are noisy due to issues such as insufficient information in item descriptions on the SERP, query reformulation by the user, and erratic or unexpected user behavior. In practice, it is difficult to handle label noise without making strong assumptions about the label generation process. As a result, practitioners typically train their learning-to-rank (LtR) models directly on this noisy data without additional consideration of the label noise. Surprisingly, we often see strong performance from LtR models trained in this way. In this work, we describe a class of noise-tolerant LtR losses for which empirical risk minimization is a consistent procedure, even in the context of class-conditional label noise. We also develop noise-tolerant analogs of commonly used loss functions. The practical implications of our theoretical findings are further supported by experimental results.
    Machine learning optimization of Majorana hybrid nanowires. (arXiv:2208.02182v1 [cond-mat.mes-hall])
    As the complexity of quantum systems such as quantum bit arrays increases, efforts to automate expensive tuning are increasingly worthwhile. We investigate machine learning based tuning of gate arrays using the CMA-ES algorithm for the case study of Majorana wires with strong disorder. We find that the algorithm is able to efficiently improve the topological signatures, learn intrinsic disorder profiles, and completely eliminate disorder effects. For example, with only 20 gates, it is possible to fully recover Majorana zero modes destroyed by disorder by optimizing gate voltages.
    BPMN4sML: A BPMN Extension for Serverless Machine Learning. Technology Independent and Interoperable Modeling of Machine Learning Workflows and their Serverless Deployment Orchestration. (arXiv:2208.02030v1 [cs.SE])
    Machine learning (ML) continues to permeate all layers of academia, industry and society. Despite its successes, mental frameworks to capture and represent machine learning workflows in a consistent and coherent manner are lacking. For instance, the de facto process modeling standard, Business Process Model and Notation (BPMN), managed by the Object Management Group, is widely accepted and applied. However, it is short of specific support to represent machine learning workflows. Further, the number of heterogeneous tools for deployment of machine learning solutions can easily overwhelm practitioners. Research is needed to align the process from modeling to deploying ML workflows. We analyze requirements for standard based conceptual modeling for machine learning workflows and their serverless deployment. Confronting the shortcomings with respect to consistent and coherent modeling of ML workflows in a technology independent and interoperable manner, we extend BPMN's Meta-Object Facility (MOF) metamodel and the corresponding notation and introduce BPMN4sML (BPMN for serverless machine learning). Our extension BPMN4sML follows the same outline referenced by the Object Management Group (OMG) for BPMN. We further address the heterogeneity in deployment by proposing a conceptual mapping to convert BPMN4sML models to corresponding deployment models using TOSCA. BPMN4sML allows technology-independent and interoperable modeling of machine learning workflows of various granularity and complexity across the entire machine learning lifecycle. It aids in arriving at a shared and standardized language to communicate ML solutions. Moreover, it takes the first steps toward enabling conversion of ML workflow model diagrams to corresponding deployment models for serverless deployment via TOSCA.
    Empirical Study of Overfitting in Deep FNN Prediction Models for Breast Cancer Metastasis. (arXiv:2208.02150v1 [cs.LG])
    Overfitting occurs when a model fits a specific data set too closely, which weakens generalization and may ultimately reduce accuracy when predicting future data. In this research we used an EHR dataset concerning breast cancer metastasis to study overfitting of deep feedforward neural network (FNN) prediction models. We included 11 hyperparameters of the deep FNN models and took an empirical approach to study how each of these hyperparameters affected both prediction performance and overfitting when given a large range of values. We also studied how some of the interesting pairs of hyperparameters interacted to influence model performance and overfitting. The 11 hyperparameters we studied are activation function, weight initializer, number of hidden layers, learning rate, momentum, decay, dropout rate, batch size, epochs, L1, and L2. Our results show that most of the single hyperparameters are either negatively or positively correlated with model prediction performance and overfitting. In particular, we found that overfitting overall tends to negatively correlate with learning rate, decay, batch size, and L2, but tends to positively correlate with momentum, epochs, and L1. According to our results, learning rate, decay, and batch size may have a more significant impact on both overfitting and prediction performance than most of the other hyperparameters, including L1, L2, and dropout rate, which were designed for minimizing overfitting. We also find some interesting interacting pairs of hyperparameters such as learning rate and momentum, learning rate and decay, and batch size and epochs. Keywords: Deep learning, overfitting, prediction, grid search, feedforward neural networks, breast cancer metastasis.
    Neural Nets with a Newton Conjugate Gradient Method on Multiple GPUs. (arXiv:2208.02017v1 [cs.LG])
    Training deep neural networks consumes an increasing share of computational resources in many compute centers. Often, a brute force approach to obtain hyperparameter values is employed. Our goal is (1) to enhance this by enabling second-order optimization methods with fewer hyperparameters for large-scale neural networks and (2) to survey the performance of optimizers on specific tasks in order to recommend the best one to users for their problem. We introduce a novel second-order optimization method that requires only the effect of the Hessian on a vector and avoids the huge cost of explicitly setting up the Hessian for large-scale networks. We compare the proposed second-order method with two state-of-the-art optimizers on five representative neural network problems, including regression and very deep networks from computer vision or variational autoencoders. For the largest setup, we efficiently parallelized the optimizers with Horovod and applied them on an 8-GPU NVIDIA P100 (DGX-1) machine.
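    The idea of needing only the effect of the Hessian on a vector can be illustrated with a generic Newton conjugate gradient step. The following is a minimal sketch and not the authors' implementation: the Hessian-vector product is approximated here by a finite difference of the gradient (an assumption made to keep the example dependency-free; an autodiff framework would normally supply it), and the names `hvp_fd`, `conjugate_gradient` and `newton_cg_step` are hypothetical.

```python
import numpy as np


def hvp_fd(grad, w, v, eps=1e-5):
    """Approximate H(w) @ v with a central finite difference of the gradient.

    Assumption for illustration only: the Hessian is never formed explicitly.
    """
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return np.zeros_like(v)
    u = v / norm  # differentiate along a unit direction, then rescale
    return norm * (grad(w + eps * u) - grad(w - eps * u)) / (2.0 * eps)


def conjugate_gradient(matvec, b, tol=1e-8, max_iter=50):
    """Solve A x = b for symmetric positive definite A, given only v -> A v."""
    x = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x
    r = b.copy()  # residual b - A x for the initial guess x = 0
    p = r.copy()  # first search direction
    rs = r @ r
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def newton_cg_step(grad, w):
    """One Newton step: solve H d = -g with CG, using Hessian-vector products."""
    g = grad(w)
    d = conjugate_gradient(lambda v: hvp_fd(grad, w, v), -g)
    return w + d


# Toy problem: loss(w) = 0.5 * w^T A w - b^T w, so grad(w) = A w - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad = lambda w: A @ w - b

w = newton_cg_step(grad, np.zeros(2))
print(w)  # close to the exact minimizer A^{-1} b
```

    On a quadratic loss a single step reaches the minimizer; for a neural network loss the Hessian need not be positive definite, so practical variants add damping or early termination of the inner CG loop.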
    MTGFlow: Unsupervised Multivariate Time Series Anomaly Detection via Dynamic Graph and Entity-aware Normalizing Flow. (arXiv:2208.02108v1 [cs.LG])
    Multivariate time series anomaly detection has been extensively studied under the semi-supervised setting, where a training dataset with all normal instances is required. However, preparing such a dataset is very laborious since every single data instance must be guaranteed to be normal. It is, therefore, desirable to explore multivariate time series anomaly detection methods that work on datasets without any label knowledge. In this paper, we propose MTGFlow, an unsupervised anomaly detection approach for Multivariate Time series anomaly detection via dynamic Graph and entity-aware normalizing Flow, relying only on the widely accepted hypothesis that abnormal instances exhibit sparser densities than normal ones. However, the complex interdependencies among entities and the diverse inherent characteristics of each entity pose significant challenges for density estimation, let alone for detecting anomalies based on the estimated possibility distribution. To tackle these problems, we propose to learn the mutual and dynamic relations among entities via a graph structure learning model, which helps to model the distribution of multivariate time series accurately. Moreover, taking account of the distinct characteristics of the individual entities, an entity-aware normalizing flow is developed to map each entity to a parameterized normal distribution, thereby producing fine-grained density estimation. Incorporating these two strategies, MTGFlow achieves superior anomaly detection performance. Experiments on real-world datasets demonstrate that MTGFlow outperforms the state-of-the-art (SOTA) by 5.0% and 1.6% AUROC on the SWaT and WADI datasets, respectively. Also, through the anomaly scores contributed by individual entities, MTGFlow can provide explanation information for the detection results.
    Robots with Different Embodiments Can Express and Influence Carefulness in Object Manipulation. (arXiv:2208.02058v1 [cs.RO])
    Humans have an extraordinary ability to communicate and read the properties of objects by simply watching them being carried by someone else. This level of communicative skills and interpretation, available to humans, is essential for collaborative robots if they are to interact naturally and effectively. For example, suppose a robot is handing over a fragile object. In that case, the human who receives it should be informed of its fragility in advance, through an immediate and implicit message, i.e., by the direct modulation of the robot's action. This work investigates the perception of object manipulations performed with a communicative intent by two robots with different embodiments (an iCub humanoid robot and a Baxter robot). We designed the robots' movements to communicate carefulness or not during the transportation of objects. We found that not only is this feature correctly perceived by human observers, but it can also elicit a form of motor adaptation in subsequent human object manipulations. In addition, we gain insight into which motion features may induce people to manipulate an object more or less carefully.
    Cross-lingual Approaches for the Detection of Adverse Drug Reactions in German from a Patient's Perspective. (arXiv:2208.02031v1 [cs.CL])
    In this work, we present the first corpus for German Adverse Drug Reaction (ADR) detection in patient-generated content. The data consists of 4,169 binary annotated documents from a German patient forum, where users talk about health issues and get advice from medical doctors. As is common in social media data in this domain, the class labels of the corpus are very imbalanced. This and a high topic imbalance make it a very challenging dataset, since often, the same symptom can have several causes and is not always related to a medication intake. We aim to encourage further multi-lingual efforts in the domain of ADR detection and provide preliminary experiments for binary classification using different methods of zero- and few-shot learning based on a multi-lingual model. When fine-tuning XLM-RoBERTa first on English patient forum data and then on the new German data, we achieve an F1-score of 37.52 for the positive class. We make the dataset and models publicly available for the community.
    A Novel Approach To Network Intrusion Detection System Using Deep Learning For SDN: Futuristic Approach. (arXiv:2208.02094v1 [cs.CR])
    Software-Defined Networking (SDN) is a promising next-generation solution for changing the architecture of traditional internet networks. Attacks become more common due to the centralized nature of the SDN architecture, so it is vital to provide security for the SDN. In this study, we propose a Network Intrusion Detection System-Deep Learning module (NIDS-DL) approach in the context of SDN. Our suggested method combines Network Intrusion Detection Systems (NIDS) with several types of deep learning algorithms. Our approach employs 12 features extracted from the 41 features in the NSL-KDD dataset using a feature selection method. We employed five classifiers (CNN, DNN, RNN, LSTM, and GRU). When we compare classifier scores, our technique produced accuracy results of 98.63%, 98.53%, 98.13%, 98.04%, and 97.78%, respectively. The novelty of our approach (NIDS-DL) lies in using 5 deep learning classifiers and pre-processing the dataset to obtain the best results. Our proposed approach was successful in binary classification and detecting attacks, implying that our approach (NIDS-DL) might be used with great efficiency in the future.
    Edge-Based Self-Supervision for Semi-Supervised Few-Shot Microscopy Image Cell Segmentation. (arXiv:2208.02105v1 [cs.CV])
    Deep neural networks currently deliver promising results for microscopy image cell segmentation, but they require large-scale labelled databases, which are costly and time-consuming to create. In this work, we relax the labelling requirement by combining self-supervised with semi-supervised learning. We propose the prediction of edge-based maps for self-supervising the training of the unlabelled images, which is combined with the supervised training of a small number of labelled images for learning the segmentation task. In our experiments, we evaluate on a few-shot microscopy image cell segmentation benchmark and show that only a small number of annotated images, e.g. 10% of the original training set, is enough for our approach to reach similar performance as with the fully annotated databases on 1- to 10-shots. Our code and trained models are made publicly available.
    Gradient descent provably escapes saddle points in the training of shallow ReLU networks. (arXiv:2208.02083v1 [cs.LG])
    Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms avoid so-called strict saddle points of the loss function. However, in many modern machine learning applications, the required regularity conditions are not satisfied. In particular, this is the case for rectified linear unit (ReLU) networks. In this paper, we prove a variant of the relevant dynamical systems result, a center-stable manifold theorem, in which we relax some of the regularity requirements. Then, we verify that shallow ReLU networks fit into the new framework. Building on a classification of critical points of the square integral loss of shallow ReLU networks measured against an affine target function, we deduce that gradient descent avoids most saddle points. We proceed to prove convergence to global minima if the initialization is sufficiently good, which is expressed by an explicit threshold on the limiting loss.
    Character Generation through Self-Supervised Vectorization. (arXiv:2208.02012v1 [cs.CV])
    The prevalent approach in self-supervised image generation is to operate on pixel level representations. While this approach can produce high quality images, it cannot benefit from the simplicity and innate quality of vectorization. Here we present a drawing agent that operates on stroke-level representation of images. At each time step, the agent first assesses the current canvas and decides whether to stop or keep drawing. When a 'draw' decision is made, the agent outputs a program indicating the stroke to be drawn. As a result, it produces a final raster image by drawing the strokes on a canvas, using a minimal number of strokes and dynamically deciding when to stop. We train our agent through reinforcement learning on MNIST and Omniglot datasets for unconditional generation and parsing (reconstruction) tasks. We utilize our parsing agent for exemplar generation and type conditioned concept generation in Omniglot challenge without any further training. We present successful results on all three generation tasks and the parsing task. Crucially, we do not need any stroke-level or vector supervision; we only use raster images for training.
    Exploration with Model Uncertainty at Extreme Scale in Real-Time Bidding. (arXiv:2208.01951v1 [cs.LG])
    In this work, we present a scalable and efficient system for exploring the supply landscape in real-time bidding. The system directs exploration based on the predictive uncertainty of models used for click-through rate prediction and works in a high-throughput, low-latency environment. Through online A/B testing, we demonstrate that exploration with model uncertainty has a positive impact on model performance and business KPIs.
    Learning Object Manipulation Skills from Video via Approximate Differentiable Physics. (arXiv:2208.01960v1 [cs.RO])
    We aim to teach robots to perform simple object manipulation tasks by watching a single video demonstration. Towards this goal, we propose an optimization approach that outputs a coarse and temporally evolving 3D scene to mimic the action demonstrated in the input video. Similar to previous work, a differentiable renderer ensures perceptual fidelity between the 3D scene and the 2D video. Our key novelty lies in the inclusion of a differentiable approach to solve a set of Ordinary Differential Equations (ODEs) that allows us to approximately model laws of physics such as gravity, friction, and hand-object or object-object interactions. This not only enables us to dramatically improve the quality of estimated hand and object states, but also produces physically admissible trajectories that can be directly translated to a robot without the need for costly reinforcement learning. We evaluate our approach on a 3D reconstruction task that consists of 54 video demonstrations sourced from 9 actions such as pull something from right to left or put something in front of something. Our approach improves over previous state-of-the-art by almost 30%, demonstrating superior quality on especially challenging actions involving physical interactions of two objects such as put something onto something. Finally, we showcase the learned skills on a Franka Emika Panda robot.
    Centroids Matching: an efficient Continual Learning approach operating in the embedding space. (arXiv:2208.02048v1 [cs.LG])
    Catastrophic forgetting (CF) occurs when a neural network loses the information previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task incremental learning. However, this scenario is not realistic, and limited work has been done to achieve good results on more realistic scenarios. In this paper, we propose a novel regularization method called Centroids Matching, that, inspired by meta-learning approaches, fights CF by operating in the feature space produced by the neural network, achieving good results while requiring a small memory footprint. Specifically, the approach classifies the samples directly using the feature vectors produced by the neural network, by matching those vectors with the centroids representing the classes from the current task, or all the tasks up to that point. Centroids Matching is faster than competing baselines, and it can be exploited to efficiently mitigate CF, by preserving the distances between the embedding space produced by the model when past tasks were over, and the one currently produced, leading to a method that achieves high accuracy on all the tasks, without using an external memory when operating on easy scenarios, or using a small one for more realistic ones. Extensive experiments demonstrate that Centroids Matching achieves accuracy gains on multiple datasets and scenarios.
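    The classification rule described above, matching feature vectors to class centroids, can be sketched as follows. This is an illustration under assumptions rather than the paper's method: the embeddings are given directly as arrays instead of being produced by a network, squared Euclidean distance is assumed as the matching criterion, the regularization that preserves distances across tasks is not shown, and the function names are hypothetical.

```python
import numpy as np


def class_centroids(embeddings, labels):
    """Return the sorted class ids and the mean embedding of each class."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict(embeddings, classes, centroids):
    """Assign each embedding to the class of its nearest centroid."""
    # Pairwise squared Euclidean distances, shape (n_samples, n_classes).
    diff = embeddings[:, None, :] - centroids[None, :, :]
    dist = (diff ** 2).sum(axis=-1)
    return classes[dist.argmin(axis=1)]


# Toy embeddings standing in for the network's feature vectors.
train_x = np.array([[0.0, 0.1], [0.2, 0.0], [1.0, 1.1], [0.9, 1.0]])
train_y = np.array([0, 0, 1, 1])

classes, centroids = class_centroids(train_x, train_y)
pred = predict(np.array([[0.1, 0.0], [1.0, 0.9]]), classes, centroids)
print(pred)  # [0 1]
```

    Because only one centroid per class is stored, the memory cost of this rule grows with the number of classes rather than with the number of training samples.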
    Binary Classification with Positive Labeling Sources. (arXiv:2208.01704v1 [cs.LG])
    To create a large amount of training labels for machine learning models effectively and efficiently, researchers have turned to Weak Supervision (WS), which uses programmatic labeling sources rather than manual annotation. Existing works of WS for binary classification typically assume the presence of labeling sources that are able to assign both positive and negative labels to data in roughly balanced proportions. However, for many tasks of interest where there is a minority positive class, negative examples could be too diverse for developers to generate indicative labeling sources. Thus, in this work, we study the application of WS on binary classification tasks with positive labeling sources only. We propose WEAPO, a simple yet competitive WS method for producing training labels without negative labeling sources. On 10 benchmark datasets, we show WEAPO achieves the highest averaged performance in terms of both the quality of synthesized labels and the performance of the final classifier supervised with these labels. We incorporated the implementation of WEAPO into WRENCH, an existing benchmarking platform.
    Robust PCA for Anomaly Detection and Data Imputation in Seasonal Time Series. (arXiv:2208.01998v1 [stat.ML])
    We propose a robust principal component analysis (RPCA) framework to recover low-rank and sparse matrices from temporal observations. We develop an online version of the batch temporal algorithm in order to process larger datasets or streaming data. We empirically compare the proposed approaches with different RPCA frameworks and show their effectiveness in practical situations.
    Flow Annealed Importance Sampling Bootstrap. (arXiv:2208.01893v1 [cs.LG])
    Normalizing flows are tractable density models that can approximate complicated target distributions, e.g. Boltzmann distributions of physical systems. However, current methods for training flows either suffer from mode-seeking behavior, use samples from the target generated beforehand by expensive MCMC simulations, or use stochastic losses that have very high variance. To avoid these problems, we augment flows with annealed importance sampling (AIS) and minimize the mass covering $\alpha$-divergence with $\alpha=2$, which minimizes importance weight variance. Our method, Flow AIS Bootstrap (FAB), uses AIS to generate samples in regions where the flow is a poor approximation of the target, facilitating the discovery of new modes. We target with AIS the minimum variance distribution for the estimation of the $\alpha$-divergence via importance sampling. We also use a prioritized buffer to store and reuse AIS samples. These two features significantly improve FAB's performance. We apply FAB to complex multimodal targets and show that we can approximate them very accurately where previous methods fail. To the best of our knowledge, we are the first to learn the Boltzmann distribution of the alanine dipeptide molecule using only the unnormalized target density and without access to samples generated via Molecular Dynamics (MD) simulations: FAB produces better results than training via maximum likelihood on MD samples while using 100 times fewer target evaluations. After reweighting samples with importance weights, we obtain unbiased histograms of dihedral angles that are almost identical to the ground truth ones.
    OLLIE: Derivation-based Tensor Program Optimizer. (arXiv:2208.02025v1 [cs.LG])
    Boosting the runtime performance of deep neural networks (DNNs) is critical due to their wide adoption in real-world tasks. Existing approaches to optimizing the tensor algebra expression of a DNN only consider expressions representable by a fixed set of predefined operators, missing possible optimization opportunities between general expressions. We propose OLLIE, the first derivation-based tensor program optimizer. OLLIE optimizes tensor programs by leveraging transformations between general tensor algebra expressions, enabling a significantly larger expression search space that includes those supported by prior work as special cases. OLLIE uses a hybrid derivation-based optimizer that effectively combines explorative and guided derivations to quickly discover highly optimized expressions. Evaluation on seven DNNs shows that OLLIE can outperform existing optimizers by up to 2.73$\times$ (1.46$\times$ on average) on an A100 GPU and up to 2.68$\times$ (1.51$\times$) on a V100 GPU, respectively.
    HybridGNN: Learning Hybrid Representation in Multiplex Heterogeneous Networks. (arXiv:2208.02068v1 [cs.LG])
    Recently, graph neural networks have shown their superiority in modeling the complex topological structures in heterogeneous network-based recommender systems. Due to the diverse interactions among nodes and the abundant semantics emerging from diverse types of nodes and edges, there is surging research interest in learning expressive node representations in multiplex heterogeneous networks. One of the most important tasks in recommender systems is to predict the potential connection between two nodes under a specific edge type (i.e., relationship). Although existing studies utilize explicit metapaths to aggregate neighbors, in practice they only consider intra-relationship metapaths and thus fail to leverage the potential uplift from inter-relationship information. Moreover, it is not always straightforward to exploit inter-relationship metapaths comprehensively under diverse relationships, especially with the increasing number of node and edge types. In addition, contributions of different relationships between two nodes are difficult to measure. To address these challenges, we propose HybridGNN, an end-to-end GNN model with hybrid aggregation flows and hierarchical attentions to fully utilize the heterogeneity in the multiplex scenarios. Specifically, HybridGNN applies a randomized inter-relationship exploration module to exploit the multiplexity property among different relationships. Then, our model leverages hybrid aggregation flows under intra-relationship metapaths and randomized exploration to learn the rich semantics. To explore the importance of different aggregation flows and take advantage of the multiplexity property, we bring forward a novel hierarchical attention module which leverages both metapath-level attention and relationship-level attention. Extensive experimental results suggest that HybridGNN achieves the best performance compared to several state-of-the-art baselines.
    Neural Basis Functions for Accelerating Solutions to High Mach Euler Equations. (arXiv:2208.01687v1 [cs.LG])
    We propose an approach to solving partial differential equations (PDEs) using a set of neural networks which we call Neural Basis Functions (NBF). This NBF framework is a novel variation of the POD DeepONet operator learning approach where we regress a set of neural networks onto a reduced order Proper Orthogonal Decomposition (POD) basis. These networks are then used in combination with a branch network that ingests the parameters of the prescribed PDE to compute a reduced order approximation to the PDE. This approach is applied to the steady state Euler equations for high speed flow conditions (Mach 10-30) where we consider the 2D flow around a cylinder which develops a shock condition. We then use the NBF predictions as initial conditions to a high fidelity Computational Fluid Dynamics (CFD) solver (CFD++) to show faster convergence. Lessons learned for training and implementing this algorithm will be presented as well.
    Maximal Independent Vertex Set applied to Graph Pooling. (arXiv:2208.01648v1 [cs.LG])
    Convolutional neural networks (CNN) have enabled major advances in image classification through convolution and pooling. In particular, image pooling transforms a connected discrete grid into a reduced grid with the same connectivity and allows reduction functions to take into account all the pixels of an image. However, a pooling satisfying such properties does not exist for graphs. Indeed, some methods are based on a vertex selection step which induces an important loss of information. Other methods learn a fuzzy clustering of vertex sets which induces almost complete reduced graphs. We propose to overcome both problems using a new pooling method, named MIVSPool. This method is based on a selection of vertices called surviving vertices using a Maximal Independent Vertex Set (MIVS) and an assignment of the remaining vertices to the survivors. Consequently, our method does not discard any vertex information nor artificially increase the density of the graph. Experimental results show an increase in accuracy for graph classification on various standard datasets.
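    The two steps named above, selecting surviving vertices with a maximal independent vertex set and assigning the remaining vertices to survivors, can be sketched in plain Python. This is a hedged illustration, not MIVSPool itself: the greedy visiting order (lowest vertex id first) and the tie-breaking rule for assignment are assumptions, the feature aggregation over each group is not shown, and the function names are hypothetical.

```python
def maximal_independent_set(adj):
    """Greedy maximal independent vertex set.

    adj maps each vertex to the set of its neighbours (undirected graph).
    Vertices are visited in sorted order, which is an arbitrary choice here.
    """
    survivors, blocked = [], set()
    for v in sorted(adj):
        if v not in blocked:
            survivors.append(v)
            blocked.add(v)
            blocked.update(adj[v])  # neighbours of a survivor cannot survive
    return survivors


def assign_to_survivors(adj, survivors):
    """Map every vertex to a survivor: itself, or a surviving neighbour."""
    surviving = set(survivors)
    assignment = {}
    for v in adj:
        if v in surviving:
            assignment[v] = v
        else:
            # Maximality guarantees at least one surviving neighbour exists.
            assignment[v] = min(n for n in adj[v] if n in surviving)
    return assignment


def pooled_edges(adj, assignment):
    """Connect two survivors when an original edge links their groups."""
    edges = set()
    for u in adj:
        for v in adj[u]:
            a, b = assignment[u], assignment[v]
            if a != b:
                edges.add((min(a, b), max(a, b)))
    return edges


# Path graph 0 - 1 - 2 - 3 - 4.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
survivors = maximal_independent_set(adj)
assignment = assign_to_survivors(adj, survivors)
edges = pooled_edges(adj, assignment)

print(survivors)      # [0, 2, 4]
print(assignment)     # {0: 0, 1: 0, 2: 2, 3: 2, 4: 4}
print(sorted(edges))  # [(0, 2), (2, 4)]
```

    In this toy case every vertex ends up in exactly one group, and the pooled graph is again a path rather than a complete graph.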
    Vision-Based Safety System for Barrierless Human-Robot Collaboration. (arXiv:2208.02010v1 [cs.RO])
    Human safety has always been the main priority when working near an industrial robot. With the rise of Human-Robot Collaborative environments, physical barriers to avoiding collisions have been disappearing, increasing the risk of accidents and the need for solutions that ensure a safe Human-Robot Collaboration. This paper proposes a safety system that implements Speed and Separation Monitoring (SSM) type of operation. For this, safety zones are defined in the robot's workspace following current standards for industrial collaborative robots. A deep learning-based computer vision system detects, tracks, and estimates the 3D position of operators close to the robot. The robot control system receives the operator's 3D position and generates 3D representations of them in a simulation environment. Depending on the zone where the closest operator was detected, the robot stops or changes its operating speed. Three different operation modes in which the human and robot interact are presented. Results show that the vision-based system can correctly detect and classify in which safety zone an operator is located and that the different proposed operation modes ensure that the robot's reaction and stop time are within the required time limits to guarantee safety.
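    The zone logic described above, in which the robot stops or changes speed depending on the zone of the closest operator, can be sketched as a small function. All numbers here are placeholders: the radii and the reduced-speed factor are hypothetical and are taken neither from the paper nor from any safety standard, and the function names are illustrative only.

```python
import math


def speed_scale(distance_m, stop_radius_m=0.5, slow_radius_m=1.5):
    """Return a speed multiplier in [0, 1] for a given separation distance.

    The two radii are hypothetical placeholders for the safety zones.
    """
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if distance_m <= stop_radius_m:
        return 0.0  # innermost zone: stop
    if distance_m <= slow_radius_m:
        return 0.5  # intermediate zone: reduced speed (assumed factor)
    return 1.0      # outer zone: nominal speed


def closest_operator_scale(robot_xyz, operators_xyz):
    """Pick the speed multiplier from the closest detected operator."""
    if not operators_xyz:
        return 1.0  # nobody detected near the robot
    closest = min(math.dist(robot_xyz, p) for p in operators_xyz)
    return speed_scale(closest)


robot = (0.0, 0.0, 0.0)
print(closest_operator_scale(robot, [(2.0, 0.0, 0.0), (0.3, 0.0, 0.0)]))  # 0.0
print(closest_operator_scale(robot, [(1.0, 0.0, 0.0)]))                   # 0.5
print(closest_operator_scale(robot, []))                                  # 1.0
```

    Only the closest operator matters in this sketch, which mirrors the statement that the robot's reaction depends on the zone where the closest operator was detected.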
    Optimal Rates for Regularized Conditional Mean Embedding Learning. (arXiv:2208.01711v1 [stat.ML])
    We address the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of $Y$ given $X$ into a target reproducing kernel Hilbert space $\mathcal{H}_Y$. The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$, to $\mathcal{H}_Y$. This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting. Our analysis reveals that our rates match the optimal $O(\log n / n)$ rates without assuming $\mathcal{H}_Y$ to be finite dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.
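    For orientation, the standard kernel ridge regression form of the empirical CME estimator can be written in a few lines: conditional expectations are weighted sums of training targets, with weights $(K_X + n\lambda I)^{-1} k_X(x)$. The sketch below is a generic textbook version, not the paper's analysis. The Gaussian kernel, its bandwidth, the regularization value and the one-dimensional toy data are all assumptions chosen for illustration, and the test function used at the end is applied informally without checking that it lies in the target RKHS.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between two 1-D arrays of inputs."""
    sq = (a[:, None] - b[None, :]) ** 2
    return np.exp(-sq / (2.0 * bandwidth ** 2))


def cme_weights(x_train, x_query, lam, bandwidth):
    """Weights beta(x) = (K_X + n * lam * I)^{-1} k_X(x), one column per query."""
    n = x_train.shape[0]
    K = gaussian_kernel(x_train, x_train, bandwidth)
    k_q = gaussian_kernel(x_train, x_query, bandwidth)
    return np.linalg.solve(K + n * lam * np.eye(n), k_q)


def conditional_expectation(f, x_train, y_train, x_query, lam=1e-6, bandwidth=0.5):
    """Estimate E[f(Y) | X = x] as sum_i beta_i(x) * f(y_i)."""
    beta = cme_weights(x_train, x_query, lam, bandwidth)
    return f(y_train) @ beta


# Toy data with a deterministic relation Y = 2 X.
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x
est = conditional_expectation(lambda t: t, x, y, np.array([0.3]))
print(est)  # approximately 0.6
```

    The same weights can be reused for any test function `f`, which is the practical appeal of embedding the whole conditional distribution rather than regressing a single quantity.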
    Success of Uncertainty-Aware Deep Models Depends on Data Manifold Geometry. (arXiv:2208.01705v1 [cs.LG])
    For responsible decision making in safety-critical settings, machine learning models must effectively detect and process edge-case data. Although existing works show that predictive uncertainty is useful for these tasks, it is not evident from literature which uncertainty-aware models are best suited for a given dataset. Thus, we compare six uncertainty-aware deep learning models on a set of edge-case tasks: robustness to adversarial attacks as well as out-of-distribution and adversarial detection. We find that the geometry of the data sub-manifold is an important factor in determining the success of various models. Our finding suggests an interesting direction in the study of uncertainty-aware deep learning models.
    AI-driven Hypernetwork of Organic Chemistry: Network Statistics and Applications in Reaction Classification. (arXiv:2208.01647v1 [q-bio.MN])
    Rapid discovery of new reactions and molecules in recent years has been facilitated by the advancements in high throughput screening, accessibility to a much more complex chemical design space, and the development of accurate molecular modeling frameworks. A holistic study of the growing chemistry literature is, therefore, required that focuses on understanding the recent trends and extrapolating them into possible future trajectories. To this end, several network theory-based studies have been reported that use a directed graph representation of chemical reactions. Here, we perform a study based on representing chemical reactions as hypergraphs where the hyperedges represent chemical reactions and nodes represent the participating molecules. We use a standard reactions dataset to construct a hypernetwork and report its statistics such as degree distributions, average path length, assortativity or degree correlations, PageRank centrality, and graph-based clusters (or communities). We also compute each statistic for an equivalent directed graph representation of reactions to draw parallels and highlight differences between the two. To demonstrate the AI applicability of hypergraph reaction representation, we generate dense hypergraph embeddings and use them in the reaction classification problem. We conclude that the hypernetwork representation is flexible, preserves reaction context, and uncovers hidden insights that are otherwise not apparent in a traditional directed graph representation of chemical reactions.
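    The difference between the two representations can be made concrete with a toy example. In the sketch below each reaction is one hyperedge holding its reactant and product sets, while the directed graph flattens it into pairwise reactant-to-product edges. The molecule names and reactions are invented placeholders, not entries from the dataset used in the paper, and the function names are hypothetical.

```python
from collections import Counter

# Each hyperedge is one reaction: (set of reactants, set of products).
reactions = {
    "r1": ({"A", "B"}, {"C"}),
    "r2": ({"C", "D"}, {"E", "F"}),
    "r3": ({"A"}, {"F"}),
}


def node_degrees(reactions):
    """Hypergraph degree: number of reactions each molecule takes part in."""
    degree = Counter()
    for reactants, products in reactions.values():
        for molecule in reactants | products:
            degree[molecule] += 1
    return degree


def directed_edges(reactions):
    """Equivalent directed graph: one edge per reactant-product pair."""
    return {
        (r, p)
        for reactants, products in reactions.values()
        for r in reactants
        for p in products
    }


degree = node_degrees(reactions)
edges = directed_edges(reactions)

print(sorted(degree.items()))
# [('A', 2), ('B', 1), ('C', 2), ('D', 1), ('E', 1), ('F', 2)]
print(len(reactions), len(edges))  # 3 7
```

    Three hyperedges become seven pairwise edges, and the directed graph no longer records that A and B react together in r1, which illustrates the loss of reaction context mentioned in the abstract.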
    PolarMOT: How Far Can Geometric Relations Take Us in 3D Multi-Object Tracking?. (arXiv:2208.01957v1 [cs.CV])
    Most (3D) multi-object tracking methods rely on appearance-based cues for data association. By contrast, we investigate how far we can get by only encoding geometric relationships between objects in 3D space as cues for data-driven data association. We encode 3D detections as nodes in a graph, where spatial and temporal pairwise relations among objects are encoded via localized polar coordinates on graph edges. This representation makes our geometric relations invariant to global transformations and smooth trajectory changes, especially under non-holonomic motion. This allows our graph neural network to learn to effectively encode temporal and spatial interactions and fully leverage contextual and motion cues to obtain final scene interpretation by posing data association as edge classification. We establish a new state-of-the-art on nuScenes dataset and, more importantly, show that our method, PolarMOT, generalizes remarkably well across different locations (Boston, Singapore, Karlsruhe) and datasets (nuScenes and KITTI).
    Equivariant Disentangled Transformation for Domain Generalization under Combination Shift. (arXiv:2208.02011v1 [cs.LG])
    Machine learning systems may encounter unexpected problems when the data distribution changes in the deployment environment. A major reason is that certain combinations of domains and labels are not observed during training but appear in the test environment. Although various invariance-based algorithms can be applied, we find that the performance gain is often marginal. To formally analyze this issue, we provide a unique algebraic formulation of the combination shift problem based on the concepts of homomorphism, equivariance, and a refined definition of disentanglement. The algebraic requirements naturally derive a simple yet effective method, referred to as equivariant disentangled transformation (EDT), which augments the data based on the algebraic structures of labels and makes the transformation satisfy the equivariance and disentanglement requirements. Experimental results demonstrate that invariance may be insufficient, and it is important to exploit the equivariance structure in the combination shift problem.
    Exploring Generative Neural Temporal Point Process. (arXiv:2208.01874v1 [cs.LG])
    Temporal point processes (TPPs) are commonly used to model asynchronous event sequences with occurrence timestamps, described by probabilistic models conditioned on historical impacts. While many previous works have focused on the "goodness-of-fit" of TPP models by maximizing the likelihood, their predictive performance is unsatisfactory, which means the timestamps generated by the models are far from the true observations. Recently, deep generative models such as denoising diffusion and score matching models have achieved great progress in image generation tasks by demonstrating their capability of generating high-quality samples. However, there are no complete and unified works exploring the potential of generative models in the context of event occurrence modeling for TPPs. In this work, we try to fill the gap by designing a unified generative framework for neural temporal point process (GNTPP) models to explore their feasibility and effectiveness, and to further improve the models' predictive performance. In addition, to measure historical impacts, we revise the attentive models that summarize the influence of historical events with an adaptive reweighting term that considers events' type relations and time intervals. Extensive experiments have been conducted to illustrate the improved predictive capability of GNTPP with a line of generative probabilistic decoders, and the performance gain from the revised attention. To the best of our knowledge, this is the first work that adapts generative models in a complete unified framework and studies their effectiveness in the context of TPPs. Our codebase, including all the methods given in Section 5.1.1, is available at https://github.com/BIRD-TAO/GNTPP. We hope the code framework can facilitate future research on neural TPPs.
    Maintaining Performance with Less Data. (arXiv:2208.02007v1 [cs.LG])
    We propose a novel method for training a neural network for image classification to reduce input data dynamically, in order to reduce the costs of training a neural network model. As Deep Learning tasks become more popular, their computational complexity increases, leading to more intricate algorithms and models which have longer runtimes and require more input data. The result is a greater cost on time, hardware, and environmental resources. By using data reduction techniques, we reduce the amount of work performed, and therefore the environmental impact of AI techniques, and with dynamic data reduction we show that accuracy may be maintained while reducing runtime by up to 50%, and reducing carbon emission proportionally.
    Adversarial Camouflage for Node Injection Attack on Graphs. (arXiv:2208.01819v1 [cs.LG])
    Node injection attacks against Graph Neural Networks (GNNs) have received growing attention as a practical attack scenario, where the attacker injects malicious nodes instead of modifying node features or edges to degrade the performance of GNNs. Despite the initial success of node injection attacks, we find that the nodes injected by existing methods are easily distinguished from the original normal nodes by defense methods, which limits their attack performance in practice. To solve these issues, we focus on camouflaged node injection attacks, i.e., camouflaging injected malicious nodes (structure/attributes) as normal ones that appear legitimate/imperceptible to defense methods. The non-Euclidean nature of graph data and the lack of human priors bring great challenges to the formalization, implementation, and evaluation of camouflage on graphs. In this paper, we first propose and formulate the camouflage of injected nodes in terms of both the fidelity and diversity of the ego networks centered around injected nodes. Then, we design an adversarial CAmouflage framework for Node injection Attack, namely CANA, to improve the camouflage while ensuring the attack performance. Several novel indicators for graph camouflage are further designed for a comprehensive evaluation. Experimental results demonstrate that when existing node injection attack methods are equipped with our proposed CANA framework, the attack performance against defense methods as well as node camouflage is significantly improved.
    Localization and Classification of Parasitic Eggs in Microscopic Images Using an EfficientDet Detector. (arXiv:2208.01963v1 [cs.CV])
    IPIs caused by protozoan and helminth parasites are among the most common infections in humans in LMICs. They are regarded as a severe public health concern, as they cause a wide array of potentially detrimental health conditions. Researchers have been developing pattern recognition techniques for the automatic identification of parasite eggs in microscopic images. Existing solutions still need improvements to reduce diagnostic errors and generate fast, efficient, and accurate results. Our paper addresses this and proposes a multi-modal learning detector to localize parasitic eggs and categorize them into 11 categories. The experiments were conducted on the novel Chula-ParasiteEgg-11 dataset that was used to train both EfficientDet model with EfficientNet-v2 backbone and EfficientNet-B7+SVM. The dataset has 11,000 microscopic training images from 11 categories. Our results show robust performance with an accuracy of 92%, and an F1 score of 93%. Additionally, the IOU distribution illustrates the high localization capability of the detector.
    High-Speed Accurate Robot Control using Learned Forward Kinodynamics and Non-linear Least Squares Optimization. (arXiv:2206.08487v2 [cs.RO] UPDATED)
    Accurate control of robots at high speeds requires a control system that can take into account the kinodynamic interactions of the robot with the environment. Prior works on learning inverse kinodynamic (IKD) models of robots have shown success in capturing the complex kinodynamic effects. However, the types of control problems these approaches can be applied to are limited only to that of following pre-computed kinodynamically feasible trajectories. In this paper we present Optim-FKD, a new formulation for accurate, high-speed robot control that makes use of a learned forward kinodynamic (FKD) model and non-linear least squares optimization. Optim-FKD can be used for accurate, high speed control on any control task specifiable by a non-linear least squares objective. Optim-FKD can solve for control objectives such as path following and time-optimal control in real time, without needing access to pre-computed kinodynamically feasible trajectories. We empirically demonstrate these abilities of our approach through experiments on a scale one-tenth autonomous car. Our results show that Optim-FKD can follow desired trajectories more accurately and can find better solutions to optimal control problems than baseline approaches.
    Robust Graph Neural Networks using Weighted Graph Laplacian. (arXiv:2208.01853v1 [cs.LG])
    Graph neural networks (GNNs) are achieving remarkable performance in a variety of application domains. However, GNNs are vulnerable to noise and adversarial attacks in input data. Making GNNs robust against noise and adversarial attacks is an important problem. The existing defense methods for GNNs are computationally demanding and are not scalable. In this paper, we propose a generic framework for robustifying GNNs known as Weighted Laplacian GNN (RWL-GNN). The method combines Weighted Graph Laplacian learning with the GNN implementation. The proposed method benefits from the positive semi-definiteness property of the Laplacian matrix, feature smoothness, and latent features via formulating a unified optimization framework, which ensures the adversarial/noisy edges are discarded and connections in the graph are appropriately weighted. For demonstration, the experiments are conducted with the graph convolutional neural network (GCNN) architecture; however, the proposed framework is easily amenable to any existing GNN architecture. The simulation results on benchmark data establish the efficacy of the proposed method, both in accuracy and computational efficiency. Code can be accessed at https://github.com/Bharat-Runwal/RWL-GNN.
    DeepProphet2 -- A Deep Learning Gene Recommendation Engine. (arXiv:2208.01918v1 [q-bio.QM])
    New powerful tools for tackling life science problems have been created by recent advances in machine learning. The purpose of the paper is to discuss the potential advantages of gene recommendation performed by artificial intelligence (AI). Indeed, gene recommendation engines try to solve this problem: if the user is interested in a set of genes, which other genes are likely to be related to the starting set and should be investigated? This task was solved with a custom deep learning recommendation engine, DeepProphet2 (DP2), which is freely available to researchers worldwide via www.generecommender.com. Hereafter, insights behind the algorithm and its practical applications are illustrated. The gene recommendation problem can be addressed by mapping the genes to a metric space where a distance can be defined to represent the real semantic distance between them. To achieve this objective, a transformer-based model has been trained on a well-curated, freely available paper corpus, PubMed. The paper describes multiple optimization procedures that were employed to obtain the best bias-variance trade-off, focusing on embedding size and network depth. In this context, the model's ability to discover sets of genes implicated in diseases and pathways was assessed through cross-validation. A simple assumption guided the procedure: the network had no direct knowledge of pathways and diseases but learned genes' similarities and the interactions among them. Moreover, to further investigate the space where the neural network represents genes, the dimensionality of the embedding was reduced, and the results were projected onto a human-comprehensible space. In conclusion, a set of use cases illustrates the algorithm's potential applications in a real-world setting.
    WrapperFL: A Model Agnostic Plug-in for Industrial Federated Learning. (arXiv:2206.10407v2 [cs.LG] UPDATED)
    Federated learning, as a privacy-preserving collaborative machine learning paradigm, has been gaining more and more attention in the industry. With the huge rise in demand, there have been many federated learning platforms that allow federated participants to set up and build a federated model from scratch. However, existing platforms are highly intrusive, complicated, and hard to integrate with already-built machine learning models. For many real-world businesses that already have mature serving models, existing federated learning platforms have high entry barriers and development costs. This paper presents a simple yet practical federated learning plug-in inspired by ensemble learning, dubbed WrapperFL, allowing participants to build/join a federated system with existing models at minimal cost. WrapperFL works in a plug-and-play way by simply attaching to the input and output interfaces of an existing model, without the need for re-development, significantly reducing the overhead of manpower and resources. We verify our proposed method on diverse tasks under heterogeneous data distributions and heterogeneous models. The experimental results demonstrate that WrapperFL can be successfully applied to a wide range of applications under practical settings and improves the local model with federated learning at a low cost.
    The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift. (arXiv:2208.01857v1 [cs.LG])
    We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted by online SGD) for this problem. We establish sharp instance-dependent excess risk upper and lower bounds for this approach. Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data. In addition, we show that finetuning, even with only a small amount of target data, could drastically reduce the amount of source data required by pretraining. Our theory sheds light on the effectiveness and limitation of pretraining as well as the benefits of finetuning for tackling covariate shift problems.
    A Lightweight Transmission Parameter Selection Scheme Using Reinforcement Learning for LoRaWAN. (arXiv:2208.01824v1 [cs.LG])
    The number of IoT devices is predicted to reach 125 billion by 2023. The growth of IoT devices will intensify the collisions between devices, degrading communication performance. Selecting appropriate transmission parameters, such as channel and spreading factor (SF), can effectively reduce the collisions between long-range (LoRa) devices. However, most of the schemes proposed in the current literature are not easy to implement on an IoT device with limited computational capability and memory. To solve this issue, we propose a lightweight transmission-parameter selection scheme, i.e., a joint channel and SF selection scheme using reinforcement learning for low-power wide area networking (LoRaWAN). In the proposed scheme, appropriate transmission parameters can be selected with the four basic arithmetic operations, using only Acknowledge (ACK) information. Additionally, we theoretically analyze the computational complexity and memory requirement of our proposed scheme, which verifies that it can select transmission parameters with extremely low computational complexity and memory requirement. Moreover, a large number of experiments were conducted on LoRa devices in the real world to evaluate the effectiveness of our proposed scheme. The experimental results demonstrate the following main phenomena. (1) Compared to other lightweight transmission-parameter selection schemes, collisions between LoRa devices can be efficiently avoided by our proposed scheme in LoRaWAN irrespective of changes in the available channels. (2) The frame success rate (FSR) can be improved by selecting access channels and using SFs as opposed to only selecting access channels. (3) Since interference exists between adjacent channels, FSR and fairness can be improved by increasing the interval of adjacent available channels.
    EgPDE-Net: Building Continuous Neural Networks for Time Series Prediction with Exogenous Variables. (arXiv:2208.01913v1 [cs.LG])
    While exogenous variables have a major impact on performance improvement in time series analysis, inter-series correlation and time dependence among them are rarely considered in present continuous methods. The dynamical systems of multivariate time series can be modelled with complex unknown partial differential equations (PDEs), which play a prominent role in many disciplines of science and engineering. In this paper, we propose a continuous-time model for arbitrary-step prediction to learn an unknown PDE system in multivariate time series whose governing equations are parameterised by self-attention and gated recurrent neural networks. The proposed model, Exogenous-guided Partial Differential Equation Network (EgPDE-Net), takes account of the relationships among the exogenous variables and their effects on the target series. Importantly, the model can be reduced to a regularised ordinary differential equation (ODE) problem with specially designed regularisation guidance, which makes the PDE problem tractable for obtaining numerical solutions and makes it feasible to predict multiple future values of the target series at arbitrary time points. Extensive experiments demonstrate that our proposed model achieves competitive accuracy against strong baselines: on average, it outperforms the best baseline by reducing RMSE by 9.85% and MAE by 13.98% for arbitrary-step prediction.
    Leveraging Smartphone Sensors for Detecting Abnormal Gait for Smart Wearable Mobile Technologies. (arXiv:2208.01876v1 [cs.HC])
    Walking is one of the most common modes of terrestrial locomotion for humans and is essential for most kinds of daily activities. The pattern in which a person walks is known as gait. Gait analysis is used in sports and healthcare. Gait can be analyzed in different ways, such as using video captured by surveillance cameras or depth image cameras in a lab environment. It can also be recognized by wearable sensors, e.g., accelerometers, force sensors, gyroscopes, flexible goniometers, magneto-resistive sensors, electromagnetic tracking systems, and electromyography (EMG). Analysis through these sensors requires lab conditions, or users must wear the sensors. To detect abnormality in a person's gait, these sensors need to be incorporated separately. Detecting abnormal gait can reveal information about a person's health condition. Understanding regular vs. abnormal gait may give insights into the health condition of the subject using smart wearable technologies. Therefore, in this paper, we propose a way to analyze abnormal human gait through smartphone sensors. Since smart devices like smartphones and smartwatches are used by most people nowadays, we can track their gait using the sensors of these intelligent wearable devices.
    Zero-Shot Style Transfer for Gesture Animation driven by Text and Speech using Adversarial Disentanglement of Multimodal Style Encoding. (arXiv:2208.01917v1 [cs.SD])
    Modeling virtual agents with behavior style is one factor for personalizing human-agent interaction. We propose an efficient yet effective machine learning approach to synthesize gestures driven by prosodic features and text in the style of different speakers, including those unseen during training. Our model performs zero-shot multimodal style transfer driven by multimodal data from the PATS database containing videos of various speakers. We view style as being pervasive while speaking: it colors the expressivity of communicative behaviors, while speech content is carried by multimodal signals and text. This disentanglement scheme of content and style allows us to directly infer the style embedding even of speakers whose data are not part of the training phase, without requiring any further training or fine-tuning. The first goal of our model is to generate the gestures of a source speaker based on the content of two modalities, audio and text. The second goal is to condition the source speaker's predicted gestures on the multimodal behavior style embedding of a target speaker. The third goal is to allow zero-shot style transfer of speakers unseen during training without retraining the model. Our system consists of: (1) a speaker style encoder network that learns to generate a fixed-dimensional speaker style embedding from a target speaker's multimodal data and (2) a sequence-to-sequence synthesis network that synthesizes gestures based on the content of the input modalities of a source speaker and conditioned on the speaker style embedding. We show that our model can synthesize gestures of a source speaker and transfer the knowledge of target speaker style variability to the gesture generation task in a zero-shot setup. We convert the 2D gestures to 3D poses and produce 3D animations. We conduct objective and subjective evaluations to validate our approach and compare it with a baseline.
    Asynchronous Federated Learning for Edge-assisted Vehicular Networks. (arXiv:2208.01901v1 [cs.LG])
    Vehicular networks enable vehicles to support real-time vehicular applications through training data. Due to their limited computing capability, vehicles usually transmit data to a road side unit (RSU) at the network edge for processing. However, vehicles are usually reluctant to share data with each other due to privacy concerns. In traditional federated learning (FL), vehicles train on the data locally to obtain a local model and then upload the local model to the RSU to update the global model; thus data privacy can be protected by sharing model parameters instead of data. Traditional FL updates the global model synchronously, i.e., the RSU needs to wait for all vehicles to upload their models before updating the global model. However, vehicles may often drive out of the coverage of the RSU before they obtain their local models through training, which reduces the accuracy of the global model. It is therefore necessary to propose an asynchronous federated learning (AFL) scheme to solve this problem, where the RSU updates the global model once it receives a local model from a vehicle. However, the amount of data, computing capability and vehicle mobility may affect the accuracy of the global model. In this paper, we jointly consider the amount of data, computing capability and vehicle mobility to design an AFL scheme to improve the accuracy of the global model. Extensive simulation experiments have demonstrated that our scheme outperforms the FL scheme.
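    The asynchronous update described above, where the RSU refreshes the global model as soon as a single local model arrives, can be sketched as follows. This is a minimal illustration and not the paper's scheme: the function `async_update` and the fixed mixing weight `beta` are assumptions made here, whereas the abstract says the actual scheme jointly accounts for data amount, computing capability and vehicle mobility.

```python
def async_update(global_model, local_model, beta=0.5):
    """Blend one arriving local model into the global model.

    `beta` is a hypothetical fixed mixing weight chosen for illustration;
    it is not a parameter defined in the abstract.
    """
    return [(1 - beta) * g + beta * l for g, l in zip(global_model, local_model)]


# The RSU starts from an initial global model (toy two-parameter model).
global_model = [0.0, 0.0]

# Local models arrive one at a time; the RSU does not wait for all vehicles.
arrivals = [[1.0, 2.0], [3.0, 0.0]]
history = []
for local_model in arrivals:
    global_model = async_update(global_model, local_model)
    history.append(global_model)
```

    Each arrival produces a new global model immediately, so a vehicle that leaves RSU coverage before finishing training does not block the others.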
    A Tighter Analysis of Spectral Clustering, and Beyond. (arXiv:2208.01724v1 [cs.DS])
    This work studies the classical spectral clustering algorithm which embeds the vertices of some graph $G=(V_G, E_G)$ into $\mathbb{R}^k$ using $k$ eigenvectors of some matrix of $G$, and applies $k$-means to partition $V_G$ into $k$ clusters. Our first result is a tighter analysis on the performance of spectral clustering, and explains why it works under some much weaker condition than the ones studied in the literature. For the second result, we show that, by applying fewer than $k$ eigenvectors to construct the embedding, spectral clustering is able to produce better output for many practical instances; this result is the first of its kind in spectral clustering. Besides its conceptual and theoretical significance, the practical impact of our work is demonstrated by the empirical analysis on both synthetic and real-world datasets, in which spectral clustering produces comparable or better results with fewer than $k$ eigenvectors.
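    The pipeline named above (embed the vertices with eigenvectors, then run $k$-means) can be sketched with NumPy. Several choices here are assumptions made for illustration: the abstract only says "some matrix of $G$", and this sketch uses the normalized Laplacian; the farthest-point initialization, the helper names, and the two-clique toy graph are likewise hypothetical. The `num_vectors` argument is kept separate from `k` so that embeddings with fewer than $k$ eigenvectors can be tried, but the sketch makes no claim about when that helps.

```python
import numpy as np


def spectral_embedding(adj, num_vectors):
    """Embed vertices using the bottom eigenvectors of the normalized Laplacian."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return vecs[:, :num_vectors]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq_dists, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def two_cliques(size):
    """Toy graph: two cliques of `size` vertices joined by a single edge."""
    n = 2 * size
    adj = np.zeros((n, n))
    adj[:size, :size] = 1.0
    adj[size:, size:] = 1.0
    np.fill_diagonal(adj, 0.0)
    adj[size - 1, size] = adj[size, size - 1] = 1.0
    return adj


adj = two_cliques(4)
labels = kmeans(spectral_embedding(adj, num_vectors=2), k=2)
```

    On this toy graph the two cliques end up in different clusters, because the second eigenvector takes opposite signs on the two sides of the bridging edge.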
    Robust Learning of Deep Time Series Anomaly Detection Models with Contaminated Training Data. (arXiv:2208.01841v1 [cs.LG])
    Time series anomaly detection (TSAD) is an important data mining task with numerous applications in the IoT era. In recent years, a large number of deep neural network-based methods have been proposed, demonstrating significantly better performance than conventional methods on addressing challenging TSAD problems in a variety of areas. Nevertheless, these deep TSAD methods typically rely on a clean training dataset that is not polluted by anomalies to learn the "normal profile" of the underlying dynamics. This requirement is nontrivial since a clean dataset can hardly be provided in practice. Moreover, without the awareness of their robustness, blindly applying deep TSAD methods with potentially contaminated training data can possibly incur significant performance degradation in the detection phase. In this work, to tackle this important challenge, we firstly investigate the robustness of commonly used deep TSAD methods with contaminated training data which provides a guideline for applying these methods when the provided training data are not guaranteed to be anomaly-free. Furthermore, we propose a model-agnostic method which can effectively improve the robustness of learning mainstream deep TSAD models with potentially contaminated data. Experiment results show that our method can consistently prevent or mitigate performance degradation of mainstream deep TSAD models on widely used benchmark datasets.
    Link Prediction on Heterophilic Graphs via Disentangled Representation Learning. (arXiv:2208.01820v1 [cs.LG])
    Link prediction is an important task that has wide applications in various domains. However, the majority of existing link prediction approaches assume the given graph follows the homophily assumption and design similarity-based heuristics or representation learning approaches to predict links. Many real-world graphs, however, are heterophilic graphs, where the homophily assumption does not hold, which challenges existing link prediction methods. Generally, in heterophilic graphs, there are many latent factors causing the link formation, and two linked nodes tend to be similar in one or two factors but might be dissimilar in other factors, leading to low overall similarity. Thus, one way is to learn a disentangled representation for each node, with each vector capturing the latent representation of a node on one factor, which paves the way to model the link formation in heterophilic graphs, resulting in better node representation learning and link prediction performance. However, the work on this is rather limited. Therefore, in this paper, we study a novel problem of exploring disentangled representation learning for link prediction on heterophilic graphs. We propose a novel framework, DisenLink, which can learn disentangled representations by modeling the link formation and perform factor-aware message-passing to facilitate link prediction. Extensive experiments on 13 real-world datasets demonstrate the effectiveness of DisenLink for link prediction on both heterophilic and homophilic graphs. Our code is available at https://github.com/sjz5202/DisenLink
    Pyramidal Denoising Diffusion Probabilistic Models. (arXiv:2208.01864v1 [cs.CV])
    Diffusion models have demonstrated impressive image generation performance, and have been used in various computer vision tasks. Unfortunately, image generation using diffusion models is very time-consuming since it requires thousands of sampling steps. To address this problem, here we present a novel pyramidal diffusion model to generate high resolution images starting from much coarser resolution images using a single score function trained with a positional embedding. This enables a time-efficient sampling for image generation, and also solves the low batch size problem when training with limited resources. Furthermore, we show that the proposed approach can be efficiently used for multi-scale super-resolution problem using a single score function.
    Graph Regularized Nonnegative Latent Factor Analysis Model for Temporal Link Prediction in Cryptocurrency Transaction Networks. (arXiv:2208.01923v1 [cs.LG])
    With the development of blockchain technology, cryptocurrencies based on it are becoming more and more popular. This has given birth to a huge cryptocurrency transaction network, which has received widespread attention. Link prediction, which learns the structure of a network, helps in understanding the network's mechanisms, so it is also widely studied in cryptocurrency networks. However, the dynamics of cryptocurrency transaction networks have been neglected in past research. We use a graph regularized method to link past transaction records with future transactions. Based on this, we propose a single latent factor-dependent, non-negative, multiplicative and graph regularized-incorporated update (SLF-NMGRU) algorithm and further propose a graph regularized nonnegative latent factor analysis (GrNLFA) model. Finally, experiments on a real cryptocurrency transaction network show that the proposed method improves both accuracy and computational efficiency.
    Understanding Adversarial Imitation Learning in Small Sample Regime: A Stage-coupled Analysis. (arXiv:2208.01899v1 [cs.LG])
    Imitation learning learns a policy from expert trajectories. While the expert data is believed to be crucial for imitation quality, it was found that a kind of imitation learning approach, adversarial imitation learning (AIL), can have exceptional performance. With as little as only one expert trajectory, AIL can match the expert performance even in a long horizon, on tasks such as locomotion control. There are two mysterious points in this phenomenon. First, why can AIL perform well with only a few expert trajectories? Second, why does AIL maintain good performance despite the length of the planning horizon? In this paper, we theoretically explore these two questions. For a total-variation-distance-based AIL (called TV-AIL), our analysis shows a horizon-free imitation gap $\mathcal{O}(\min\{1, \sqrt{|\mathcal{S}|/N}\})$ on a class of instances abstracted from locomotion control tasks. Here $|\mathcal{S}|$ is the state space size for a tabular Markov decision process, and $N$ is the number of expert trajectories. We emphasize two important features of our bound. First, this bound is meaningful in both small and large sample regimes. Second, this bound suggests that the imitation gap of TV-AIL is at most 1 regardless of the planning horizon. Therefore, this bound can explain the empirical observation. Technically, we leverage the structure of multi-stage policy optimization in TV-AIL and present a new stage-coupled analysis via dynamic programming.
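    To make the two regimes of the bound concrete, the rate inside the big-O, $\min\{1, \sqrt{|\mathcal{S}|/N}\}$, can be evaluated directly. The function name and the sample values below are hypothetical, and the constant hidden by the big-O notation is omitted, so the numbers indicate scaling only.

```python
import math


def tv_ail_gap_rate(num_states, num_expert_trajectories):
    """Evaluate min{1, sqrt(|S| / N)}, the rate inside the O(.) bound.

    The hidden constant of the bound is omitted, and the planning horizon
    does not appear as an argument because the bound is horizon-free.
    """
    return min(1.0, math.sqrt(num_states / num_expert_trajectories))


# Small-sample regime: a single expert trajectory, the rate is capped at 1.
small_sample = tv_ail_gap_rate(num_states=100, num_expert_trajectories=1)

# Large-sample regime: the rate decays as sqrt(|S| / N).
large_sample = tv_ail_gap_rate(num_states=100, num_expert_trajectories=10_000)
```

    With 100 states, one trajectory gives the capped value 1, while 10,000 trajectories give a rate of about 0.1.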
    A Deep Learning Approach to Detect Lean Blowout in Combustion Systems. (arXiv:2208.01871v1 [cs.LG])
    Lean combustion is environmentally friendly, with low NOx emissions, and also provides better fuel efficiency in a combustion system. However, approaching lean combustion can make engines more susceptible to lean blowout. Lean blowout (LBO) is an undesirable phenomenon that can cause sudden flame extinction, leading to sudden loss of power. During the design stage, it is quite challenging for scientists to accurately determine the optimal operating limits to avoid sudden LBO occurrence. Therefore, it is crucial to develop accurate and computationally tractable frameworks for online LBO detection in low NOx emission engines. To the best of our knowledge, for the first time, we propose a deep learning approach to detect lean blowout in combustion systems. In this work, we utilize a laboratory-scale combustor to collect data for different protocols. We start far from LBO for each protocol and gradually move towards the LBO regime, capturing a quasi-static time series dataset at each condition. Using one of the protocols in our dataset as the reference protocol and with conditions annotated by domain experts, we find a transition state metric for our trained deep learning model to detect LBO in the other test protocols. We find that our proposed approach is more accurate and computationally faster than other baseline models in detecting the transitions to LBO. Therefore, we recommend this method for real-time performance monitoring in lean combustion engines.
    Digital Twin-Assisted Efficient Reinforcement Learning for Edge Task Scheduling. (arXiv:2208.01781v1 [cs.LG])
    Task scheduling is a critical problem when one user offloads multiple different tasks to the edge server. When a user has multiple tasks to offload and only one task can be transmitted to the server at a time, while the server processes tasks according to the transmission order, the problem is NP-hard. However, it is difficult for traditional optimization methods to quickly obtain the optimal solution, while approaches based on reinforcement learning face the challenge of an excessively large action space and slow convergence. In this paper, we propose a Digital Twin (DT)-assisted RL-based task scheduling method in order to improve the performance and convergence of the RL. We use DT to simulate the results of different decisions made by the agent, so that one agent can try multiple actions at a time, or, similarly, multiple agents can interact with the environment in parallel in DT. In this way, the exploration efficiency of RL can be significantly improved via DT, and thus RL can converge faster and is less likely to get stuck in local optima. In particular, two algorithms are designed to make task scheduling decisions, i.e., DT-assisted asynchronous Q-learning (DTAQL) and DT-assisted exploring Q-learning (DTEQL). Simulation results show that both algorithms significantly improve the convergence speed of Q-learning by increasing the exploration efficiency.
    A data-centric weak supervised learning for highway traffic incident detection. (arXiv:2112.09792v2 [cs.LG] UPDATED)
    Using the data from loop detector sensors for near-real-time detection of traffic incidents in highways is crucial to averting major traffic congestion. While recent supervised machine learning methods offer solutions to incident detection by leveraging human-labeled incident data, the false alarm rate is often too high to be used in practice. Specifically, the inconsistency in the human labeling of the incidents significantly affects the performance of supervised learning models. To that end, we focus on a data-centric approach to improve the accuracy and reduce the false alarm rate of traffic incident detection on highways. We develop a weak supervised learning workflow to generate high-quality training labels for the incident data without the ground truth labels, and we use those generated labels in the supervised learning setup for final detection. This approach comprises three stages. First, we introduce a data preprocessing and curation pipeline that processes traffic sensor data to generate high-quality training data through leveraging labeling functions, which can be domain knowledge-related or simple heuristic rules. Second, we evaluate the training data generated by weak supervision using three supervised learning models -- random forest, k-nearest neighbors, and a support vector machine ensemble -- and long short-term memory classifiers. The results show that the accuracy of all of the models improves significantly after using the training data generated by weak supervision. Third, we develop an online real-time incident detection approach that leverages the model ensemble and the uncertainty quantification while detecting incidents. Overall, we show that our proposed weak supervised learning workflow achieves a high incident detection rate (0.90) and low false alarm rate (0.08).
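    The idea of labeling functions in the first stage can be sketched as follows. Everything here is hypothetical: the thresholds, the field names `speed_mph` and `occupancy`, and the majority-vote aggregation are assumptions made for illustration, not the rules or the aggregation used in the paper.

```python
ABSTAIN, NORMAL, INCIDENT = -1, 0, 1


def lf_speed_drop(reading):
    # Hypothetical heuristic: very low speed suggests an incident.
    return INCIDENT if reading["speed_mph"] < 20 else ABSTAIN


def lf_high_occupancy(reading):
    # Hypothetical heuristic: a heavily occupied detector suggests an incident.
    return INCIDENT if reading["occupancy"] > 0.5 else ABSTAIN


def lf_free_flow(reading):
    # Hypothetical heuristic: free-flow speed suggests normal traffic.
    return NORMAL if reading["speed_mph"] > 55 else ABSTAIN


def weak_label(reading, labeling_functions):
    """Combine labeling-function votes by majority; abstain on ties or no votes."""
    votes = [lf(reading) for lf in labeling_functions]
    votes = [v for v in votes if v != ABSTAIN]
    incident_votes = sum(1 for v in votes if v == INCIDENT)
    normal_votes = len(votes) - incident_votes
    if incident_votes == normal_votes:
        return ABSTAIN
    return INCIDENT if incident_votes > normal_votes else NORMAL


LFS = [lf_speed_drop, lf_high_occupancy, lf_free_flow]
label = weak_label({"speed_mph": 10, "occupancy": 0.7}, LFS)
```

    Readings on which the rules abstain or disagree evenly stay unlabeled, so only the more confident weak labels would be passed on to the supervised models.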
    A Roadmap for Greater Public Use of Privacy-Sensitive Government Data: Workshop Report. (arXiv:2208.01636v1 [cs.CR])
    Government agencies collect and manage a wide range of ever-growing datasets. While such data has the potential to support research and evidence-based policy making, there are concerns that the dissemination of such data could infringe upon the privacy of the individuals (or organizations) from whom such data was collected. To appraise the current state of data sharing, as well as learn about opportunities for stimulating such sharing at a faster pace, a virtual workshop was held on May 21st and 26th, 2021, sponsored by the National Science Foundation and National Institute of Standards and Technology, where a multinational collection of researchers and practitioners were brought together to discuss their experiences and learn about recently developed technologies for managing privacy while sharing data. The workshop specifically focused on challenges and successes in government data sharing at various levels. The first day focused on successful examples of new technology applied to sharing of public data, including formal privacy techniques, synthetic data, and cryptographic approaches. Day two emphasized brainstorming sessions on some of the challenges and directions to address them.
    Two-Stream Transformer Architecture for Long Video Understanding. (arXiv:2208.01753v1 [cs.CV])
    Pure vision transformer architectures are highly effective for short video classification and action recognition tasks. However, due to the quadratic complexity of self attention and lack of inductive bias, transformers are resource intensive and suffer from data inefficiencies. Long form video understanding tasks amplify data and memory efficiency problems in transformers making current approaches unfeasible to implement on data or memory restricted domains. This paper introduces an efficient Spatio-Temporal Attention Network (STAN) which uses a two-stream transformer architecture to model dependencies between static image features and temporal contextual features. Our proposed approach can classify videos up to two minutes in length on a single GPU, is data efficient, and achieves SOTA performance on several long video understanding tasks.
    A Transformational Characterization of Unconditionally Equivalent Bayesian Networks. (arXiv:2203.00521v2 [stat.ML] UPDATED)
    We consider the problem of characterizing Bayesian networks up to unconditional equivalence, i.e., when directed acyclic graphs (DAGs) have the same set of unconditional $d$-separation statements. Each unconditional equivalence class (UEC) is uniquely represented with an undirected graph whose clique structure encodes the members of the class. Via this structure, we provide a transformational characterization of unconditional equivalence; i.e., we show that two DAGs are in the same UEC if and only if one can be transformed into the other via a finite sequence of specified moves. We also extend this characterization to the essential graphs representing the Markov equivalence classes (MECs) in the UEC. UECs partition the space of MECs and are easily estimable from marginal independence tests. Thus, a characterization of unconditional equivalence has applications in methods that involve searching the space of MECs of Bayesian networks.
    Post-hoc Interpretability based Parameter Selection for Data Oriented Nuclear Reactor Accident Diagnosis System. (arXiv:2208.01805v1 [eess.SY])
    When applying data-oriented diagnosis systems to distinguish the type and evaluate the severity of nuclear power plant initial events, it is of vital importance to decide which parameters to use as the system input. However, although several diagnosis systems have already achieved acceptable performance in diagnosis precision and speed, researchers have hardly discussed how to choose monitoring points and their layout. For this reason, redundant measuring data are used to train the diagnostic model, leading to high classification uncertainty, extra training time, and a higher probability of overfitting. In this study, a method of choosing thermal hydraulics parameters of a nuclear power plant is proposed, using post-hoc interpretability theory in deep learning. First, a novel Time-sequential Residual Convolutional Neural Network (TRES-CNN) diagnosis model is introduced to identify the position and hydrodynamic diameter of breaks in LOCA, using 38 parameters chosen manually and empirically on HPR1000. Afterwards, post-hoc interpretability methods are applied to evaluate the attributions of the diagnosis model's outputs, identifying the 15 parameters that are most decisive in diagnosing LOCA details. The results show that the TRES-CNN based diagnostic model successfully predicts the position and size of breaks in LOCA via the selected 15 parameters of HPR1000, with 25% of the training time compared with the process using all 38 parameters. In addition, the relative diagnostic accuracy error is within 1.5 percent compared with the model using parameters chosen empirically, which can be regarded as the same level of diagnostic reliability.
    RemixIT: Continual self-training of speech enhancement models via bootstrapped remixing. (arXiv:2202.08862v3 [cs.SD] UPDATED)
    We present RemixIT, a simple yet effective self-supervised method for training speech enhancement without the need of a single isolated in-domain speech nor a noise waveform. Our approach overcomes limitations of previous methods which make them dependent on clean in-domain target signals and thus, sensitive to any domain mismatch between train and test samples. RemixIT is based on a continuous self-training scheme in which a pre-trained teacher model on out-of-domain data infers estimated pseudo-target signals for in-domain mixtures. Then, by permuting the estimated clean and noise signals and remixing them together, we generate a new set of bootstrapped mixtures and corresponding pseudo-targets which are used to train the student network. Vice-versa, the teacher periodically refines its estimates using the updated parameters of the latest student models. Experimental results on multiple speech enhancement datasets and tasks not only show the superiority of our method over prior approaches but also showcase that RemixIT can be combined with any separation model as well as be applied towards any semi-supervised and unsupervised domain adaptation task. Our analysis, paired with empirical evidence, sheds light on the inside functioning of our self-training scheme wherein the student model keeps obtaining better performance while observing severely degraded pseudo-targets.
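    The remixing step described above can be sketched on plain lists of samples. This is a simplified illustration under stated assumptions: the function name is hypothetical, only the noise estimates are permuted across the batch, and the teacher and student networks are left out entirely.

```python
import random


def bootstrapped_remix(speech_estimates, noise_estimates, rng):
    """Shuffle the teacher's noise estimates across the batch and remix.

    Returns the new mixtures together with the pseudo-targets (the speech
    estimates and the permuted noise estimates) a student would train on.
    """
    order = list(range(len(noise_estimates)))
    rng.shuffle(order)
    permuted_noise = [noise_estimates[i] for i in order]
    mixtures = [
        [s + n for s, n in zip(speech, noise)]
        for speech, noise in zip(speech_estimates, permuted_noise)
    ]
    return mixtures, speech_estimates, permuted_noise


# Toy batch of three two-sample "waveforms" standing in for teacher outputs.
speech = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
noise = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
mixtures, pseudo_speech, pseudo_noise = bootstrapped_remix(
    speech, noise, random.Random(0)
)
```

    Because the mixtures are built from the teacher's own estimates, no clean in-domain speech or isolated noise recording is needed to create training pairs.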
    Deep Reinforcement Learning for Multi-Agent Interaction. (arXiv:2208.01769v1 [cs.MA])
    The development of autonomous agents which can interact with other agents to accomplish a given task is a core area of research in artificial intelligence and machine learning. Towards this goal, the Autonomous Agents Research Group develops novel machine learning algorithms for autonomous systems control, with a specific focus on deep reinforcement learning and multi-agent reinforcement learning. Research problems include scalable learning of coordinated agent policies and inter-agent communication; reasoning about the behaviours, goals, and composition of other agents from limited observations; and sample-efficient learning based on intrinsic motivation, curriculum learning, causal inference, and representation learning. This article provides a broad overview of the ongoing research portfolio of the group and discusses open problems for future directions.
    Matrix Decomposition and Applications. (arXiv:2201.00145v2 [math.NA] UPDATED)
    In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments on matrix decomposition that favored a (block) LU decomposition, that is, the factorization of a matrix into the product of lower and upper triangular matrices. And now, matrix decomposition has become a core technology in machine learning, largely due to the development of the back propagation algorithm in fitting a neural network. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we clearly realize our inability to cover all the useful and interesting results concerning matrix decomposition and given the paucity of scope to present this discussion, e.g., the separated analysis of the Euclidean space, Hermitian space, Hilbert space, and things in the complex domain. We refer the reader to literature in the field of linear algebra for a more detailed introduction to the related fields.
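    As a small illustration of the LU decomposition mentioned above, the following sketch factors a square matrix into a unit lower triangular matrix and an upper triangular matrix. It is a minimal Doolittle-style elimination without pivoting, so it assumes every pivot is nonzero; it is not taken from the survey and is not a substitute for a pivoted library routine.

```python
def lu_decompose(matrix):
    """Return (lower, upper) with matrix = lower @ upper, without pivoting."""
    n = len(matrix)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(x) for x in row] for row in matrix]
    for k in range(n):
        if upper[k][k] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            # Multiplier that eliminates the entry below the pivot.
            factor = upper[i][k] / upper[k][k]
            lower[i][k] = factor
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def matmul(a, b):
    """Plain triple-loop matrix product, used only to check the factorization."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[4.0, 3.0], [6.0, 3.0]]
L, U = lu_decompose(A)
```

    Multiplying the two factors back together with `matmul(L, U)` recovers `A`, which is a quick way to sanity-check the factorization.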
    A cloud platform for automating and sharing analysis of raw simulation data from high throughput polymer molecular dynamics simulations. (arXiv:2208.01692v1 [cond-mat.mtrl-sci])
    Open material databases storing hundreds of thousands of material structures and their corresponding properties have become the cornerstone of modern computational materials science. Yet, the raw outputs of the simulations, such as the trajectories from molecular dynamics simulations and charge densities from density functional theory calculations, are generally not shared due to their huge size. In this work, we describe a cloud-based platform to facilitate the sharing of raw data and enable the fast post-processing in the cloud to extract new properties defined by the user. As an initial demonstration, our database currently includes 6286 molecular dynamics trajectories for amorphous polymer electrolytes and 5.7 terabytes of data. We create a public analysis library at https://github.com/TRI-AMDD/htp_md to extract multiple properties from the raw data, using both expert designed functions and machine learning models. The analysis is run automatically with computation in the cloud, and results then populate a database that can be accessed publicly. Our platform encourages users to contribute both new trajectory data and analysis functions via public interfaces. Newly analyzed properties will be incorporated into the database. Finally, we create a front-end user interface at https://www.htpmd.matr.io for browsing and visualization of our data. We envision the platform to be a new way of sharing raw data and new insights for the computational materials science community.
    Quantum-Inspired Tensor Neural Networks for Partial Differential Equations. (arXiv:2208.02235v1 [cs.LG])
    Partial Differential Equations (PDEs) are used to model a variety of dynamical systems in science and engineering. Recent advances in deep learning have enabled us to solve them in a higher dimension by addressing the curse of dimensionality in new ways. However, deep learning methods are constrained by training time and memory. To tackle these shortcomings, we implement Tensor Neural Networks (TNN), a quantum-inspired neural network architecture that leverages Tensor Network ideas to improve upon deep learning approaches. We demonstrate that TNN provide significant parameter savings while attaining the same accuracy as compared to the classical Dense Neural Network (DNN). In addition, we also show how TNN can be trained faster than DNN for the same accuracy. We benchmark TNN by applying them to solve parabolic PDEs, specifically the Black-Scholes-Barenblatt equation, widely used in financial pricing theory, empirically showing the advantages of TNN over DNN. Further examples, such as the Hamilton-Jacobi-Bellman equation, are also discussed.
    Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature. (arXiv:2102.04168v5 [cs.LG] UPDATED)
    This paper studies model-based bandit and reinforcement learning (RL) with nonlinear function approximations. We propose to study convergence to approximate local maxima because we show that global convergence is statistically intractable even for one-layer neural net bandit with a deterministic reward. For both nonlinear bandit and RL, the paper presents a model-based algorithm, Virtual Ascent with Online Model Learner (ViOlin), which provably converges to a local maximum with sample complexity that only depends on the sequential Rademacher complexity of the model class. Our results imply novel global or local regret bounds on several concrete settings such as linear bandit with finite or sparse model class, and two-layer neural net bandit. A key algorithmic insight is that optimism may lead to over-exploration even for two-layer neural net model class. On the other hand, for convergence to local maxima, it suffices to maximize the virtual return if the model can also reasonably predict the size of the gradient and Hessian of the real return.
    Internet of Things (IoT) based ECG System for Rural Health Care. (arXiv:2208.02226v1 [eess.SP])
    Nearly 30% of the people in the rural areas of Bangladesh are below the poverty level. Moreover, due to the unavailability of modernized healthcare-related technology, nursing and diagnosis facilities are limited for rural people. Therefore, rural people are deprived of proper healthcare. From this perspective, modern technology can be used to mitigate their health problems. ECG sensing tools are interfaced with the human chest, and requisite cardiovascular data is collected through an IoT device. These data are stored in the cloud, which incorporates MQTT and HTTP servers. An innovative IoT-based method for ECG monitoring of cardiovascular or heart patients is suggested in this study. The ECG signal parameters P, Q, R, S, T are collected, pre-processed, and predicted to monitor the cardiovascular conditions for further health management. A machine learning algorithm is used to determine the significance of ECG signal parameters and the error rate. The logistic regression model gave the better agreement between the train and test data. The prediction has been performed to determine the variation of PQRST quality and its suitability in the ECG Monitoring System. Considering the values of quality parameters, satisfactory results are obtained. The proposed IoT-based ECG system can reduce the health care cost and complexity of cardiovascular diseases in the future.
    ProcK: Machine Learning for Knowledge-Intensive Processes. (arXiv:2109.04881v2 [cs.LG] UPDATED)
    We present a novel methodology to build powerful predictive process models. Our method, denoted ProcK (Process & Knowledge), relies not only on sequential input data in the form of event logs, but can learn to use a knowledge graph to incorporate information about the attribute values of the events and their mutual relationships. The idea is realized by mapping event attributes to nodes of a knowledge graph and training a sequence model alongside a graph neural network in an end-to-end fashion. This hybrid approach substantially enhances the flexibility and applicability of predictive process monitoring, as both the static and dynamic information residing in the databases of organizations can be directly taken as input data. We demonstrate the potential of ProcK by applying it to a number of predictive process monitoring tasks, including tasks with knowledge graphs available as well as an existing process monitoring benchmark where no such graph is given. The experiments provide evidence that our methodology achieves state-of-the-art performance and improves predictive power when a knowledge graph is available.
    Multimodal sensor fusion in the latent representation space. (arXiv:2208.02183v1 [cs.AI])
    A new method for multimodal sensor fusion is introduced. The technique relies on a two-stage process. In the first stage, a multimodal generative model is constructed from unlabelled training data. In the second stage, the generative model serves as a reconstruction prior and the search manifold for the sensor fusion tasks. The method also handles cases where observations are accessed only via subsampling, i.e., compressed sensing. We demonstrate the effectiveness and excellent performance on a range of multimodal fusion experiments such as multisensory classification, denoising, and recovery from subsampled observations.
    AdaCat: Adaptive Categorical Discretization for Autoregressive Models. (arXiv:2208.02246v1 [cs.LG])
    Autoregressive generative models can estimate complex continuous data distributions, like trajectory rollouts in an RL environment, image intensities, and audio. Most state-of-the-art models discretize continuous data into several bins and use categorical distributions over the bins to approximate the continuous data distribution. The advantage is that categorical distributions can easily express multiple modes and are straightforward to optimize. However, such an approximation cannot express sharp changes in density without using significantly more bins, making it parameter inefficient. We propose an efficient, expressive, multimodal parameterization called Adaptive Categorical Discretization (AdaCat). AdaCat discretizes each dimension of an autoregressive model adaptively, which allows the model to allocate density to fine intervals of interest, improving parameter efficiency. AdaCat generalizes both categoricals and quantile-based regression. AdaCat is a simple add-on to any discretization-based distribution estimator. In experiments, AdaCat improves density estimation for real-world tabular data, images, audio, and trajectories, and improves planning in model-based offline RL.
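    One way to picture adaptive discretization is a piecewise-uniform density on [0, 1) whose bins have learnable widths as well as learnable masses. The sketch below assumes that reading; the function name and the exact parameterization are illustrative assumptions and are not taken from the AdaCat paper.

```python
def piecewise_uniform_density(x, widths, masses):
    """Density at x for bins laid end to end on [0, 1).

    widths: bin widths, assumed positive and summing to 1.
    masses: probability mass per bin, assumed non-negative and summing to 1.
    """
    left = 0.0
    for width, mass in zip(widths, masses):
        if left <= x < left + width:
            # Mass spread uniformly over the bin gives density mass / width.
            return mass / width
        left += width
    return 0.0


# Three adaptive bins: a narrow middle bin carries most of the mass,
# producing a sharp peak without adding more bins.
adaptive = piecewise_uniform_density(0.55, [0.5, 0.1, 0.4], [0.1, 0.8, 0.1])

# Equal widths give the density of an ordinary fixed-bin categorical.
uniform = piecewise_uniform_density(0.55, [0.25] * 4, [0.1, 0.2, 0.6, 0.1])
```

    With equal widths the same three-parameter budget could not place a density of about 8 on so small an interval, which is the parameter-efficiency argument made above.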
    A Screening Strategy for Structured Optimization Involving Nonconvex $\ell_{q,p}$ Regularization. (arXiv:2208.02161v1 [cs.LG])
    In this paper, we develop a simple yet effective screening rule strategy to improve the computational efficiency in solving structured optimization involving nonconvex $\ell_{q,p}$ regularization. Based on an iteratively reweighted $\ell_1$ (IRL1) framework, the proposed screening rule works like a preprocessing module that potentially removes the inactive groups before starting the subproblem solver, thereby reducing the computational time in total. This is mainly achieved by heuristically exploiting the dual subproblem information during each iteration. Moreover, we prove that our screening rule can remove all inactive variables in a finite number of iterations of the IRL1 method. Numerical experiments illustrate the efficiency of our screening rule strategy compared with several state-of-the-art algorithms.
    Masked Vision and Language Modeling for Multi-modal Representation Learning. (arXiv:2208.02131v1 [cs.CV])
    In this paper, we study how to use masked signal modeling in vision and language (V+L) representation learning. Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with help from the other modality. This is motivated by the nature of image-text paired data, in which both the image and the text convey almost the same information but in different formats. The masked signal reconstruction of one modality conditioned on another modality can also implicitly learn cross-modal alignment between language tokens and image patches. Our experiments on various V+L tasks show that the proposed method not only achieves state-of-the-art performance by using a large amount of data, but also outperforms the other competitors by a significant margin in the regimes of limited training data.
    KPI-BERT: A Joint Named Entity Recognition and Relation Extraction Model for Financial Reports. (arXiv:2208.02140v1 [cs.CL])
    We present KPI-BERT, a system which employs novel methods of named entity recognition (NER) and relation extraction (RE) to extract and link key performance indicators (KPIs), e.g. "revenue" or "interest expenses", of companies from real-world German financial documents. Specifically, we introduce an end-to-end trainable architecture that is based on Bidirectional Encoder Representations from Transformers (BERT) combining a recurrent neural network (RNN) with conditional label masking to sequentially tag entities before it classifies their relations. Our model also introduces a learnable RNN-based pooling mechanism and incorporates domain expert knowledge by explicitly filtering impossible relations. We achieve a substantially higher prediction performance on a new practical dataset of German financial reports, outperforming several strong baselines including a competing state-of-the-art span-based entity tagging approach.
    Efficient Fine-Tuning of Compressed Language Models with Learners. (arXiv:2208.02070v1 [cs.CL])
    Fine-tuning BERT-based models is resource-intensive in memory, computation, and time. While many prior works aim to improve inference efficiency via compression techniques, e.g., pruning, these works do not explicitly address the computational challenges of training to downstream tasks. We introduce Learner modules and priming, novel methods for fine-tuning that exploit the overparameterization of pre-trained language models to gain benefits in convergence speed and resource utilization. Learner modules navigate the double bind of 1) training efficiently by fine-tuning a subset of parameters, and 2) training effectively by ensuring quick convergence and high metric scores. Our results on DistilBERT demonstrate that learners perform on par with or surpass the baselines. Learners train 7x fewer parameters than state-of-the-art methods on GLUE. On CoLA, learners fine-tune 20% faster, and have significantly lower resource utilization.
    Adaptive Domain Generalization via Online Disagreement Minimization. (arXiv:2208.01996v1 [cs.CV])
    Deep neural networks suffer from significant performance deterioration when there is a distribution shift between deployment and training. Domain Generalization (DG) aims to safely transfer a model to unseen target domains by only relying on a set of source domains. Although various DG approaches have been proposed, a recent study named DomainBed reveals that most of them do not beat the simple Empirical Risk Minimization (ERM). To this end, we propose a general framework that is orthogonal to existing DG algorithms and could improve their performance consistently. Unlike previous DG works that rely on a static source model in the hope that it is universal, our proposed AdaODM adaptively modifies the source model at test time for different target domains. Specifically, we create multiple domain-specific classifiers upon a shared domain-generic feature extractor. The feature extractor and classifiers are trained in an adversarial way, where the feature extractor embeds the input samples into a domain-invariant space, and the multiple classifiers capture the distinct decision boundaries, each of which relates to a specific source domain. During testing, distribution differences between target and source domains could be effectively measured by leveraging prediction disagreement among source classifiers. By fine-tuning source models to minimize the disagreement at test time, target domain features are well aligned to the invariant feature space. We verify AdaODM on two popular DG methods, namely ERM and CORAL, and four DG benchmarks, namely VLCS, PACS, OfficeHome, and TerraIncognita. The results show AdaODM stably improves the generalization capacity on unseen domains and achieves state-of-the-art performance.
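    Prediction disagreement among the source classifiers can be quantified in several ways. The sketch below uses the mean pairwise L1 distance between the classifiers' class-probability vectors for one test sample; this particular measure and the function name are assumptions for illustration, since the abstract does not specify the disagreement loss.

```python
from itertools import combinations


def prediction_disagreement(probabilities):
    """Mean pairwise L1 distance between per-classifier probability vectors.

    probabilities: one class-probability vector per source classifier,
    all computed for the same test sample.
    """
    pairs = list(combinations(probabilities, 2))
    if not pairs:
        return 0.0
    total = sum(sum(abs(p - q) for p, q in zip(a, b)) for a, b in pairs)
    return total / len(pairs)


# Classifiers that agree give zero disagreement, as expected in-distribution.
agree = prediction_disagreement([[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]])

# Fully conflicting predictions give the maximum value of 2 for this measure.
conflict = prediction_disagreement([[1.0, 0.0], [0.0, 1.0]])
```

    A quantity of this kind needs no target labels, which is what makes it usable as a fine-tuning signal at test time.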
    A Convolutional Persistence Transform. (arXiv:2208.02107v1 [math.AT])
    We consider a new topological featurization of $d$-dimensional images, obtained by convolving images with various filters before computing persistence. Viewing a convolution filter as a motif within an image, the persistence diagram of the resulting convolution describes the way the motif is distributed throughout that image. This pipeline, which we call convolutional persistence, extends the capacity of topology to observe patterns in image data. Indeed, we prove that (generically speaking) for any two images one can find some filter for which they produce different persistence diagrams, so that the collection of all possible convolutional persistence diagrams for a given image is an injective invariant. This is proven by showing convolutional persistence to be a special case of another topological invariant, the Persistent Homology Transform. Other advantages of convolutional persistence are improved stability and robustness to noise, greater flexibility for data-dependent vectorizations, and reduced computational complexity for convolutions with large stride vectors. Additionally, we have a suite of experiments showing that convolutions greatly improve the predictive power of persistence on a host of classification tasks, even if one uses random filters and vectorizes the resulting diagrams by recording only their total persistences.
    Multi-Feature Vision Transformer via Self-Supervised Representation Learning for Improvement of COVID-19 Diagnosis. (arXiv:2208.01843v1 [eess.IV])
    The role of chest X-ray (CXR) imaging, due to being more cost-effective, widely available, and having a faster acquisition time compared to CT, has evolved during the COVID-19 pandemic. To improve the diagnostic performance of CXR imaging, a growing number of studies have investigated whether supervised deep learning methods can provide additional support. However, supervised methods rely on a large number of labeled radiology images, which is a time-consuming and complex procedure requiring expert clinician input. Due to the relative scarcity of COVID-19 patient data and the costly labeling process, self-supervised learning methods have gained momentum and have been proposed, achieving results comparable to fully supervised learning approaches. In this work, we study the effectiveness of self-supervised learning in the context of diagnosing COVID-19 disease from CXR images. We propose a multi-feature Vision Transformer (ViT) guided architecture where we deploy a cross-attention mechanism to learn information from both original CXR images and corresponding enhanced local phase CXR images. We demonstrate that the performance of the baseline self-supervised learning models can be further improved by leveraging the local phase-based enhanced CXR images. By using 10\% labeled CXR scans, the proposed model achieves 91.10\% and 96.21\% overall accuracy tested on a total of 35,483 CXR images of healthy (8,851), regular pneumonia (6,045), and COVID-19 (18,159) scans and shows significant improvement over state-of-the-art techniques. Code is available at https://github.com/endiqq/Multi-Feature-ViT
    V-Coder: Adaptive AutoEncoder for Semantic Disclosure in Knowledge Graphs. (arXiv:2208.01735v1 [cs.AI])
    Semantic Web or Knowledge Graphs (KG) have emerged as one of the most important information sources for intelligent systems requiring access to structured knowledge. One of the major challenges is the extraction and processing of unambiguous information from textual data. To human perception, overlapping semantic linkages between two named entities become clear thanks to our common sense about the context a relationship lives in, which is not the case for an automated process run by a machine. In this work, we are interested in the problem of Relational Resolution within the scope of KGs, i.e., we investigate the inherent semantics of relationships between entities within a network. We propose a new adaptive AutoEncoder, called V-Coder, to identify relations inherently connecting entities from different domains. Those relations can be considered ambiguous and are candidates for disentanglement. Similar to the Adaptive Learning Theory (ART), our model learns new patterns from the KG by increasing units in a competitive layer without discarding the previously observed patterns, whilst learning the quality of each relation separately. The evaluation on real-world datasets of Freebase, Yago and NELL shows not only that the V-Coder is able to recover links from corrupted input data, but also that the semantic disclosure of relations in a KG tends to improve link prediction. A semantic evaluation rounds off the evaluation.
    Reconstructing Sparse Illicit Supply Networks: A Case Study of Multiplex Drug Trafficking Networks. (arXiv:2208.01739v1 [cs.SI])
    The network structure provides critical information for law enforcement agencies to develop effective strategies to interdict illicit supply networks. However, the complete structure of covert networks is often unavailable, so it is crucial to develop approaches that infer a more complete structure of covert networks. In this paper, we work on real-world multiplex drug trafficking networks extracted from an investigation report. A statistical approach built on the EM algorithm (DegEM) as well as other methods based on structural similarity are applied to reconstruct the multiplex drug trafficking network given different fractions of observed nodes and links. The DegEM approach achieves the best predictive performance in terms of several accuracy metrics. Meanwhile, structural similarity-based methods perform poorly in reconstructing the drug trafficking networks due to the sparsity of links between nodes in the network. The inferred multiplex networks can be leveraged to (i) inform decision-making on monitoring covert networks and on allocating limited resources for collecting additional information to improve the reconstruction accuracy and (ii) develop more effective interdiction strategies.
    A New Implementation of Federated Learning for Privacy and Security Enhancement. (arXiv:2208.01826v1 [cs.CR])
    Motivated by ever-increasing concerns about personal data privacy and the rapidly growing data volume at local clients, federated learning (FL) has emerged as a new machine learning setting. An FL system comprises a central parameter server and multiple local clients. It keeps data at local clients and learns a centralized model by sharing the model parameters learned locally. No local data needs to be shared, and privacy can be well protected. Nevertheless, since it is the model instead of the raw data that is shared, the system can be exposed to model poisoning attacks launched by malicious clients. Furthermore, it is challenging to identify malicious clients since no local client data is available on the server. Besides, membership inference attacks can still be performed by using the uploaded model to estimate the client's local data, leading to privacy disclosure. In this work, we first propose a model update based federated averaging algorithm to defend against Byzantine attacks such as additive noise attacks and sign-flipping attacks. An individual client model initialization method is then presented to provide further privacy protection against membership inference attacks by hiding the individual local machine learning model. When these two schemes are combined, privacy and security can both be effectively enhanced. The proposed schemes are shown experimentally to converge under non-IID data distribution when there are no attacks. Under Byzantine attacks, the proposed schemes perform much better than the classical model based FedAvg algorithm.
    Cross-Modal Alignment Learning of Vision-Language Conceptual Systems. (arXiv:2208.01744v1 [cs.CV])
    Human infants learn the names of objects and develop their own conceptual systems without explicit supervision. In this study, we propose methods for learning aligned vision-language conceptual systems inspired by infants' word learning mechanisms. The proposed model learns the associations of visual objects and words online and gradually constructs cross-modal relational graph networks. We also propose an aligned cross-modal representation learning method that learns semantic representations of visual objects and words in a self-supervised manner based on the cross-modal relational graph networks. It allows entities of different modalities with conceptually the same meaning to have similar semantic representation vectors. We quantitatively and qualitatively evaluate our method, including on object-to-word mapping and zero-shot learning tasks, showing that the proposed model significantly outperforms the baselines and that each conceptual system is topologically aligned.
    No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling. (arXiv:2208.01712v1 [cs.LG])
    Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.
    Analysis of the Spatio-temporal Dynamics of COVID-19 in Massachusetts via Spectral Graph Wavelet Theory. (arXiv:2208.01749v1 [cs.SI])
    The rapid spread of COVID-19 disease has had a significant impact on the world. In this paper, we study COVID-19 data interpretation and visualization using open-data sources for 351 cities and towns in Massachusetts from December 6, 2020 to September 25, 2021. Because cities are embedded in rather complex transportation networks, we construct a spatio-temporal dynamic graph model, in which a graph attention neural network is used as a deep learning method to learn the pandemic transition probability among major cities in Massachusetts. Using the spectral graph wavelet transform (SGWT), we process the COVID-19 data on the dynamic graph, which enables us to design effective tools to analyze and detect spatio-temporal patterns in the pandemic spreading. We design a new node classification method, which effectively identifies anomalous cities based on spectral graph wavelet coefficients. It can assist administrations or public health organizations in monitoring the spread of the pandemic and developing preventive measures. Unlike most work focusing on the evolution of confirmed cases over time, we focus on the spatio-temporal patterns of pandemic evolution among cities. Through the data analysis and visualization, a better understanding of the epidemiological development at the city level is obtained and can be helpful with city-specific surveillance.
    Convex-Concave Min-Max Stackelberg Games. (arXiv:2110.05192v7 [cs.GT] UPDATED)
    Min-max optimization problems (i.e., min-max games) have been attracting a great deal of attention because of their applicability to a wide range of machine learning problems. Although significant progress has been made recently, the literature to date has focused on games with independent strategy sets; little is known about solving games with dependent strategy sets, which can be characterized as min-max Stackelberg games. We introduce two first-order methods that solve a large class of convex-concave min-max Stackelberg games, and show that our methods converge in polynomial time. Min-max Stackelberg games were first studied by Wald, under the posthumous name of Wald's maximin model, a variant of which is the main paradigm used in robust optimization, which means that our methods can likewise solve many convex robust optimization problems. We observe that the computation of competitive equilibria in Fisher markets also comprises a min-max Stackelberg game. Further, we demonstrate the efficacy and efficiency of our algorithms in practice by computing competitive equilibria in Fisher markets with varying utility structures. Our experiments suggest potential ways to extend our theoretical results, by demonstrating how different smoothness properties can affect the convergence rate of our algorithms.
    Diagnosis of Paratuberculosis in Histopathological Images Based on Explainable Artificial Intelligence and Deep Learning. (arXiv:2208.01674v1 [eess.IV])
    Artificial intelligence holds great promise in medical imaging, especially histopathological imaging. However, artificial intelligence algorithms cannot fully explain the thought processes during decision-making. This situation has brought the problem of explainability, i.e., the black box problem, of artificial intelligence applications to the agenda: an algorithm simply responds without stating the reasons for the given images. To overcome the problem and improve the explainability, explainable artificial intelligence (XAI) has come to the fore, and piqued the interest of many researchers. Against this backdrop, this study examines a new and original dataset using the deep learning algorithm, and visualizes the output with gradient-weighted class activation mapping (Grad-CAM), one of the XAI applications. Afterwards, a detailed questionnaire survey was conducted with the pathologists on these images. Both the decision-making processes and the explanations were verified, and the accuracy of the output was tested. The research results greatly help pathologists in the diagnosis of paratuberculosis.  ( 2 min )
    Curvature-informed multi-task learning for graph networks. (arXiv:2208.01684v1 [cs.LG])
    Properties of interest for crystals and molecules, such as band gap, elasticity, and solubility, are generally related to each other: they are governed by the same underlying laws of physics. However, when state-of-the-art graph neural networks attempt to predict multiple properties simultaneously (the multi-task learning (MTL) setting), they frequently underperform a suite of single property predictors. This suggests graph networks may not be fully leveraging these underlying similarities. Here we investigate a potential explanation for this phenomenon: the curvature of each property's loss surface significantly varies, leading to inefficient learning. This difference in curvature can be assessed by looking at spectral properties of the Hessians of each property's loss function, which is done in a matrix-free manner via randomized numerical linear algebra. We evaluate our hypothesis on two benchmark datasets (Materials Project (MP) and QM8) and consider how these findings can inform the training of novel multi-task learning models.  ( 2 min )
    Differentially Private Vertical Federated Clustering. (arXiv:2208.01700v1 [cs.CR])
    In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes, and a server wants to leverage the data to train a model. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques, where the data parties share only information for training the model, instead of the private data. However, it is challenging to ensure that the shared information maintains privacy while learning accurate models. To the best of our knowledge, the algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering, where the server can obtain a set of global centers with a provable differential privacy guarantee. Our algorithm assumes an untrusted central server that aggregates differentially private local centers and membership encodings from local data parties. It builds a weighted grid as the synopsis of the global dataset based on the received information. Final centers are generated by running any k-means algorithm on the weighted grid. Our approach for grid weight estimation uses a novel, light-weight, and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin sketch. To improve the estimation accuracy in the setting with more than two data parties, we further propose a refined version of the weights estimation algorithm and a parameter tuning strategy to reduce the final k-means utility to be close to that in the central private setting. We provide theoretical utility analysis and experimental evaluation results for the cluster centers computed by our algorithm and show that our approach performs better both theoretically and empirically than the two baselines based on existing techniques.  ( 3 min )
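    To make the building block named in the abstract above concrete, here is a minimal, non-private sketch of a Flajolet-Martin sketch and of set intersection cardinality estimation by inclusion-exclusion. It is not the paper's algorithm: the differential privacy mechanism is omitted, and the names `fm_sketch`, `fm_estimate`, and `fm_intersection_estimate`, the SHA-256 hash, and the single-bitmap estimator are assumptions chosen for illustration. A single bitmap gives a high-variance estimate; practical systems average many sketches.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin bias-correction constant


def _rho(x):
    """Index of the least-significant set bit of a positive integer."""
    return (x & -x).bit_length() - 1


def fm_sketch(items, bits=32):
    """Bitmap whose bit r is set if some item's hash has exactly r trailing zeros."""
    bitmap = 0
    for item in items:
        digest = hashlib.sha256(str(item).encode()).digest()
        # Guard bit at position `bits` keeps the hash non-zero and caps rho.
        h = int.from_bytes(digest[:8], "big") | (1 << bits)
        bitmap |= 1 << _rho(h)
    return bitmap


def fm_estimate(bitmap):
    """Cardinality estimate 2^R / PHI, where R is the lowest unset bit index."""
    r = 0
    while bitmap & (1 << r):
        r += 1
    return (2 ** r) / PHI


def fm_intersection_estimate(a_items, b_items):
    """|A ∩ B| ≈ |A| + |B| - |A ∪ B|; the union sketch is the bitwise OR."""
    sa, sb = fm_sketch(a_items), fm_sketch(b_items)
    return fm_estimate(sa) + fm_estimate(sb) - fm_estimate(sa | sb)
```

    The key property is that sketches merge by bitwise OR, so two parties can combine summaries of their user sets without exchanging the sets themselves.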
    Adapting Triplet Importance of Implicit Feedback for Personalized Recommendation. (arXiv:2208.01709v1 [cs.IR])
    Implicit feedback is frequently used for developing personalized recommendation services due to its ubiquity and accessibility in real-world systems. In order to effectively utilize such information, most research adopts the pairwise ranking method on constructed training triplets (user, positive item, negative item) and aims to distinguish between positive items and negative items for each user. However, most of these methods treat all the training triplets equally, which ignores the subtle difference between different positive or negative items. On the other hand, even though some other works make use of the auxiliary information (e.g., dwell time) of user behaviors to capture this subtle difference, such auxiliary information is hard to obtain. To mitigate the aforementioned problems, we propose a novel training framework named Triplet Importance Learning (TIL), which adaptively learns the importance score of training triplets. We devise two strategies for the importance score generation and formulate the whole procedure as a bilevel optimization, which does not require any rule-based design. We integrate the proposed training procedure with several Matrix Factorization (MF)- and Graph Neural Network (GNN)-based recommendation models, demonstrating the compatibility of our framework. Via a comparison using three real-world datasets with many state-of-the-art methods, we show that our proposed method outperforms the best existing models by 3-21\% in terms of Recall@k for the top-k recommendation.  ( 3 min )
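    As a rough illustration of weighting training triplets, the sketch below applies per-triplet importance weights to a pairwise ranking loss. It assumes the common BPR form of the loss, -log sigmoid(positive score - negative score), which the abstract does not name, and it takes the weights as fixed inputs; TIL learns them through bilevel optimization, which is not shown. The function names are hypothetical.

```python
import math


def bpr_loss(pos_score, neg_score):
    """-log(sigmoid(pos - neg)), computed stably as softplus(neg - pos)."""
    x = neg_score - pos_score
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def weighted_triplet_loss(score_pairs, weights):
    """Importance-weighted average of the pairwise loss.

    score_pairs: list of (positive_item_score, negative_item_score) for one user each.
    weights: non-negative importance score per triplet (assumed given here).
    """
    if len(score_pairs) != len(weights):
        raise ValueError("one weight per triplet is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    weighted = sum(w * bpr_loss(p, n) for (p, n), w in zip(score_pairs, weights))
    return weighted / total_weight
```

    With equal weights this reduces to the usual unweighted mean, which is the "treat all triplets equally" baseline the abstract criticizes.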
  • Open

    Beyond neural scaling laws: beating power law scaling via data pruning. (arXiv:2206.14486v2 [cs.LG] UPDATED)
    Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how both in theory and practice we can break beyond power law scaling and reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this new exponential scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling performance on ResNets trained on CIFAR-10, SVHN, and ImageNet. Given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning.
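    The pruning step itself is simple once a metric is available. The sketch below keeps a chosen fraction of examples ranked by a per-example score; the function name `prune_dataset` and the convention that higher scores are kept are assumptions for illustration, and the abstract does not specify which end of the ranking should be discarded.

```python
def prune_dataset(examples, scores, keep_fraction):
    """Keep the top `keep_fraction` of examples by score, preserving original order."""
    if len(examples) != len(scores):
        raise ValueError("one score per example is required")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(keep_fraction * len(examples)))
    # Rank indices by score, highest first (assumed convention).
    ranked = sorted(range(len(examples)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return [examples[i] for i in kept]


subset = prune_dataset(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.7], 0.5)
```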
    Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process. (arXiv:2202.10589v4 [stat.ML] UPDATED)
    This paper is concerned with constructing a confidence interval for a target policy's value offline based on pre-collected observational data in infinite horizon settings. Most existing works assume that no unmeasured variables exist that confound the observed actions. This assumption, however, is likely to be violated in real applications such as healthcare and technological industries. In this paper, we show that with some auxiliary variables that mediate the effect of actions on the system dynamics, the target policy's value is identifiable in a confounded Markov decision process. Based on this result, we develop an efficient off-policy value estimator that is robust to potential model misspecification and provide rigorous uncertainty quantification. Our method is justified by theoretical results and by simulated and real datasets obtained from ridesharing companies. A Python implementation of the proposed procedure is available at https://github.com/Mamba413/cope.
    Free Energy Evaluation Using Marginalized Annealed Importance Sampling. (arXiv:2204.03784v2 [stat.ML] UPDATED)
    The evaluation of the free energy of a stochastic model is considered a significant issue in various fields of physics and machine learning. However, exact free energy evaluation is computationally infeasible because the free energy expression includes an intractable partition function. Annealed importance sampling (AIS) is a type of importance sampling based on the Markov chain Monte Carlo method that is similar to simulated annealing and can effectively approximate the free energy. This study proposes an AIS-based approach, which is referred to as marginalized AIS (mAIS). The statistical efficiency of mAIS is investigated in detail from theoretical and numerical perspectives. Based on the investigation, it is proved that mAIS is more effective than AIS under a certain condition.
    Combinatorial Causal Bandits. (arXiv:2206.01995v2 [cs.LG] UPDATED)
    In combinatorial causal bandits (CCB), the learning agent chooses at most $K$ variables in each round to intervene on and collects feedback from the observed variables, with the goal of minimizing expected regret on the target variable $Y$. Different from all prior studies on causal bandits, CCB needs to deal with an exponentially large action space. We study CCB in the context of binary generalized linear models (BGLMs) with a succinct parametric representation of the causal models. We present the algorithm BGLM-OFU for Markovian BGLMs (i.e., no hidden variables) based on the maximum likelihood estimation method, and show that it achieves $O(\sqrt{T}\log T)$ regret, where $T$ is the time horizon. For the special case of linear models with hidden variables, we apply causal inference techniques such as the do-calculus to convert the original model into a Markovian model, and then show that our BGLM-OFU algorithm and another algorithm based on linear regression both solve such linear models with hidden variables. Our novelty includes (a) considering the combinatorial intervention action space, (b) considering general causal models including ones with hidden variables, (c) integrating and adapting techniques from diverse studies such as generalized linear bandits and online influence maximization, and (d) not relying on unrealistic assumptions such as knowing the joint distribution of the parents of $Y$ under all interventions used in some prior studies.
    Robust Training under Label Noise by Over-parameterization. (arXiv:2202.14026v2 [cs.LG] UPDATED)
    Recently, over-parameterized deep networks, with increasingly more network parameters than training samples, have dominated the performance of modern machine learning. However, when the training data is corrupted, it is well known that over-parameterized networks tend to overfit and do not generalize. In this work, we propose a principled approach for robust training of over-parameterized deep networks in classification tasks where a proportion of training labels are corrupted. The main idea is very simple: label noise is sparse and incoherent with the network learned from clean data, so we model the noise and learn to separate it from the data. Specifically, we model the label noise via another sparse over-parameterization term, and exploit implicit algorithmic regularizations to recover and separate the underlying corruptions. Remarkably, when trained using such a simple method in practice, we demonstrate state-of-the-art test accuracy against label noise on a variety of real datasets. Furthermore, our experimental results are corroborated by theory on simplified linear models, showing that exact separation between sparse noise and low-rank data can be achieved under incoherent conditions. The work opens many interesting directions for improving over-parameterized models by using sparse over-parameterization and implicit regularization.
    Diffusion bridges vector quantized Variational AutoEncoders. (arXiv:2202.04895v2 [stat.ML] UPDATED)
    Vector Quantized-Variational AutoEncoders (VQ-VAE) are generative models based on discrete latent representations of the data, where inputs are mapped to a finite set of learned embeddings. To generate new samples, an autoregressive prior distribution over the discrete states must be trained separately. This prior is generally very complex and leads to slow generation. In this work, we propose a new model to train the prior and the encoder/decoder networks simultaneously. We build a diffusion bridge between a continuous coded vector and a non-informative prior distribution. The latent discrete states are then given as random functions of these continuous vectors. We show that our model is competitive with the autoregressive prior on the mini-Imagenet and CIFAR dataset and is efficient in both optimization and sampling. Our framework also extends the standard VQ-VAE and enables end-to-end training.
    A Transformational Characterization of Unconditionally Equivalent Bayesian Networks. (arXiv:2203.00521v2 [stat.ML] UPDATED)
    We consider the problem of characterizing Bayesian networks up to unconditional equivalence, i.e., when directed acyclic graphs (DAGs) have the same set of unconditional $d$-separation statements. Each unconditional equivalence class (UEC) is uniquely represented with an undirected graph whose clique structure encodes the members of the class. Via this structure, we provide a transformational characterization of unconditional equivalence; i.e., we show that two DAGs are in the same UEC if and only if one can be transformed into the other via a finite sequence of specified moves. We also extend this characterization to the essential graphs representing the Markov equivalence classes (MECs) in the UEC. UECs partition the space of MECs and are easily estimable from marginal independence tests. Thus, a characterization of unconditional equivalence has applications in methods that involve searching the space of MECs of Bayesian networks.
    AUC Maximization in the Era of Big Data and AI: A Survey. (arXiv:2203.15046v3 [cs.LG] UPDATED)
    Area under the ROC curve, a.k.a. AUC, is a measure of choice for assessing the performance of a classifier for imbalanced data. AUC maximization refers to a learning paradigm that learns a predictive model by directly maximizing its AUC score. It has been studied for more than two decades, dating back to the late 90s, and a huge amount of work has been devoted to AUC maximization since then. Recently, stochastic AUC maximization for big data and deep AUC maximization for deep learning have received increasing attention and yielded dramatic impact for solving real-world problems. However, to the best of our knowledge there is no comprehensive survey of related works for AUC maximization. This paper aims to address the gap by reviewing the literature in the past two decades. We not only give a holistic view of the literature but also present detailed explanations and comparisons of different papers from formulations to algorithms and theoretical guarantees. We also identify and discuss remaining and emerging issues for deep AUC maximization, and provide suggestions on topics for future work.
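    As a small illustration of the quantity being maximized, AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by brute-force pair counting; the function name `pairwise_auc` and the toy data are illustrative assumptions and do not come from the survey.

```python
def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels are 1 for positive and 0 for negative; ties contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Three of the four positive-negative pairs are ordered correctly.
auc = pairwise_auc([0.9, 0.8, 0.35, 0.1], [1, 0, 1, 0])
```

    Because this pair-counting objective is a sum of non-differentiable indicator terms, AUC maximization methods typically replace the comparison with a smooth surrogate loss.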
    auton-survival: an Open-Source Package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Event Data. (arXiv:2204.07276v4 [cs.LG] UPDATED)
    Applications of machine learning in healthcare often require working with time-to-event prediction tasks including prognostication of an adverse event, re-hospitalization or death. Such outcomes are typically subject to censoring due to loss of follow up. Standard machine learning methods cannot be applied in a straightforward manner to datasets with censored outcomes. In this paper, we present auton-survival, an open-source repository of tools to streamline working with censored time-to-event or survival data. auton-survival includes tools for survival regression, adjustment in the presence of domain shift, counterfactual estimation, phenotyping for risk stratification, evaluation, as well as estimation of treatment effects. Through real world case studies employing a large subset of the SEER oncology incidence data, we demonstrate the ability of auton-survival to rapidly support data scientists in answering complex health and epidemiological questions.
    Stochastic Neighbor Embedding with Gaussian and Student-t Distributions: Tutorial and Survey. (arXiv:2009.10301v2 [stat.ML] UPDATED)
    Stochastic Neighbor Embedding (SNE) is a manifold learning and dimensionality reduction method with a probabilistic approach. In SNE, every point is considered to be the neighbor of all other points with some probability, and the method tries to preserve this probability in the embedding space. SNE uses the Gaussian distribution for the probability in both the input and embedding spaces. t-SNE, however, uses the Gaussian distribution in the input space and the Student-t distribution in the embedding space. In this tutorial and survey paper, we explain SNE, symmetric SNE, t-SNE (or Cauchy-SNE), and t-SNE with general degrees of freedom. We also cover the out-of-sample extension and acceleration for these methods.  ( 2 min )
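    The effect of swapping the kernel in the embedding space can be sketched in a few lines. The code below computes joint pairwise similarities with a Gaussian kernel (as in symmetric SNE) and with a Student-t kernel with one degree of freedom (as in t-SNE). Unit kernel width, the joint normalization over all pairs, and the helper names are assumptions made to keep the example short.

```python
import math


def pairwise_sq_dists(points):
    """Matrix of squared Euclidean distances between all points."""
    n = len(points)
    return [[sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
             for j in range(n)] for i in range(n)]


def embedding_similarities(points, kernel="student_t"):
    """Joint similarities q_ij over all pairs i != j, normalised to sum to 1."""
    if kernel == "gaussian":
        weight = lambda d2: math.exp(-d2)          # symmetric SNE
    elif kernel == "student_t":
        weight = lambda d2: 1.0 / (1.0 + d2)       # t-SNE (Cauchy kernel)
    else:
        raise ValueError("kernel must be 'gaussian' or 'student_t'")
    d2 = pairwise_sq_dists(points)
    n = len(points)
    w = [[0.0 if i == j else weight(d2[i][j]) for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]


embedding = [[0.0], [1.0], [4.0]]
q_gauss = embedding_similarities(embedding, "gaussian")
q_t = embedding_similarities(embedding, "student_t")
```

    For the distant pair (first and third point), the heavy-tailed Student-t kernel assigns far more similarity mass than the Gaussian kernel does, which is the property t-SNE relies on to relieve crowding in the embedding.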
    AdaCat: Adaptive Categorical Discretization for Autoregressive Models. (arXiv:2208.02246v1 [cs.LG])
    Autoregressive generative models can estimate complex continuous data distributions, like trajectory rollouts in an RL environment, image intensities, and audio. Most state-of-the-art models discretize continuous data into several bins and use categorical distributions over the bins to approximate the continuous data distribution. The advantage is that categorical distributions can easily express multiple modes and are straightforward to optimize. However, such an approximation cannot express sharp changes in density without using significantly more bins, making it parameter inefficient. We propose an efficient, expressive, multimodal parameterization called Adaptive Categorical Discretization (AdaCat). AdaCat discretizes each dimension of an autoregressive model adaptively, which allows the model to allocate density to fine intervals of interest, improving parameter efficiency. AdaCat generalizes both categoricals and quantile-based regression. AdaCat is a simple add-on to any discretization-based distribution estimator. In experiments, AdaCat improves density estimation for real-world tabular data, images, audio, and trajectories, and improves planning in model-based offline RL.  ( 2 min )
    Stochastic Gradient Line Bayesian Optimization for Efficient Noise-Robust Optimization of Parameterized Quantum Circuits. (arXiv:2111.07952v2 [quant-ph] UPDATED)
    Optimizing parameterized quantum circuits is a key routine in using near-term quantum devices. However, the existing algorithms for such optimization require an excessive number of quantum-measurement shots for estimating expectation values of observables and repeating many iterations, whose cost has been a critical obstacle for practical use. We develop an efficient alternative optimization algorithm, stochastic gradient line Bayesian optimization (SGLBO), to address this problem. SGLBO reduces the measurement-shot cost by estimating an appropriate direction of updating circuit parameters based on stochastic gradient descent (SGD) and further utilizing Bayesian optimization (BO) to estimate the optimal step size for each iteration in SGD. In addition, we formulate an adaptive measurement-shot strategy and introduce a technique of suffix averaging to reduce the effect of statistical and hardware noise. Our numerical simulation demonstrates that the SGLBO augmented with these techniques can drastically reduce the measurement-shot cost, improve the accuracy, and make the optimization noise-robust.  ( 2 min )
    Stable and Interpretable Unrolled Dictionary Learning. (arXiv:2106.00058v5 [cs.LG] UPDATED)
    The dictionary learning problem, representing data as a combination of a few atoms, has long stood as a popular method for learning representations in statistics and signal processing. The most popular dictionary learning algorithm alternates between sparse coding and dictionary update steps, and a rich literature has studied its theoretical convergence. The success of dictionary learning relies on access to a "good" initial estimate of the dictionary and the ability of the sparse coding step to provide an unbiased estimate of the code. The growing popularity of unrolled sparse coding networks has led to the empirical finding that backpropagation through such networks performs dictionary learning. We offer a theoretical analysis of these empirical results through PUDLE, a Provable Unrolled Dictionary LEarning method. We provide conditions on the network initialization and data distribution sufficient to recover and preserve the support of the latent code. Additionally, we address two challenges: first, the vanilla unrolled sparse coding computes a biased code estimate, and second, gradients during backpropagated learning can become unstable. We show approaches to reduce the bias of the code estimate in the forward pass, and that of the dictionary estimate in the backward pass. We propose strategies to resolve the learning instability by tuning network parameters and modifying the loss function. Overall, we highlight the impact of loss, unrolling, and backpropagation on convergence. We complement our findings through synthetic and image denoising experiments. Finally, we demonstrate PUDLE's interpretability, a driving factor in designing deep networks based on iterative optimizations, by building a mathematical relation between network weights, its output, and the training set.  ( 3 min )
    Debiasing In-Sample Policy Performance for Small-Data, Large-Scale Optimization. (arXiv:2107.12438v4 [math.OC] UPDATED)
    Motivated by the poor performance of cross-validation in settings where data are scarce, we propose a novel estimator of the out-of-sample performance of a policy in data-driven optimization. Our approach exploits the optimization problem's sensitivity analysis to estimate the gradient of the optimal objective value with respect to the amount of noise in the data and uses the estimated gradient to debias the policy's in-sample performance. Unlike cross-validation techniques, our approach avoids sacrificing data for a test set, utilizes all data when training and, hence, is well-suited to settings where data are scarce. We prove bounds on the bias and variance of our estimator for optimization problems with uncertain linear objectives but known, potentially non-convex, feasible regions. For more specialized optimization problems where the feasible region is "weakly-coupled" in a certain sense, we prove stronger results. Specifically, we provide explicit high-probability bounds on the error of our estimator that hold uniformly over a policy class and depend on the problem's dimension and the policy class's complexity. Our bounds show that under mild conditions, the error of our estimator vanishes as the dimension of the optimization problem grows, even if the amount of available data remains small and constant. Said differently, we prove our estimator performs well in the small-data, large-scale regime. Finally, we numerically compare our proposed method to state-of-the-art approaches through a case-study on dispatching emergency medical response services using real data. Our method provides more accurate estimates of out-of-sample performance and learns better-performing policies.  ( 3 min )
    Policy Evaluation for Temporal and/or Spatial Dependent Experiments in Ride-sourcing Platforms. (arXiv:2202.10887v2 [stat.ME] UPDATED)
    Policy evaluation based on A/B testing has attracted considerable interest in digital marketing, but such evaluation in ride-sourcing platforms (e.g., Uber and Didi) is not well studied, primarily due to the complex structure of their temporally and/or spatially dependent experiments. Motivated by policy evaluation in ride-sourcing platforms, the aim of this paper is to establish a causal relationship between a platform's policies and outcomes of interest under a switchback design. We propose a novel potential outcome framework based on a temporal varying coefficient decision process (VCDP) model to capture the dynamic treatment effects in temporally dependent experiments. We further characterize the average treatment effect by decomposing it as the sum of a direct effect (DE) and an indirect effect (IE). We develop estimation and inference procedures for both DE and IE. Furthermore, we propose a spatio-temporal VCDP to deal with spatiotemporally dependent experiments. For both VCDP models, we establish the statistical properties (e.g., weak convergence and asymptotic power) of our estimation and inference procedures. We conduct extensive simulations to investigate the finite-sample performance of the proposed estimation and inference procedures. We examine how our VCDP models can help improve policy evaluation for various dispatching and dispositioning policies in Didi.  ( 3 min )
    Unified Framework for Spectral Dimensionality Reduction, Maximum Variance Unfolding, and Kernel Learning By Semidefinite Programming: Tutorial and Survey. (arXiv:2106.15379v2 [stat.ML] UPDATED)
    This is a tutorial and survey paper on the unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analysis (PCA) with different kernels. This unification can be interpreted as eigenfunction learning or as a representation of the kernel in terms of the distance matrix. Then, since the spectral methods are unified as kernel PCA, we propose learning the best kernel for unfolding the manifold of data to its maximum variance. We first briefly introduce kernel learning by SDP for the transduction task. Then, we explain MVU in detail. Various versions of supervised MVU using a nearest-neighbors graph, by class-wise unfolding, by the Fisher criterion, and by colored MVU are explained. We also explain the out-of-sample extension of MVU using eigenfunctions and kernel mapping. Finally, we introduce other variants of MVU including action respecting embedding, relaxed MVU, and landmark MVU for big data.  ( 3 min )
    Multimodal Controller for Generative Models. (arXiv:2002.02572v7 [cs.LG] UPDATED)
    Class-conditional generative models are crucial tools for data generation from user-specified class labels. Existing approaches for class-conditional generative models require nontrivial modifications of backbone generative architectures to model conditional information fed into the model. This paper introduces a plug-and-play module named `multimodal controller' to generate multimodal data without introducing additional learning parameters. In the absence of the controllers, our model reduces to non-conditional generative models. We test the efficacy of multimodal controllers on CIFAR10, COIL100, and Omniglot benchmark datasets. We demonstrate that multimodal controlled generative models (including VAE, PixelCNN, Glow, and GAN) can generate class-conditional images of significantly better quality when compared with conditional generative models. Moreover, we show that multimodal controlled models can also create novel modalities of images.  ( 2 min )
    Optimised one-class classification performance. (arXiv:2102.02618v3 [cs.LG] UPDATED)
    We provide a thorough treatment of one-class classification with hyperparameter optimisation for five data descriptors: Support Vector Machine (SVM), Nearest Neighbour Distance (NND), Localised Nearest Neighbour Distance (LNND), Local Outlier Factor (LOF) and Average Localised Proximity (ALP). The hyperparameters of SVM and LOF have to be optimised through cross-validation, while NND, LNND and ALP allow an efficient form of leave-one-out validation and the reuse of a single nearest-neighbour query. We experimentally evaluate the effect of hyperparameter optimisation with 246 classification problems drawn from 50 datasets. From a selection of optimisation algorithms, the recent Malherbe-Powell proposal optimises the hyperparameters of all data descriptors most efficiently. We calculate the increase in test AUROC and the amount of overfitting as a function of the number of hyperparameter evaluations. After 50 evaluations, ALP and SVM significantly outperform LOF, NND and LNND, and LOF and NND outperform LNND. The performance of ALP and SVM is comparable, but ALP can be optimised more efficiently so constitutes a good default choice. Alternatively, using validation AUROC as a selection criterion between ALP or SVM gives the best overall result, and NND is the least computationally demanding option. We thus end up with a clear trade-off between three choices, allowing practitioners to make an informed decision.  ( 3 min )
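    A minimal sketch of the simplest descriptor named above, Nearest Neighbour Distance, with a brute-force leave-one-out pass. It assumes k = 1 and Euclidean distance, and the function names `nnd_score` and `leave_one_out_scores` are hypothetical; this is not the paper's efficient single-query implementation, only an illustration of the idea that each training point can be scored against all the others.

```python
import math

def nnd_score(train, query):
    """Outlier score: distance from `query` to its nearest training point."""
    return min(math.dist(query, x) for x in train)

def leave_one_out_scores(train):
    """Score every training point against the remaining training points."""
    scores = []
    for i, x in enumerate(train):
        others = train[:i] + train[i + 1:]  # hold out point i
        scores.append(nnd_score(others, x))
    return scores

train = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
query_score = nnd_score(train, (0.0, 3.0))
loo = leave_one_out_scores(train)
```

    Larger scores mean the point lies further from the target class; a threshold on the leave-one-out scores would then act as the tunable hyperparameter.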
    Hierarchical Multiple-Instance Data Classification with Costly Features. (arXiv:1911.08756v5 [cs.LG] UPDATED)
    We motivate our research with a real-world problem of classifying malicious web domains using a remote service that provides various information. Crucially, some of the information can be further analyzed to a certain depth and this process sequentially creates a tree of hierarchically structured multiple-instance data. Each request sent to the remote service is associated with a cost (e.g., time or another cost per request) and the objective is to maximize the accuracy, constrained by a budget. We present a generic framework able to work with a class of similar problems. Our method is based on Classification with Costly Features (CwCF), Hierarchical Multiple-Instance Learning (HMIL) and hierarchical decomposition of the action space. It works with samples described as partially-observed trees of features of various types (similar to a JSON/XML file), which allows modeling data with complex structure. The process is modeled as a Markov Decision Process (MDP), where a state represents acquired features, and actions select yet unknown ones. The policy is trained with deep reinforcement learning and we demonstrate our method with both real-world and synthetic data.  ( 3 min )
    Centroids Matching: an efficient Continual Learning approach operating in the embedding space. (arXiv:2208.02048v1 [cs.LG])
    Catastrophic forgetting (CF) occurs when a neural network loses the information previously learned while training on a set of samples from a different distribution, i.e., a new task. Existing approaches have achieved remarkable results in mitigating CF, especially in a scenario called task incremental learning. However, this scenario is not realistic, and limited work has been done to achieve good results on more realistic scenarios. In this paper, we propose a novel regularization method called Centroids Matching, that, inspired by meta-learning approaches, fights CF by operating in the feature space produced by the neural network, achieving good results while requiring a small memory footprint. Specifically, the approach classifies the samples directly using the feature vectors produced by the neural network, by matching those vectors with the centroids representing the classes from the current task, or all the tasks up to that point. Centroids Matching is faster than competing baselines, and it can be exploited to efficiently mitigate CF, by preserving the distances between the embedding space produced by the model when past tasks were over, and the one currently produced, leading to a method that achieves high accuracy on all the tasks, without using an external memory when operating on easy scenarios, or using a small one for more realistic ones. Extensive experiments demonstrate that Centroids Matching achieves accuracy gains on multiple datasets and scenarios.  ( 3 min )
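    The classification step described above (matching feature vectors against class centroids) can be sketched as plain nearest-centroid prediction. This is an illustration under assumptions, not the authors' training procedure: the embeddings are taken as given, Euclidean distance is assumed, and `class_centroids` and `predict` are hypothetical names.

```python
import math

def class_centroids(embeddings, labels):
    """Average the embedding vectors of each class."""
    sums, counts = {}, {}
    for vec, lab in zip(embeddings, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(vec)
            counts[lab] = 0
        sums[lab] = [s + v for s, v in zip(sums[lab], vec)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def predict(centroids, vec):
    """Return the label whose centroid is closest to `vec`."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], vec))

embeddings = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
labels = ["a", "a", "b", "b"]
centroids = class_centroids(embeddings, labels)
```

    Because only one vector per class is stored, the memory cost grows with the number of classes rather than the number of samples, which is consistent with the small footprint the abstract mentions.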
    Flow Annealed Importance Sampling Bootstrap. (arXiv:2208.01893v1 [cs.LG])
    Normalizing flows are tractable density models that can approximate complicated target distributions, e.g. Boltzmann distributions of physical systems. However, current methods for training flows either suffer from mode-seeking behavior, use samples from the target generated beforehand by expensive MCMC simulations, or use stochastic losses that have very high variance. To avoid these problems, we augment flows with annealed importance sampling (AIS) and minimize the mass covering $\alpha$-divergence with $\alpha=2$, which minimizes importance weight variance. Our method, Flow AIS Bootstrap (FAB), uses AIS to generate samples in regions where the flow is a poor approximation of the target, facilitating the discovery of new modes. We target with AIS the minimum variance distribution for the estimation of the $\alpha$-divergence via importance sampling. We also use a prioritized buffer to store and reuse AIS samples. These two features significantly improve FAB's performance. We apply FAB to complex multimodal targets and show that we can approximate them very accurately where previous methods fail. To the best of our knowledge, we are the first to learn the Boltzmann distribution of the alanine dipeptide molecule using only the unnormalized target density and without access to samples generated via Molecular Dynamics (MD) simulations: FAB produces better results than training via maximum likelihood on MD samples while using 100 times fewer target evaluations. After reweighting samples with importance weights, we obtain unbiased histograms of dihedral angles that are almost identical to the ground truth ones.  ( 3 min )
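    A short worked identity behind the remark that $\alpha=2$ minimizes importance weight variance. It assumes normalized densities $p$ (target) and $q$ (flow), importance weight $w = p/q$, and one common normalization of the $\alpha$-divergence; the constant factor is an assumption and may differ from the paper's convention.

```latex
\begin{align}
D_{\alpha}(p \,\|\, q) &= \frac{1}{\alpha(\alpha-1)}\left(\int p(x)^{\alpha}\, q(x)^{1-\alpha}\,dx - 1\right), \\
D_{2}(p \,\|\, q) &= \frac{1}{2}\left(\int \frac{p(x)^{2}}{q(x)}\,dx - 1\right), \\
\operatorname{Var}_{q}[w] &= \mathbb{E}_{q}\!\left[w^{2}\right] - \left(\mathbb{E}_{q}[w]\right)^{2}
  = \int \frac{p(x)^{2}}{q(x)}\,dx - 1 = 2\,D_{2}(p \,\|\, q).
\end{align}
```

    Under these assumptions, lowering $D_{2}(p \,\|\, q)$ and lowering the variance of the importance weights are the same objective up to a factor of two.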
    Robust PCA for Anomaly Detection and Data Imputation in Seasonal Time Series. (arXiv:2208.01998v1 [stat.ML])
    We propose a robust principal component analysis (RPCA) framework to recover low-rank and sparse matrices from temporal observations. We develop an online version of the batch temporal algorithm in order to process larger datasets or streaming data. We empirically compare the proposed approaches with different RPCA frameworks and show their effectiveness in practical situations.  ( 2 min )
    The Power and Limitation of Pretraining-Finetuning for Linear Regression under Covariate Shift. (arXiv:2208.01857v1 [cs.LG])
    We study linear regression under covariate shift, where the marginal distribution over the input covariates differs in the source and the target domains, while the conditional distribution of the output given the input covariates is similar across the two domains. We investigate a transfer learning approach with pretraining on the source data and finetuning based on the target data (both conducted by online SGD) for this problem. We establish sharp instance-dependent excess risk upper and lower bounds for this approach. Our bounds suggest that for a large class of linear regression instances, transfer learning with $O(N^2)$ source data (and scarce or no target data) is as effective as supervised learning with $N$ target data. In addition, we show that finetuning, even with only a small amount of target data, could drastically reduce the amount of source data required by pretraining. Our theory sheds light on the effectiveness and limitation of pretraining as well as the benefits of finetuning for tackling covariate shift problems.  ( 2 min )
    No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling. (arXiv:2208.01712v1 [cs.LG])
    Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.  ( 3 min )
    Pyramidal Denoising Diffusion Probabilistic Models. (arXiv:2208.01864v1 [cs.CV])
    Diffusion models have demonstrated impressive image generation performance, and have been used in various computer vision tasks. Unfortunately, image generation using diffusion models is very time-consuming since it requires thousands of sampling steps. To address this problem, here we present a novel pyramidal diffusion model to generate high resolution images starting from much coarser resolution images using a single score function trained with a positional embedding. This enables a time-efficient sampling for image generation, and also solves the low batch size problem when training with limited resources. Furthermore, we show that the proposed approach can be efficiently used for multi-scale super-resolution problem using a single score function.  ( 2 min )
    Curvature-informed multi-task learning for graph networks. (arXiv:2208.01684v1 [cs.LG])
    Properties of interest for crystals and molecules, such as band gap, elasticity, and solubility, are generally related to each other: they are governed by the same underlying laws of physics. However, when state-of-the-art graph neural networks attempt to predict multiple properties simultaneously (the multi-task learning (MTL) setting), they frequently underperform a suite of single property predictors. This suggests graph networks may not be fully leveraging these underlying similarities. Here we investigate a potential explanation for this phenomenon: the curvature of each property's loss surface significantly varies, leading to inefficient learning. This difference in curvature can be assessed by looking at spectral properties of the Hessians of each property's loss function, which is done in a matrix-free manner via randomized numerical linear algebra. We evaluate our hypothesis on two benchmark datasets (Materials Project (MP) and QM8) and consider how these findings can inform the training of novel multi-task learning models.  ( 2 min )
    A Tighter Analysis of Spectral Clustering, and Beyond. (arXiv:2208.01724v1 [cs.DS])
    This work studies the classical spectral clustering algorithm which embeds the vertices of some graph $G=(V_G, E_G)$ into $\mathbb{R}^k$ using $k$ eigenvectors of some matrix of $G$, and applies $k$-means to partition $V_G$ into $k$ clusters. Our first result is a tighter analysis on the performance of spectral clustering, and explains why it works under some much weaker condition than the ones studied in the literature. For the second result, we show that, by applying fewer than $k$ eigenvectors to construct the embedding, spectral clustering is able to produce better output for many practical instances; this result is the first of its kind in spectral clustering. Besides its conceptual and theoretical significance, the practical impact of our work is demonstrated by the empirical analysis on both synthetic and real-world datasets, in which spectral clustering produces comparable or better results with fewer than $k$ eigenvectors.  ( 2 min )
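    A minimal sketch of the pipeline the abstract describes: embed the vertices with eigenvectors, then run k-means. The abstract only says "some matrix of G"; the normalized Laplacian used here is an assumption, as are the function names, the `n_vecs` parameter (which can be set below k to mirror the paper's second result), and the deterministic farthest-point initialisation.

```python
import numpy as np

def spectral_embedding(adj, n_vecs):
    """Rows are vertex embeddings built from the bottom n_vecs eigenvectors
    of the normalized Laplacian I - D^{-1/2} A D^{-1/2} (assumed choice)."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return vecs[:, :n_vecs]

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with farthest-point initialisation (illustrative)."""
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

# Toy graph: two 4-cliques joined by the single edge (3, 4).
adj = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                adj[i, j] = 1.0
adj[3, 4] = adj[4, 3] = 1.0

labels = kmeans(spectral_embedding(adj, n_vecs=2), k=2)
```

    On this toy graph the two cliques end up in different clusters; real instances would need a more careful k-means initialisation than the one assumed here.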
    Optimal Rates for Regularized Conditional Mean Embedding Learning. (arXiv:2208.01711v1 [stat.ML])
    We address the consistency of a kernel ridge regression estimate of the conditional mean embedding (CME), which is an embedding of the conditional distribution of $Y$ given $X$ into a target reproducing kernel Hilbert space $\mathcal{H}_Y$. The CME allows us to take conditional expectations of target RKHS functions, and has been employed in nonparametric causal and Bayesian inference. We address the misspecified setting, where the target CME is in the space of Hilbert-Schmidt operators acting from an input interpolation space between $\mathcal{H}_X$ and $L_2$, to $\mathcal{H}_Y$. This space of operators is shown to be isomorphic to a newly defined vector-valued interpolation space. Using this isomorphism, we derive a novel and adaptive statistical learning rate for the empirical CME estimator under the misspecified setting. Our analysis reveals that our rates match the optimal $O(\log n / n)$ rates without assuming $\mathcal{H}_Y$ to be finite dimensional. We further establish a lower bound on the learning rate, which shows that the obtained upper bound is optimal.  ( 2 min )

  • Open

    [D] Using a product's name/description to choose among multiple detected objects?
    Hi. I'm currently working on improving an object detection model specifically made for e-commerce items. The particular problem that I'm facing is that the object detection model would catch multiple objects (e.g., for a picture of a model advertising a bag, the shirt and pants would also be caught) but I want to be able to detect only the particular item of interest. I thought that using text information would be able to help with this problem, but I'm having trouble finding any relevant work in that field. Would anybody have any ideas on some research papers or any work in that direction? Thanks. submitted by /u/Seankala [link] [comments]  ( 87 min )
    [P] A Website to generate Code Snippets, Regexes, Linux & Git & SQL Commands, HTML and CSS from a written description. Furthermore translate code snippets to many languages and get a regex explained in plain english. Moreover you can fix broken code snippets. All with the help of ML 🤖
    https://reddit.com/link/wfl4nc/video/5ntmbzj9zkf91/player https://reddit.com/link/wfl4nc/video/vul525t9zkf91/player https://reddit.com/link/wfl4nc/video/13b738nbzkf91/player Programming: Function from Description; Code to Explanation; Fix invalid Code; Translate Languages; Class from Description; Get Language from Code; Function from Docstring. Helpers: Regex from Description; Regex to Explanation; Linux Command; Get time complexity; Git Command from Description. Database: Text Description to SQL Command. Web: Generate HTML from Description; CSS from Description; Meta Tags from Description. I think this could be helpful to a lot of people (especially beginner programmers). You can check out all the functionality for yourself here: programming-helper.com Have fun using the tool ❤️ submitted by /u/Capital_Revolution35 [link] [comments]  ( 88 min )
    [D] CVAT and LabelStudio for image labeling
    We started using Label Studio, but many of the annotators we hire are familiar with CVAT, which we are not big fans of (we don't like the complexity). Is there a way to let the annotators use CVAT but convert the output to something that can be read/edited in Label Studio? The other option is to train them to use Label Studio, but just having a conversion tool would be much faster. submitted by /u/randomtopics12 [link] [comments]  ( 88 min )
    [D] Which infrastructure do you use to train models?
    Wondering about your workflow for training large models or running batch jobs that are too big for your laptop. Do you use AWS VMs to run them and shut them back down afterwards, or SageMaker or AzureML? I'm asking because I recently started working with https://github.com/dstackai/dstack which lets you run Python jobs in AWS from your CLI, but I'm not sure how others run their ML jobs. submitted by /u/dmart89 [link] [comments]  ( 131 min )
    [D] The Machine Learning Community is totally biased to positive results.
    Nearly all papers published do only include positive results but rarely conclude with statements like „we tried this but it didn’t work out“. submitted by /u/Insighteous [link] [comments]  ( 89 min )
    [D] Looking for a vendor-agnostic ONNX/NNEF library
    Every vendor seems to have their own API for deep learning. I’m looking to target desktops with a model that runs on the consumer's computer. I’ve tried OpenCV DNN, but that implementation is incomplete, so it failed to compile my model. I’ve also looked at DirectML, but that uses DirectX, which is Windows-specific. BTW, how is it that an API written by a GPU vendor is OS-specific? Then Intel has oneDNN, which they say is Intel-specific. However, it only uses C++ and OpenCL, so it might work on other GPUs, but I haven’t tried that yet. Are there any fully fledged libraries like this? If not, what do you recommend using? submitted by /u/noahbadoa [link] [comments]  ( 87 min )
    [D] Is it just me or are Canadian (and maybe European) ML PhD programs underrated compared to US ones?
    University of Montreal has Yoshua Bengio(!), Aaron Courville, Christopher Pal and many other stellar professors; University of Toronto has Jimmy Ba, Richard Zemel and many other established researchers in the field. But when people discuss PhD admission, they generally consider the top 4 (Stanford, CMU, MIT, Berkeley) the best even though not every professor in those schools is a "star". While it is true that the top 4 schools have top-notch professors, it is also true that many stellar professors work in schools that are not in the top 4. For example, Yann LeCun is at NYU Courant and David Blei is at Columbia. My question is, why aren't more students applying to schools like UMontreal, UToronto, and NYU Courant? I would book a flight to Canada right away IF (this is a huge if but still 😂) Bengio accepted me as his master's student, even if I got accepted to a fully-funded PhD program at Stanford. submitted by /u/DesperateBread3179 [link] [comments]  ( 97 min )
    [R] "What are the Red Flags for Neural Network Suffering?" - Seeds of Science call for reviewers
    What are the Red Flags for Neural Suffering? By [redacted] and [redacted] Abstract: Which kind of evidence would we need to see to believe that artificial neural networks can suffer? We review neuroscience literature, investigate behavioral arguments and propose high-level considerations that could shift our beliefs. Of these three approaches, we believe that high-level considerations, i.e. understanding under which circumstances suffering arises as an optimal training strategy, is the most promising. Our main finding, however, is that the understanding of artificial suffering is very limited and should likely get more attention. - - Seeds of Science is a new journal (funded through Scott Alexander's ACX grants program) that publishes speculative or non-traditional articles on scien…  ( 92 min )
    "What are the Red Flags for Neural Network Suffering?" - Seeds of Science call for reviewers "[Research]"
    Seeds of Science is a new journal (funded through Scott Alexander's ACX grants program) that publishes speculative or non-traditional articles on scientific topics. Peer review is conducted through community-based voting and commenting by a diverse network of reviewers (or "gardeners" as we call them). We just sent out an article for review - "What are the Red Flags for Neural Network Suffering?" - that may be of interest to some in the r/MachineLearning, so I wanted to see if anyone would be interested in joining us a gardener to review the article. It is free to join and anyone is welcome (we currently have gardeners from all levels of academia and outside of it). Participation is entirely voluntary - we send you submitted articles and you can choose to vote/comment or abstain without …  ( 89 min )
    [D] Building a model from scratch VS. Open-source implementation
    When would you consider building a model from scratch in, say, PyTorch or TF rather than just using some open-source implementation (say, from GitHub), and why? submitted by /u/Inquation [link] [comments]  ( 121 min )
    [Research], [R]: Research Study on Data Labelling Tools & Bias in AI - Participants Needed (Paid)
    https://preview.redd.it/7kan1u3g6jf91.jpg?width=1587&format=pjpg&auto=webp&s=17deeb4b0fc09bfec4e315eb14eb21c7a9144e61 ARE YOU INTERESTED IN BIAS IN ARTIFICIAL INTELLIGENCE/MACHINE LEARNING? Hi All, My name is India Semper-Hughes and I am a Human-Computer Interaction (HCI) student at City, University of London (United Kingdom). I am conducting a research project as part of my MSc programme and am looking to interview people who have done data annotation/labelling work (in particular, though not limited to, annotators who have done Natural Language Processing annotation work). Participation in the interview would be paid at an agreed hourly rate (I am open to suggestions as to what you think a fair rate would be) and all data collected will be anonymised and kept securely. Interviews …  ( 89 min )
    [D] Opinions about TabNet
    The TabNet paper claims some impressive performance on various tabular datasets -- outperforming both more traditional neural networks as well as tree-based algorithms such as XGBoost. But I've also heard anecdotal reports of TabNet performing poorly in industry. Does anyone have any experience with TabNet in the real world, or insight into why this discrepancy might happen? Here's the link to the paper: https://arxiv.org/abs/1908.07442 submitted by /u/_aitalks_ [link] [comments]  ( 88 min )
    [D] Characteristics of a dynamical system from a deep learning model
    Let's say I have a model f which takes a D-dimensional state at time t and outputs a D-dimensional state at time t+1. Since this model f kind of works like the state equation of a dynamical system, I was wondering which characteristics of the dynamical system I can study without really knowing what f is? submitted by /u/Labib666Camp [link] [comments]  ( 88 min )
    [D] Difference between PINN and PGNN
    PINNs: https://arxiv.org/pdf/2104.02556.pdf PGNNs: https://arxiv.org/abs/1710.11431 Hi all. The task is to use neural networks to build some kind of hybrid model whose advantages are a more accurate solution and greater speed than the classical analytical solution of partial differential equations. From researching articles, I came to the conclusion that there are two types of "physical" neural networks. The first, PINNs, are networks based on the direct solution of partial differential equations. But with that approach it is impossible to make a prediction on a time interval unknown to the network, or rather, it gives an extremely poor result. There are also PGNNs, which also take the physics of the process into account but do not have the drawback of the previous type: I can make predictions for the next interval. Question: which approach is better? submitted by /u/Adventurous_Guitar59 [link] [comments]  ( 88 min )
    [P] What we learned by benchmarking TorchDynamo (PyTorch team), ONNX Runtime and TensorRT on transformers model (inference)
    TL;DR: TorchDynamo (prototype from PyTorch team) plus nvfuser (from Nvidia) backend makes Bert (the tool is model agnostic) inference on PyTorch > 3X faster most of the time (it depends on input shape) by just adding a single line of code in Python script. The surprising thing is that during the benchmark, we have not seen any drawback implied by the use of this library, the acceleration just comes for free. On the same model, TensorRT is (of course) much faster, > 5X at least (and even more at batch size 1 which is impressive) but comes with its own complexity. The tool being a prototype, better performances are to be expected with more mature support of some backends, in particular regarding fx2trt (aka TensorRT mixed with PyTorch)! Our TorchDynamo benchmark notebook can be found there:…  ( 95 min )
    [D] Need a better speaker annotation tool
    I do not know if this is the correct subreddit to post this or not (if not please guide me) but I need a better voice annotation tool than this one (https://github.com/gong-io/gecko). Can anyone help? submitted by /u/Dot_in_a_2D_plane [link] [comments]  ( 87 min )
    [D] Fan-made NeurIPS 2022 Movie Trailer
    https://twitter.com/postrat_dril/status/1554255464505950210?s=20&t=bIUCJA4xo_Lp2jyNfCOkgw Pardon the $h1t-Post, but we should all just laugh at ourselves every once in a while :) submitted by /u/iidealized [link] [comments]  ( 88 min )
    [P] Gradient free methodologies and algorithms for training Neural Nets
    Hi everyone, I'm looking to analyze gradient-free methodologies and algorithms for training neural nets. So far I have discovered that any gradient-free optimization methodology (e.g. Particle Swarm Optimization) can in practice be applied to training neural networks. However, there are algorithms that have been analyzed more extensively in the literature (e.g. ADMM or BCD variants). Do you have in mind any other gradient-free algorithm that has been extensively used for training neural networks? By the way, is there any article online that summarizes all those gradient-free methodologies? Thank you in advance for any of your answers. submitted by /u/Suitable_Pea_6866 [link] [comments]  ( 127 min )
    [P] Tensorflow implementation of "Tackling the Generative Learning Trilemma with Denoising Diffusion GANs" (ICLR 2022 Spotlight)
    Abstract: A wide variety of deep generative models has been developed in the past decade. Yet, these models often struggle with simultaneously addressing three key requirements including: high sample quality, mode coverage, and fast sampling. We call the challenge imposed by these requirements the generative learning trilemma, as the existing models often trade some of them for others. Particularly, denoising diffusion models have shown impressive sample quality and diversity, but their expensive sampling does not yet allow them to be applied in many real-world applications. In this paper, we argue that slow sampling in these models is fundamentally attributed to the Gaussian assumption in the denoising step which is justified only for small step sizes. To enable denoising with large steps, and hence, to reduce the total number of denoising steps, we propose to model the denoising distribution using a complex multimodal distribution. We introduce denoising diffusion generative adversarial networks (denoising diffusion GANs) that model each denoising step using a multimodal conditional GAN. Through extensive evaluations, we show that denoising diffusion GANs obtain sample quality and diversity competitive with original diffusion models while being 2000× faster on the CIFAR-10 dataset. Compared to traditional GANs, our model exhibits better mode coverage and sample diversity. To the best of our knowledge, denoising diffusion GAN is the first model that reduces sampling cost in diffusion models to an extent that allows them to be applied to real-world applications inexpensively. submitted by /u/taki0112 [link] [comments]  ( 89 min )
    What's the deal with local minima? [D]
    hey all-- there are two things that I've heard about nn training that don't quite jibe with each other: Thing 1: Neural networks tend to converge to local minima, and in statistical contexts this is Good, as reaching the global minimum would be overfitting to a heinous degree. Thing 2: In optimization spaces with extremely high dimensionality, local minima are basically nonexistent-- in order for a critical point to be a local minimum, the second derivative must be positive along every single dimension, which is thermodynamically unlikely as the number of dimensions gets very large. (There are a lot of symmetries in the loss space of a model, which necessarily means there are many global minima, but consider just the space of functions and view the parameter space of the nn as an extremely redundant unfolding of that space.) So, either we are approximating global minima or we aren't when we train these things, and at first glance it appears one of the two Things above is wrong. Some theories I have: (a) Thing 2 is just misleading: just because local minima are vanishingly rare compared to critical points in general doesn't mean there aren't a lot of them out there. (b) Thing 2 is not misleading for general random loss landscapes, but something about common architecture/loss structures lends itself to local minima. (c) Thing 1 is slightly misleading: neural networks tend to converge to saddle points/plateaus that ADAM can't find its way out of. (d) No one knows anything, black box goes brrr. If anyone has any insight pls lmk! submitted by /u/abcdchop [link] [comments]  ( 93 min )
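    A small numerical sketch of the intuition behind "Thing 2". It assumes, purely for illustration, that the Hessian at a critical point behaves like a random symmetric Gaussian matrix, which real loss landscapes need not do; the function name `fraction_all_positive` and the trial count are hypothetical. The experiment counts how often every eigenvalue is positive, i.e. how often the critical point would be a local minimum rather than a saddle.

```python
import numpy as np

def fraction_all_positive(dim, n_trials=2000, seed=0):
    """Fraction of random symmetric dim x dim matrices that are positive definite."""
    rng = np.random.default_rng(seed)
    count = 0
    for _ in range(n_trials):
        a = rng.standard_normal((dim, dim))
        hessian = (a + a.T) / 2.0  # symmetrise to get a stand-in "Hessian"
        if np.linalg.eigvalsh(hessian).min() > 0:
            count += 1
    return count / n_trials

fractions = {dim: fraction_all_positive(dim) for dim in (1, 2, 3, 10)}
for dim, frac in fractions.items():
    print(dim, frac)
```

    Under this toy model the fraction drops quickly as the dimension grows, which is the sense in which saddles dominate. Whether trained networks follow it is exactly what theories (a) to (c) in the post disagree about.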
  • Open

    Is RL upside down the new standard?
    My colleague seems to think that RL-upside-down is the new standard in RL since it apparently is able to reduce RL to a supervised learning problem. I'm curious what your experience with this is and whether you think it can replace RL in general. I've heard that Google is doing something similar with transformers and that it apparently allows training quite large networks which are good at transfer learning between games, for instance. submitted by /u/Udon_noodles [link] [comments]  ( 95 min )
    Cartpole game to reach 1000 timesteps
    I wrote an algorithm that plays the Cartpole game using just Q-learning, and the agent is doing well, but it keeps falling. I trained it for 10k episodes and then tested it by just playing the game without updating Q-values, acting only on the Q(s,a) matrix from training. The agent performs well in testing, but it doesn't stay upright forever. Any recommendations? https://preview.redd.it/n88hikaykkf91.png?width=626&format=png&auto=webp&s=50369e34e1315a865de3230dc9f9d9486bf1642d submitted by /u/Alternative-Price-27 [link] [comments]  ( 101 min )
    Benchmark for vanilla deep off-policy policy gradient?
    I know highly distributed algorithms like Impala or V-trace, but I’ve never seen a benchmark on classical benchmarks like Atari and MuJoCo for the vanilla version (see off-policy actor-critic) submitted by /u/Jogima-cyber [link] [comments]  ( 86 min )
    Reward design
    Hi, are there any useful resources on how to design my reward correctly? For example, in my case I have 4 values: when value 1 goes to 1, value 2 should also be 1, but values 3 and 4 must be 0 (they are continuous). How do I design a reward for this that lies between 0 and 1? I tried val1- val3/val2-val4, but it is incorrect; the NN cannot distinguish which value to maximize submitted by /u/IndependenceCivil576 [link] [comments]  ( 86 min )
    Any Sample resume with RL experience?
    I have never seen a resume with extensive experience in RL. I don't know what kind of projects are usually shown and how these projects are explained in resumes, or what kind of metrics and highlights are used. That's what I want to see. submitted by /u/gaurjimmy [link] [comments]  ( 86 min )
    Why do Bellman error gradients become big?
    I am reading these notes on slide 34 and came across strategies to prevent gradients from becoming too big in Deep Q Learning (DQN). Since we don't usually use deep architectures in DQN, I don't think it's an exploding gradient problem. My understanding is that it has something to do with the linear regression squared error loss function, since DQN is a regression network. Could someone please explain it to me? I remember reading somewhere that large errors drive the gradients in a linear regression problem. Perhaps that's why Bellman error gradients become big? submitted by /u/Academic-Rent7800 [link] [comments]  ( 87 min )
    How does testing differ in Reinforcement Learning as compared to supervised learning? From what I have learned, even in the testing phase the RL agent is constantly trying to improve its policy. Is this correct? Are there any other differences?
    It seems like testing/evaluation is simply a continuation of training, and that we can see the same result in training that we would get in testing. Is there any learning happening during testing as we accumulate rewards? Also, I think training and testing data are clearly separated in supervised learning; this difference seems somewhat less significant in Reinforcement Learning. submitted by /u/aabra__ka__daabra [link] [comments]  ( 104 min )
    After training the RL agent with the DDPG algorithm, how do we perform testing? Should we just repeat the same training algorithm, substituting the initial actor-critic network parameters with the trained parameters, or something else? What is the general testing procedure in RL?
    Which parameters need to be changed when we are testing? Similarly, for multi-agent settings, do we follow the same procedure? submitted by /u/aabra__ka__daabra [link] [comments]  ( 87 min )
    Peg in Hole with target behind obstacle
    Hey all, I am working on a task where a 6-axis robot has to place segments into their designated positions to build a ring. A ring is built from 5 segments, 4 of which can be inserted more or less along the direct trajectory, but the last, 5th segment needs to be slid horizontally into its place (see image), as it would get stuck moving along the direct path. Placing segments 1-4 works fine, but my agent just doesn't get how to place segment 5. I tried training a separate agent on only placing the 5th segment, but it also does not figure out that it should first go next to the designated position and then slide it in. Instead it always tries the direct path, which results in a collision and the segment being stuck. Am I missing something obvious? My environment works like this: Obs. space: position_desired, position_current, orientation_desired, orientation_current, position_distance, orientation_distance, collision_detected, segment_id. Reward is kept ~ [-1,0]: reward = - (position_distance + orientation_distance) / 25; if collision_detected: reward += -0.5; if position_distance < position_threshold: reward += 500, done = True. I use PPO, batch size 80k (one episode is max. 2k timesteps), lr_schedule = 0.001 (with decay). Does anyone have a tip on what else I could try, or maybe some literature on a similar problem? https://preview.redd.it/zpmogk492hf91.png?width=1234&format=png&auto=webp&s=0f83dd27d371fac60655a8a81398528c3799e9ae submitted by /u/disdisinform [link] [comments]  ( 97 min )
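The reward described in the post above is given inline as pseudocode. As a reading aid, here is a direct Python transcription. It is only a sketch of what the post states: the function name `step_reward` and passing `position_threshold` as an argument are assumptions for illustration, and no threshold value is given in the post.

```python
def step_reward(position_distance, orientation_distance,
                collision_detected, position_threshold):
    """Transcribe the per-step reward from the post; returns (reward, done)."""
    # Dense shaping term: negative scaled distance to the target pose.
    reward = -(position_distance + orientation_distance) / 25
    done = False
    # Flat penalty on any step where a collision is detected.
    if collision_detected:
        reward += -0.5
    # Large terminal bonus once the segment is within the position threshold.
    if position_distance < position_threshold:
        reward += 500
        done = True
    return reward, done


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the scale of each term.
    print(step_reward(10.0, 15.0, False, 0.01))
    print(step_reward(10.0, 15.0, True, 0.01))
    print(step_reward(0.0, 0.0, False, 0.01))
```

Written this way, it is easy to see that the shaping term always pulls the agent toward the target along the straight line, while the collision penalty is a constant per step, so the relative size of the two terms along the direct path is worth checking.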
    How to partition the belief space of a POMDP using a "granularity" parameter?
    As I understand, to solve a POMDP we transform it into a belief-MDP. The value function for this belief-MDP is proven to be piecewise linear and convex (PWLC) [Smallwood and Sondik, 1973]. To apply value iteration, we need to partition the belief space into regions with the same value function, i.e., line segments. One of the algorithms used introduces a granularity parameter that starts at 1 and decreases at each time step. I am trying to understand how this algorithm works exactly, but I am unable to find a concrete explanation or example. Can anyone explain this approach or refer me to an explanation? submitted by /u/souhaielbensalem [link] [comments]  ( 87 min )
    New to ML: How do we incentivize a machine learning algorithm with a “reward” for accomplishing a task and why does the AI algorithm even care about a reward at all?
    submitted by /u/rdsyes [link] [comments]  ( 90 min )
    "How does in-context learning work? A framework for understanding the differences from traditional supervised learning"
    submitted by /u/gwern [link] [comments]  ( 86 min )
    "TextWorldExpress: Simulating Text Games at One Million Steps Per Second", Jansen & Côté 2022
    submitted by /u/gwern [link] [comments]  ( 86 min )
    I would like to ask: if there are two sub-optimization problems, could meta-deep reinforcement learning be applied to solve them?
    There are two sub-optimization problems. Sub-problem (1) is to optimize x, where x is a discrete value ∈ [0,Π]. When a value of x is selected at random, it is passed to sub-problem (2) as an input, and a DRL algorithm is applied to find the best scheme for sub-problem (2). The two sub-problems are correlated and are viewed as one overall optimization problem: find the best value of x and its corresponding scheme. Could meta-DRL be used to solve this problem? submitted by /u/Ke_Lu_XJTU [link] [comments]  ( 86 min )
  • Open

    Amazon Comprehend announces lower annotation limits for custom entity recognition
    Amazon Comprehend is a natural-language processing (NLP) service you can use to automatically extract entities, key phrases, language, sentiments, and other insights from documents. For example, you can immediately start detecting entities such as people, places, commercial items, dates, and quantities via the Amazon Comprehend console, AWS Command Line Interface, or Amazon Comprehend APIs. In […]  ( 8 min )
    Promote feature discovery and reuse across your organization using Amazon SageMaker Feature Store and its feature-level metadata capability
    Amazon SageMaker Feature Store helps data scientists and machine learning (ML) engineers securely store, discover, and share curated data used in training and prediction workflows. Feature Store is a centralized store for features and associated metadata, allowing features to be easily discovered and reused by data scientist teams working on different projects or ML models. […]  ( 7 min )
  • Open

    Building Efficient Multiple Visual Domain Models with Multi-path Neural Architecture Search
    Posted by Qifei Wang, Senior Software Engineer, and Feng Yang, Senior Staff Software Engineer, Google Research Deep learning models for visual tasks (e.g., image classification) are usually trained end-to-end with data from a single visual domain (e.g., natural images or computer generated images). Typically, an application that completes visual tasks for multiple domains would need to build multiple models for each individual domain, train them independently (meaning no data is shared between domains), and then at inference time each model would process domain-specific input data. However, early layers between these models generate similar features, even for different domains, so it can be more efficient — decreasing latency and power consumption, lower memory overhead to store parameter…  ( 26 min )
    Efficient Sequence Modeling for On-Device ML
    Posted by Arun Kandoor, Software Engineer, Google Research The increasing demand for machine learning (ML) model inference on-device (for mobile devices, tablets, etc.) is driven by the rise of compute-intensive applications, the need to keep certain data on device for privacy and security reasons, and the desire to provide services when a network connection may not be available. However, on-device inference introduces a myriad of challenges, ranging from modeling to platform support requirements. These challenges relate to how different architectures are designed to optimize memory and computation, while still trying to maintain the quality of the model. From a platform perspective, the issue is identifying operations and building on top of them in a way that can generalize well across …  ( 23 min )
  • Open

    Survey: Perspectives that guide your stance on AI alignment
    If you have 8 minutes to spare for my research project, follow the link below! I'd like to hear your hypotheses about what leads people to see AI risk as important. I will test the most promising ones in a future poll. Many thanks! https://docs.google.com/forms/d/e/1FAIpQLScT7M4_FssgBm6vvypNBW4gagzvESu5kJGP1j21CaU3N88rVw/viewform?usp=sf_link submitted by /u/kyrgyzstanec [link] [comments]  ( 86 min )
    How did this guy make this hilarious audio deepfake? What software did he use?
    2 years ago, someone released an audio deepfake of Jordan Peterson reading absurdly vulgar rap lyrics. It was pretty amazing: video here I want to learn how this was done and if any improvements to this process have been implemented since. What’s the easiest and most straightforward way to feed an algorithm hours of audio content of a person’s voice and synthesize an artificial replica of their voice that you can make say anything? submitted by /u/DJSpook [link] [comments]  ( 86 min )
    MIT Claims New Artificial Neuron 1 Million Times Faster Than the Real Thing
    submitted by /u/estasfuera [link] [comments]  ( 91 min )
    What I need to create AI
    I am currently creating a video game in Unreal Engine 4. It is an adventure-rogue game, where I need some sort of AI to control the enemies so that they move around the arena and attack me. Can you give me some guidelines on what I should learn/what resources I should use to create an AI? Data structures? Algorithms? Some advanced tutorials? (Currently, I know C++ in terms of programming languages) submitted by /u/NaviteLogger5547 [link] [comments]  ( 87 min )
    AGI Alignment additional thoughts
    submitted by /u/HumanSeeing [link] [comments]  ( 87 min )
    I had Blake Lemoine, the fired Google researcher who believes his computer was sentient, on my podcast. Just debuted today, and free for anyone who wants to listen. Enjoy!
    submitted by /u/felixanderfelixander [link] [comments]  ( 86 min )
    Secret Chapel In the Forest
    submitted by /u/widgia [link] [comments]  ( 85 min )
    Conversational Analysis AI tool
    Hi there, Hope everyone is doing well and enjoying their summer. I and a few people are starting this project, where we will be developing a Conversational Analysis AI tool to detect visual and tonal markers. We are looking for people who would be interested in joining and helping us with the challenges we will inevitably face during the creation of this project. Anyone interested and up for the task and journey could DM me and we can jump on a call. It will be awesome to meet people who will be interested to contribute and build something of their own and push the boundaries of technology. submitted by /u/DragonflyLatter9068 [link] [comments]  ( 86 min )
    AI Manifest: Digital Planet | Cinematic | 4K UHD | 60 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 86 min )
    Using artificial intelligence to control digital manufacturing
    submitted by /u/qptbook [link] [comments]  ( 85 min )
    Python for Programmers: Big Data and Artificial Intelligence Case Studies (PDF Book for FREE Download)
    https://morioh.com/p/afca6f2eec16 submitted by /u/NoahButler890x [link] [comments]  ( 92 min )
    Minions wearing cat maid costume
    https://preview.redd.it/g8f52pek8gf91.png?width=1124&format=png&auto=webp&s=84ef89da52541ae94ab6adc0fe0b362daa7d2e76 submitted by /u/youhave69seconds [link] [comments]  ( 86 min )
    Uncanny Pikachu made with craiyon.com
    submitted by /u/youhave69seconds [link] [comments]  ( 85 min )
    Hands, the true nemesis of AI image generation? Is there a solution?
    Drawing good images of hands is a problem for human artists. And with AI image generators, I've noticed that even DALL-E 2 can't consistently produce good hands. GAN models have done amazingly well with human faces, but I believe they have their limitations. As do diffusion models. Is there some other approach that would work more consistently, and is anyone exploring it? Note: I'm merely an enthusiastic observer when it comes to these issues, so I won't be able to understand any overly technical explanations. At the moment I'm just trying to teach myself a little Python, and hitting the same problems over and over. "What the f*** do you mean that 'n++' is invalid syntax?! Is there a module I can import?" submitted by /u/Abstract_Albatross [link] [comments]  ( 86 min )
  • Open

    NVIDIA Jetson AGX Orin 32GB Production Modules Now Available; Partner Ecosystem Appliances and Servers Arrive
    Bringing new AI and robotics applications and products to market, or supporting existing ones, can be challenging for developers and enterprises. The NVIDIA Jetson AGX Orin 32GB production module — available now — is here to help. Nearly three dozen technology providers in the NVIDIA Partner Network worldwide are offering commercially available products powered by […] The post NVIDIA Jetson AGX Orin 32GB Production Modules Now Available; Partner Ecosystem Appliances and Servers Arrive appeared first on NVIDIA Blog.  ( 6 min )
    Music to the Gears: NVIDIA’s Clément Farabet on Orchestrating AI Training for Autonomous Vehicles
    Autonomous vehicles are one of the most complex AI challenges of our time. For AVs to operate safely in the real world, the networks running within them must come together as an intricate symphony, which requires intensive training, testing and validation on massive amounts of data. Clément Farabet, vice president of AI infrastructure at NVIDIA, […] The post Music to the Gears: NVIDIA’s Clément Farabet on Orchestrating AI Training for Autonomous Vehicles appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    Inline computed content in org-mode
    The previous post discussed how to use org-mode as a notebook. You can have blocks of code and blocks of results, analogous to cells in a Jupyter notebook. The code and the results export as obvious blocks when you export the org file to another format, such as LaTeX or HTML. And that’s fine for […] Inline computed content in org-mode first appeared on John D. Cook.  ( 5 min )
  • Open

    The (In-Person) ICRA 2022 Conference in Philadelphia
    At long last, after more than two years of virtual conferences, last May I attended an in-person conference, the 2022 International Conference on Robotics and Automation (ICRA), from May 23-27. The last in-person conferences I attended were ISRR 2019 in Hanoi, Vietnam and NeurIPS 2019 in Vancouver, Canada (blog posts are here and here). Apologies for the massive months-long delay in blogging. One challenge with ICRA’s timing is that it was a few weeks before the CoRL 2022 deadline, and so I (and many other attendees, as I would soon learn) were busy trying to work on our paper submissions. Background and Context ICRA is a large conference, held annually since 1984. You can find the list of past and future venues here. The last full in-person ICRA was in 2019 in Montreal, Canada. This year, …  ( 8 min )
  • Open

    New algorithm aces university math course questions
    Researchers use machine learning to automatically solve, explain, and generate university-level math problems at a human level.  ( 8 min )
  • Open

    Unimodal Mono-Partite Matching in a Bandit Setting. (arXiv:2208.01511v1 [cs.LG])
    We tackle a new emerging problem, which is finding an optimal monopartite matching in a weighted graph. The semi-bandit version, where a full matching is sampled at each iteration, has been addressed by \cite{ADMA}, creating an algorithm with an expected regret matching $O(\frac{L\log(L)}{\Delta}\log(T))$ with $2L$ players, $T$ iterations and a minimum reward gap $\Delta$. We reduce this bound in two steps. First, as in \cite{GRAB} and \cite{UniRank} we use the unimodality property of the expected reward on the appropriate graph to design an algorithm with a regret in $O(L\frac{1}{\Delta}\log(T))$. Secondly, we show that by moving the focus towards the main question `\emph{Is user $i$ better than user $j$?}' this regret becomes $O(L\frac{\Delta}{\tilde{\Delta}^2}\log(T))$, where $\tilde{\Delta} > \Delta$ derives from a better way of comparing users. Some experimental results finally show these theoretical results are corroborated in practice.
    Knowledge mining of unstructured information: application to cyber-domain. (arXiv:2109.03848v3 [cs.CR] UPDATED)
    Information on cyber-related crimes, incidents, and conflicts is abundantly available in numerous open online sources. However, processing the large volumes and streams of data is a challenging task for the analysts and experts, and entails the need for newer methods and techniques. In this article we present and implement a novel knowledge graph and knowledge mining framework for extracting the relevant information from free-form text about incidents in the cyberdomain. The framework includes a machine learning based pipeline for generating graphs of organizations, countries, industries, products and attackers with a non-technical cyber-ontology. The extracted knowledge graph is utilized to estimate the incidence of cyberattacks on a given graph configuration. We use publicly available collections of real cyber-incident reports to test the efficacy of our methods. The knowledge extraction is found to be sufficiently accurate, and the graph-based threat estimation demonstrates a level of correlation with the actual records of attacks. In practical use, an analyst utilizing the presented framework can infer additional information from the current cyber-landscape in terms of risk to various entities and propagation of the risk heuristic between industries and countries.
    Accoustate: Auto-annotation of IMU-generated Activity Signatures under Smart Infrastructure. (arXiv:2112.06651v2 [eess.SP] UPDATED)
    Human activities within smart infrastructures generate a vast amount of IMU data from the wearables worn by individuals. Many existing studies rely on such sensory data for human activity recognition (HAR); however, one of the major bottlenecks is their reliance on pre-annotated or labeled data. Manual human-driven annotations are neither scalable nor efficient, whereas existing auto-annotation techniques heavily depend on video signatures. Still, video-based auto-annotation needs high computation resources and has privacy concerns when the data from a personal space, like a smart-home, is transferred to the cloud. This paper exploits the acoustic signatures generated from human activities to label the wearables' IMU data at the edge, thus mitigating resource requirement and data privacy concerns. We utilize acoustic-based pre-trained HAR models for cross-modal labeling of the IMU data even when two individuals perform simultaneous but different activities under the same environmental context. We observe that non-overlapping acoustic gaps exist with a high probability during the simultaneous activities performed by two individuals in the environment's acoustic context, which helps us resolve the overlapping activity signatures to label them individually. A principled evaluation of the proposed approach on two real-life in-house datasets further augmented to create a dual occupant setup, shows that the framework can correctly annotate a significant volume of unlabeled IMU data from both individuals with an accuracy of $\mathbf{82.59\%}$ ($\mathbf{\pm 17.94\%}$) and $\mathbf{98.32\%}$ ($\mathbf{\pm 3.68\%}$), respectively, for a workshop and a kitchen environment.
    Deconstructing Self-Supervised Monocular Reconstruction: The Design Decisions that Matter. (arXiv:2208.01489v1 [cs.CV])
    This paper presents an open and comprehensive framework to systematically evaluate state-of-the-art contributions to self-supervised monocular depth estimation. This includes pretraining, backbone, architectural design choices and loss functions. Many papers in this field claim novelty in either architecture design or loss formulation. However, simply updating the backbone of historical systems results in relative improvements of 25%, allowing them to outperform the majority of existing systems. A systematic evaluation of papers in this field was not straightforward. The need to compare like-with-like in previous papers means that longstanding errors in the evaluation protocol are ubiquitous in the field. It is likely that many papers were not only optimized for particular datasets, but also for errors in the data and evaluation criteria. To aid future research in this area, we release a modular codebase, allowing for easy evaluation of alternate design decisions against corrected data and evaluation criteria. We re-implement, validate and re-evaluate 16 state-of-the-art contributions and introduce a new dataset (SYNS-Patches) containing dense outdoor depth maps in a variety of both natural and urban scenes. This allows for the computation of informative metrics in complex regions such as depth boundaries.
    Generalization Bounds in the Predict-then-Optimize Framework. (arXiv:1905.11488v3 [cs.LG] UPDATED)
    The predict-then-optimize framework is fundamental in many practical settings: predict the unknown parameters of an optimization problem, and then solve the problem using the predicted values of the parameters. A natural loss function in this environment is to consider the cost of the decisions induced by the predicted parameters, in contrast to the prediction error of the parameters. This loss function was recently introduced in Elmachtoub and Grigas (2022) and referred to as the Smart Predict-then-Optimize (SPO) loss. In this work, we seek to provide bounds on how well the performance of a prediction model fit on training data generalizes out-of-sample, in the context of the SPO loss. Since the SPO loss is non-convex and non-Lipschitz, standard results for deriving generalization bounds do not apply. We first derive bounds based on the Natarajan dimension that, in the case of a polyhedral feasible region, scale at most logarithmically in the number of extreme points, but, in the case of a general convex feasible region, have linear dependence on the decision dimension. By exploiting the structure of the SPO loss function and a key property of the feasible region, which we denote as the strength property, we can dramatically improve the dependence on the decision and feature dimensions. Our approach and analysis rely on placing a margin around problematic predictions that do not yield unique optimal solutions, and then providing generalization bounds in the context of a modified margin SPO loss function that is Lipschitz continuous. Finally, we characterize the strength property and show that the modified SPO loss can be computed efficiently for both strongly convex bodies and polytopes with an explicit extreme point representation.
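To make the loss in the abstract above concrete, here is a minimal sketch of the SPO loss for a linear objective over a small finite feasible set. It is an illustration under stated assumptions, not the paper's code. The function names are hypothetical, the feasible set is given as an explicit list of points, and ties among minimizers of the predicted cost are broken in the worst case. The loss is the true cost of the decision induced by the predicted cost vector, minus the true optimal cost.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def spo_loss(predicted_cost, true_cost, feasible_points, tol=1e-12):
    """SPO loss for min-cost linear optimization over an explicit finite feasible set."""
    # Decisions that are optimal for the *predicted* cost vector.
    best_predicted = min(dot(predicted_cost, w) for w in feasible_points)
    minimizers = [w for w in feasible_points
                  if dot(predicted_cost, w) <= best_predicted + tol]
    # Worst-case tie-breaking (an assumption): charge the most expensive such decision.
    induced_true_cost = max(dot(true_cost, w) for w in minimizers)
    # Best achievable true cost with perfect information.
    optimal_true_cost = min(dot(true_cost, w) for w in feasible_points)
    return induced_true_cost - optimal_true_cost


if __name__ == "__main__":
    points = [(1, 0), (0, 1)]   # pick exactly one of two options
    true_cost = (1.0, 2.0)      # option 0 is truly cheaper
    print(spo_loss((1.0, 3.0), true_cost, points))  # prediction ranks options correctly
    print(spo_loss((2.0, 1.0), true_cost, points))  # prediction flips the ranking
```

The toy case also hints at why the abstract calls the loss non-convex and non-Lipschitz: the loss depends only on which decision the prediction induces, so an arbitrarily small change in the predicted costs near a tie can switch the decision and make the loss jump.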
    Context-Aware Drift Detection. (arXiv:2203.08644v2 [stat.ML] UPDATED)
    When monitoring machine learning systems, two-sample tests of homogeneity form the foundation upon which existing approaches to drift detection build. They are used to test for evidence that the distribution underlying recent deployment data differs from that underlying the historical reference data. Often, however, various factors such as time-induced correlation mean that batches of recent deployment data are not expected to form an i.i.d. sample from the historical data distribution. Instead we may wish to test for differences in the distributions conditional on \textit{context} that is permitted to change. To facilitate this we borrow machinery from the causal inference domain to develop a more general drift detection framework built upon a foundation of two-sample tests for conditional distributional treatment effects. We recommend a particular instantiation of the framework based on maximum conditional mean discrepancies. We then provide an empirical study demonstrating its effectiveness for various drift detection problems of practical interest, such as detecting drift in the distributions underlying subpopulations of data in a manner that is insensitive to their respective prevalences. The study additionally demonstrates applicability to ImageNet-scale vision problems.
    ESS: Learning Event-based Semantic Segmentation from Still Images. (arXiv:2203.10016v2 [cs.CV] UPDATED)
    Retrieving accurate semantic information in challenging high dynamic range (HDR) and high-speed conditions remains an open challenge for image-based algorithms due to severe image degradations. Event cameras promise to address these challenges since they feature a much higher dynamic range and are resilient to motion blur. Nonetheless, semantic segmentation with event cameras is still in its infancy which is chiefly due to the lack of high-quality, labeled datasets. In this work, we introduce ESS (Event-based Semantic Segmentation), which tackles this problem by directly transferring the semantic segmentation task from existing labeled image datasets to unlabeled events via unsupervised domain adaptation (UDA). Compared to existing UDA methods, our approach aligns recurrent, motion-invariant event embeddings with image embeddings. For this reason, our method neither requires video data nor per-pixel alignment between images and events and, crucially, does not need to hallucinate motion from still images. Additionally, we introduce DSEC-Semantic, the first large-scale event-based dataset with fine-grained labels. We show that using image labels alone, ESS outperforms existing UDA approaches, and when combined with event labels, it even outperforms state-of-the-art supervised approaches on both DDD17 and DSEC-Semantic. Finally, ESS is general-purpose, which unlocks the vast amount of existing labeled image datasets and paves the way for new and exciting research directions in new fields previously inaccessible for event cameras.
    WayFAST: Navigation with Predictive Traversability in the Field. (arXiv:2203.12071v2 [cs.RO] UPDATED)
    We present a self-supervised approach for learning to predict traversable paths for wheeled mobile robots that require good traction to navigate. Our algorithm, termed WayFAST (Waypoint Free Autonomous Systems for Traversability), uses RGB and depth data, along with navigation experience, to autonomously generate traversable paths in outdoor unstructured environments. Our key inspiration is that traction can be estimated for rolling robots using kinodynamic models. Using traction estimates provided by an online receding horizon estimator, we are able to train a traversability prediction neural network in a self-supervised manner, without requiring heuristics utilized by previous methods. We demonstrate the effectiveness of WayFAST through extensive field testing in varying environments, ranging from sandy dry beaches to forest canopies and snow covered grass fields. Our results clearly demonstrate that WayFAST can learn to avoid geometric obstacles as well as untraversable terrain, such as snow, which would be difficult to avoid with sensors that provide only geometric data, such as LiDAR. Furthermore, we show that our training pipeline based on online traction estimates is more data-efficient than other heuristic-based methods.
    Unsupervised and Supervised Principal Component Analysis: Tutorial. (arXiv:1906.03148v2 [stat.ML] UPDATED)
    This is a detailed tutorial paper which explains the Principal Component Analysis (PCA), Supervised PCA (SPCA), kernel PCA, and kernel SPCA. We start with projection, PCA with eigen-decomposition, PCA with one and multiple projection directions, properties of the projection matrix, reconstruction error minimization, and we connect to autoencoder. Then, PCA with singular value decomposition, dual PCA, and kernel PCA are covered. SPCA using both scoring and Hilbert-Schmidt independence criterion are explained. Kernel SPCA using both direct and dual approaches are then introduced. We cover all cases of projection and reconstruction of training and out-of-sample data. Finally, some simulations are provided on Frey and AT&T face datasets for verifying the theory in practice.
    Residual Tensor Train: A Quantum-inspired Approach for Learning Multiple Multilinear Correlations. (arXiv:2108.08659v2 [cs.LG] UPDATED)
    States of quantum many-body systems are defined in a high-dimensional Hilbert space, where rich and complex interactions among subsystems can be modelled. In machine learning, complex multiple multilinear correlations may also exist within input features. In this paper, we present a quantum-inspired multilinear model, named Residual Tensor Train (ResTT), to capture the multiple multilinear correlations of features, from low to high orders, within a single model. ResTT is able to build a robust decision boundary in a high-dimensional space for solving fitting and classification tasks. In particular, we prove that the fully-connected layer and the Volterra series can be taken as special cases of ResTT. Furthermore, we derive the rule for weight initialization that stabilizes the training of ResTT based on a mean-field analysis. We prove that such a rule is much more relaxed than that of TT, which means ResTT can easily address the vanishing and exploding gradient problem that exists in the existing TT models. Numerical experiments demonstrate that ResTT outperforms the state-of-the-art tensor network and benchmark deep learning models on MNIST and Fashion-MNIST datasets. Moreover, ResTT achieves better performance than other statistical methods on two practical examples with limited data which are known to have complex feature interactions.
    Q4EDA: A Novel Strategy for Textual Information Retrieval Based on User Interactions with Visual Representations of Time Series. (arXiv:2101.08655v2 [cs.HC] UPDATED)
    Knowing how to construct text-based Search Queries (SQs) for use in Search Engines (SEs) such as Google or Wikipedia has become a fundamental skill. Though much data are available through such SEs, most structured datasets live outside their scope. Visualization tools aid in this limitation, but no such tools come close to the sheer amount of information available through general-purpose SEs. To fill this gap, this paper presents Q4EDA, a novel framework that converts users' visual selection queries executed on top of time series visual representations, providing valid and stable SQs to be used in general-purpose SEs and suggestions of related information. The usefulness of Q4EDA is presented and validated by users through an application linking a Gapminder's line-chart replica with a SE populated with Wikipedia documents, showing how Q4EDA supports and enhances exploratory analysis of United Nations world indicators. Despite some limitations, Q4EDA is unique in its proposal and represents a real advance towards providing solutions for querying textual information based on user interactions with visual representations.
    MT-SNN: Spiking Neural Network that Enables Single-Tasking of Multiple Tasks. (arXiv:2208.01522v1 [cs.NE])
    In this paper we explore capabilities of spiking neural networks in solving multi-task classification problems using the approach of single-tasking of multiple tasks. We designed and implemented a multi-task spiking neural network (MT-SNN) that can learn two or more classification tasks while performing one task at a time. The task to perform is selected by modulating the firing threshold of leaky integrate and fire neurons used in this work. The network is implemented using Intel's Lava platform for the Loihi2 neuromorphic chip. Tests are performed on dynamic multitask classification for NMNIST data. The results show that MT-SNN effectively learns multiple tasks by modifying its dynamics, namely, the spiking neurons' firing threshold.
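    The task-selection idea in the abstract above can be illustrated with a single discrete-time leaky integrate-and-fire neuron. This is only a minimal sketch under stated assumptions: the leak factor, the input current, the task names and the threshold values are all hypothetical, and nothing here uses Intel's Lava API or reproduces the MT-SNN architecture.

```python
def lif_spike_count(input_current, threshold, steps=100, leak=0.9):
    """Simulate a discrete-time leaky integrate-and-fire neuron; return its spike count."""
    membrane = 0.0
    spikes = 0
    for _ in range(steps):
        membrane = leak * membrane + input_current  # leaky integration of the input
        if membrane >= threshold:                   # fire when the threshold is reached
            spikes += 1
            membrane = 0.0                          # reset after a spike
    return spikes


# Hypothetical per-task thresholds: the same neuron and the same input,
# but a different firing threshold is selected for each task.
TASK_THRESHOLDS = {"task_a": 1.0, "task_b": 3.0}


def respond(input_current, task):
    """Run the neuron with the threshold assigned to the chosen task."""
    return lif_spike_count(input_current, TASK_THRESHOLDS[task])


if __name__ == "__main__":
    for task in TASK_THRESHOLDS:
        print(task, respond(0.5, task))
```

    With identical input, the lower threshold yields a higher firing rate, so changing one scalar changes the dynamics that downstream layers see.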
    Learning Invariant Weights in Neural Networks. (arXiv:2202.12439v2 [stat.ML] UPDATED)
    Assumptions about invariances or symmetries in data can significantly increase the predictive power of statistical models. Many commonly used models in machine learning are constrained to respect certain symmetries in the data, such as translation equivariance in convolutional neural networks, and incorporation of new symmetry types is actively being studied. Yet, learning such invariances from the data itself remains an open research problem. It has been shown that marginal likelihood offers a principled way to learn invariances in Gaussian Processes. We propose a weight-space equivalent to this approach, by minimizing a lower bound on the marginal likelihood to learn invariances in neural networks resulting in naturally higher performing models.
    A comment on Guo et al. [arXiv:2206.11228]. (arXiv:2208.01456v1 [q-bio.NC])
    In a recent article, Guo et al. [arXiv:2206.11228] report that adversarially trained neural representations in deep networks may already be as robust as corresponding primate IT neural representations. While we find the paper's primary experiment illuminating, we have doubts about the interpretation and phrasing of the results presented in the paper.
    Improving Few-Shot Learning through Multi-task Representation Learning Theory. (arXiv:2010.01992v3 [cs.LG] UPDATED)
    In this paper, we consider the framework of multi-task representation (MTR) learning where the goal is to use source tasks to learn a representation that reduces the sample complexity of solving a target task. We start by reviewing recent advances in MTR theory and show that they can provide novel insights for popular meta-learning algorithms when analyzed within this framework. In particular, we highlight a fundamental difference between gradient-based and metric-based algorithms in practice and put forward a theoretical analysis to explain it. Finally, we use the derived insights to improve the performance of meta-learning methods via a new spectral-based regularization term and confirm its efficiency through experimental studies on few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of MTR theory into practice for the task of few-shot classification.
    Neural Stochastic PDEs: Resolution-Invariant Learning of Continuous Spatiotemporal Dynamics. (arXiv:2110.10249v7 [cs.LG] UPDATED)
    Stochastic partial differential equations (SPDEs) are the mathematical tool of choice for modelling spatiotemporal PDE-dynamics under the influence of randomness. Based on the notion of mild solution of an SPDE, we introduce a novel neural architecture to learn solution operators of PDEs with (possibly stochastic) forcing from partially observed data. The proposed Neural SPDE model provides an extension to two popular classes of physics-inspired architectures. On the one hand, it extends Neural CDEs and variants -- continuous-time analogues of RNNs -- in that it is capable of processing incoming sequential information arriving at arbitrary spatial resolutions. On the other hand, it extends Neural Operators -- generalizations of neural networks to model mappings between spaces of functions -- in that it can parameterize solution operators of SPDEs depending simultaneously on the initial condition and a realization of the driving noise. By performing operations in the spectral domain, we show how a Neural SPDE can be evaluated in two ways, either by calling an ODE solver (emulating a spectral Galerkin scheme), or by solving a fixed point problem. Experiments on various semilinear SPDEs, including the stochastic Navier-Stokes equations, demonstrate how the Neural SPDE model is capable of learning complex spatiotemporal dynamics in a resolution-invariant way, with better accuracy and lighter training data requirements compared to alternative models, and up to 3 orders of magnitude faster than traditional solvers.
    Trimmed Maximum Likelihood Estimation for Robust Learning in Generalized Linear Models. (arXiv:2206.04777v2 [cs.LG] UPDATED)
    We study the problem of learning generalized linear models under adversarial corruptions. We analyze a classical heuristic called the iterative trimmed maximum likelihood estimator which is known to be effective against label corruptions in practice. Under label corruptions, we prove that this simple estimator achieves minimax near-optimal risk on a wide range of generalized linear models, including Gaussian regression, Poisson regression and Binomial regression. Finally, we extend the estimator to the more challenging setting of label and covariate corruptions and demonstrate its robustness and optimality in that setting as well.
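    The iterative trimming heuristic mentioned above can be sketched for the simplest case, one-dimensional Gaussian regression, where the per-sample negative log-likelihood is proportional to the squared residual. The function names, the stopping rule and the toy data are assumptions made for illustration, not the estimator as specified in the paper.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trimmed_mle(points, corruption_fraction, max_iters=50):
    """Alternate between fitting and keeping the samples with the smallest loss."""
    n = len(points)
    keep = n - int(corruption_fraction * n)
    kept = set(range(n))  # start from all samples
    for _ in range(max_iters):
        slope, intercept = fit_line([points[i] for i in sorted(kept)])

        def loss(i):
            # Squared residual, proportional to the Gaussian negative log-likelihood.
            x, y = points[i]
            return (y - (slope * x + intercept)) ** 2

        new_kept = set(sorted(range(n), key=loss)[:keep])
        if new_kept == kept:  # the kept set stopped changing
            break
        kept = new_kept
    return fit_line([points[i] for i in sorted(kept)])


if __name__ == "__main__":
    # y = 2x + 1 with two corrupted labels (at x = 3 and x = 7).
    data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    data[3] = (3.0, 50.0)
    data[7] = (7.0, -40.0)
    print(fit_line(data))          # pulled away by the corrupted labels
    print(trimmed_mle(data, 0.2))  # close to slope 2, intercept 1
```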
    Low-complexity CNNs for Acoustic Scene Classification. (arXiv:2208.01555v1 [eess.AS])
    This technical report describes the SurreyAudioTeam22's submission for DCASE 2022 ASC Task 1, Low-Complexity Acoustic Scene Classification (ASC). The task has two rules: (a) the ASC framework should have at most 128K parameters, and (b) there should be at most 30 million multiply-accumulate operations (MACs) per inference. In this report, we present low-complexity systems for ASC that follow the rules intended for the task.
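    The two budgets can be checked with simple arithmetic. The sketch below counts parameters and MACs for a stack of standard 2D convolutions; the three layer shapes are hypothetical and are not the submitted system, and the counting convention (bias included in parameters, one MAC per weight per output position) is an assumption.

```python
MAX_PARAMS = 128_000      # rule (a): at most 128K parameters
MAX_MACS = 30_000_000     # rule (b): at most 30 million MACs per inference


def conv2d_cost(c_in, c_out, kernel, out_h, out_w):
    """Return (parameters, MACs) of a standard 2D convolution with bias."""
    weights = kernel * kernel * c_in * c_out
    params = weights + c_out            # one bias per output channel
    macs = weights * out_h * out_w      # every weight is used at each output position
    return params, macs


# Hypothetical layers: (c_in, c_out, kernel, out_h, out_w).
LAYERS = [
    (1, 16, 3, 64, 64),
    (16, 32, 3, 32, 32),
    (32, 64, 3, 16, 16),
]


def total_cost(layers):
    """Sum parameters and MACs over all layers."""
    costs = [conv2d_cost(*layer) for layer in layers]
    return sum(p for p, _ in costs), sum(m for _, m in costs)


if __name__ == "__main__":
    params, macs = total_cost(LAYERS)
    print(params, macs, params <= MAX_PARAMS and macs <= MAX_MACS)
```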
    CIPCaD-Bench: Continuous Industrial Process datasets for benchmarking Causal Discovery methods. (arXiv:2208.01529v1 [cs.LG])
    Causal relationships are commonly examined in manufacturing processes to support fault investigations, perform interventions, and make strategic decisions. Industry 4.0 has made available an increasing amount of data that enables data-driven Causal Discovery (CD). Considering the growing number of recently proposed CD methods, it is necessary to introduce strict benchmarking procedures on publicly available datasets since they represent the foundation for a fair comparison and validation of different methods. This work introduces two novel public datasets for CD in continuous manufacturing processes. The first dataset employs the well-known Tennessee Eastman simulator for fault detection and process control. The second dataset is extracted from an ultra-processed food manufacturing plant, and it includes a description of the plant, as well as multiple ground truths. These datasets are used to propose a benchmarking procedure based on different metrics and evaluated on a wide selection of CD algorithms. This work allows testing CD methods in realistic conditions enabling the selection of the most suitable method for specific target applications. The datasets are available at the following link: https://github.com/giovanniMen
    s-LIME: Reconciling Locality and Fidelity in Linear Explanations. (arXiv:2208.01510v1 [cs.LG])
    The benefit of locality is one of the major premises of LIME, one of the most prominent methods to explain black-box machine learning models. This emphasis relies on the postulate that the more locally we look at the vicinity of an instance, the simpler the black-box model becomes, and the more accurately we can mimic it with a linear surrogate. As logical as this seems, our findings suggest that, with the current design of LIME, the surrogate model may degenerate when the explanation is too local, namely, when the bandwidth parameter $\sigma$ tends to zero. Based on this observation, the contribution of this paper is twofold. Firstly, we study the impact of both the bandwidth and the training vicinity on the fidelity and semantics of LIME explanations. Secondly, and based on our findings, we propose s-LIME, an extension of LIME that reconciles fidelity and locality.
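    One generic way to see why a very small bandwidth can be problematic is to look at how kernel weights behave as $\sigma$ shrinks. The sketch below assumes an exponential kernel of the form exp(-d^2 / sigma^2) as a stand-in for LIME's sample weighting and uses a standard effective-sample-size formula; it illustrates kernel weighting in general and is not the paper's analysis.

```python
import math


def kernel_weights(distances, sigma):
    """Exponential kernel weights for samples at the given distances from the instance."""
    return [math.exp(-(d * d) / (sigma * sigma)) for d in distances]


def effective_sample_size(weights):
    """(sum w)^2 / sum(w^2): how many samples effectively contribute to a weighted fit."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / total_sq if total_sq > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical distances of perturbed samples; 0.0 is the explained instance itself.
    distances = [0.0, 0.5, 1.0, 1.5, 2.0]
    for sigma in (5.0, 1.0, 0.05):
        ess = effective_sample_size(kernel_weights(distances, sigma))
        print(sigma, round(ess, 3))
```

    As the bandwidth shrinks, almost all weight concentrates on the instance itself, so a weighted linear surrogate is effectively fitted on a single point.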
    Politics, Sentiment and Virality: A Large-Scale Multilingual Twitter Analysis in Greece, Spain and United Kingdom. (arXiv:2202.00396v2 [cs.CL] UPDATED)
    Social media has become extremely influential when it comes to policy making in modern societies, especially in the western world (e.g., 48% of Europeans use social media every day or almost every day). Platforms such as Twitter allow users to follow politicians, thus making citizens more involved in political discussion. In the same vein, politicians use Twitter to express their opinions, debate among others on current topics and promote their political agenda aiming to influence voter behaviour. Previous studies have shown that tweets conveying negative sentiment are likely to be retweeted more frequently. In this paper, we attempt to analyse tweets of politicians from different countries and explore whether their tweets follow the same trend. Utilising state-of-the-art pre-trained language models we performed sentiment analysis on hundreds of thousands of tweets collected from members of parliament of Greece, Spain and the United Kingdom, including devolved administrations. We achieved this by systematically exploring and analysing the differences between influential and less popular tweets. Our analysis indicates that politicians' negatively charged tweets spread more widely, especially in more recent times, and highlights interesting trends in the intersection of sentiment and popularity.
    Word-level Text Highlighting of Medical Texts for Telehealth Services. (arXiv:2105.10400v2 [cs.LG] UPDATED)
    The medical domain is often subject to information overload. The digitization of healthcare, constant updates to online medical repositories, and increasing availability of biomedical datasets make it challenging to effectively analyze the data. This creates additional work for medical professionals who are heavily dependent on medical data to complete their research and consult their patients. This paper aims to show how different text highlighting techniques can capture relevant medical context. This would reduce the doctors' cognitive load and response time to patients by facilitating them in making faster decisions, thus improving the overall quality of online medical services. Three different word-level text highlighting methodologies are implemented and evaluated. The first method uses TF-IDF scores directly to highlight important parts of the text. The second method is a combination of TF-IDF scores and the application of Local Interpretable Model-Agnostic Explanations to classification models. The third method uses neural networks directly to make predictions on whether or not a word should be highlighted. The results of our experiments show that the neural network approach is successful in highlighting medically-relevant terms and its performance is improved as the size of the input segment increases.
    A Survey of Natural Language Generation. (arXiv:2112.11739v2 [cs.CL] UPDATED)
    This paper offers a comprehensive review of the research on Natural Language Generation (NLG) over the past two decades, especially in relation to data-to-text generation and text-to-text generation deep learning methods, as well as new applications of NLG technology. This survey aims to (a) give the latest synthesis of deep learning research on the NLG core tasks, as well as the architectures adopted in the field; (b) detail meticulously and comprehensively various NLG tasks and datasets, and draw attention to the challenges in NLG evaluation, focusing on different evaluation methods and their relationships; (c) highlight some future emphasis and relatively recent research issues that arise due to the increasing synergy between NLG and other artificial intelligence areas, such as computer vision, text and computational creativity.
    IterMiUnet: A lightweight architecture for automatic blood vessel segmentation. (arXiv:2208.01485v1 [eess.IV])
    The automatic segmentation of blood vessels in fundus images can help analyze the condition of retinal vasculature, which is crucial for identifying various systemic diseases like hypertension, diabetes, etc. Despite the success of Deep Learning-based models in this segmentation task, most of them are heavily parametrized and thus have limited use in practical applications. This paper proposes IterMiUnet, a new lightweight convolution-based segmentation model that requires significantly fewer parameters and yet delivers performance similar to existing models. The model makes use of the excellent segmentation capabilities of Iternet architecture but overcomes its heavily parametrized nature by incorporating the encoder-decoder structure of MiUnet model within it. Thus, the new model reduces parameters without any compromise with the network's depth, which is necessary to learn abstract hierarchical concepts in deep models. This lightweight segmentation model speeds up training and inference time and is potentially helpful in the medical domain where data is scarce and, therefore, heavily parametrized models tend to overfit. The proposed model was evaluated on three publicly available datasets: DRIVE, STARE, and CHASE-DB1. Further cross-training and inter-rater variability evaluations have also been performed. The proposed model has a lot of potential to be utilized as a tool for the early diagnosis of many diseases.
    Lossy compression of multidimensional medical images using sinusoidal activation networks: an evaluation study. (arXiv:2208.01602v1 [eess.IV])
    In this work, we evaluate how neural networks with periodic activation functions can be leveraged to reliably compress large multidimensional medical image datasets, with proof-of-concept application to 4D diffusion-weighted MRI (dMRI). In the medical imaging landscape, multidimensional MRI is a key area of research for developing biomarkers that are both sensitive and specific to the underlying tissue microstructure. However, the high-dimensional nature of these data poses a challenge in terms of both storage and sharing capabilities and associated costs, requiring appropriate algorithms able to represent the information in a low-dimensional space. Recent theoretical developments in deep learning have shown how periodic activation functions are a powerful tool for implicit neural representation of images and can be used for compression of 2D images. Here we extend this approach to 4D images and show how any given 4D dMRI dataset can be accurately represented through the parameters of a sinusoidal activation network, achieving a data compression rate about 10 times higher than the standard DEFLATE algorithm. Our results show that the proposed approach outperforms benchmark ReLU and Tanh activation perceptron architectures in terms of mean squared error, peak signal-to-noise ratio and structural similarity index. Subsequent analyses using the tensor and spherical harmonics representations demonstrate that the proposed lossy compression accurately reproduces the characteristics of the original data, leading to relative errors about 5 to 10 times lower than the benchmark JPEG2000 lossy compression and similar to standard pre-processing steps such as MP-PCA denoising, suggesting a loss of information within the currently accepted levels for clinical application.
    The Curse of Low Task Diversity: On the Failure of Transfer Learning to Outperform MAML and Their Empirical Equivalence. (arXiv:2208.01545v1 [cs.LG])
    Recently, it has been observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks -- thus raising important questions about when and how meta-learning algorithms should be deployed. In this paper, we seek to clarify these questions by 1. proposing a novel metric -- the diversity coefficient -- to measure the diversity of tasks in a few-shot learning benchmark and 2. by comparing Model-Agnostic Meta-Learning (MAML) and transfer learning under fair conditions (same architecture, same optimizer, and all models trained to convergence). Using the diversity coefficient, we show that the popular MiniImageNet and CIFAR-FS few-shot learning benchmarks have low diversity. This novel insight contextualizes claims that transfer learning solutions are better than meta-learned solutions in the regime of low diversity under a fair comparison. Specifically, we empirically find that a low diversity coefficient correlates with a high similarity between transfer learning and MAML learned solutions in terms of accuracy at meta-test time and classification layer similarity (using feature based distance metrics like SVCCA, PWCCA, CKA, and OPD). To further support our claim, we find this meta-test accuracy holds even as the model size changes. Therefore, we conclude that in the low diversity regime, MAML and transfer learning have equivalent meta-test performance when both are compared fairly. We also hope our work inspires more thoughtful constructions and quantitative evaluations of meta-learning benchmarks in the future.
    Cadence: A Practical Time-series Partitioning Algorithm for Unlabeled IoT Sensor Streams. (arXiv:2112.03360v2 [cs.LG] UPDATED)
    Time-series partitioning is an essential step in most machine-learning driven, sensor-based IoT applications. This paper introduces a sample-efficient, robust, time-series segmentation model and algorithm. We show that by learning a representation specifically with the segmentation objective based on maximum mean discrepancy (MMD), our algorithm can robustly detect time-series events across different applications. Our loss function allows us to infer whether consecutive sequences of samples are drawn from the same distribution (null hypothesis) and determines the change-point between pairs that reject the null hypothesis (i.e., come from different distributions). We demonstrate its applicability in a real-world IoT deployment for ambient-sensing based activity recognition. Moreover, while many works on change-point detection exist in the literature, our model is significantly simpler and can be fully trained in 9-93 seconds on average with little variation in hyperparameters for data across different applications. We empirically evaluate Cadence on four popular change point detection (CPD) datasets where Cadence matches or outperforms existing CPD techniques.
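    The MMD criterion named above can be sketched directly on raw samples. This minimal example compares the window before and after each candidate position with a biased squared-MMD estimate and an RBF kernel. It deliberately skips the learned representation that Cadence trains; the kernel, its width, the window size and the toy stream are assumptions.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian RBF kernel on scalars."""
    return math.exp(-gamma * (a - b) ** 2)


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two samples."""
    def mean_kernel(us, vs):
        total = sum(rbf_kernel(u, v, gamma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def detect_change_point(stream, window):
    """Return (index, score) where adjacent windows differ most under MMD."""
    best_index, best_score = None, float("-inf")
    for t in range(window, len(stream) - window + 1):
        score = mmd_squared(stream[t - window:t], stream[t:t + window])
        if score > best_score:
            best_index, best_score = t, score
    return best_index, best_score


if __name__ == "__main__":
    # Toy stream: samples near 0, then samples near 5 starting at index 6.
    stream = [0.0, 0.1, -0.1, 0.05, 0.0, -0.05, 5.0, 5.1, 4.9, 5.05, 5.0, 4.95]
    print(detect_change_point(stream, window=3))
```

    A score near zero is consistent with both windows coming from the same distribution; a large score suggests a change between them.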
    Enabling scalable clinical interpretation of ML-based phenotypes using real world data. (arXiv:2208.01607v1 [cs.LG])
    The availability of large and deep electronic healthcare records (EHR) datasets has the potential to enable a better understanding of real-world patient journeys, and to identify novel subgroups of patients. ML-based aggregation of EHR data is mostly tool-driven, i.e., building on available or newly developed methods. However, these methods, their input requirements, and, importantly, resulting output are frequently difficult to interpret, especially without in-depth data science or statistical training. This endangers the final step of analysis where an actionable and clinically meaningful interpretation is needed. This study investigates approaches to perform patient stratification analysis at scale using large EHR datasets and multiple clustering methods for clinical research. We have developed several tools to facilitate the clinical evaluation and interpretation of unsupervised patient stratification results, namely pattern screening, meta clustering, surrogate modeling, and curation. These tools can be used at different stages within the analysis. As compared to a standard analysis approach, we demonstrate the ability to condense results and optimize analysis time. In the case of meta clustering, we demonstrate that the number of patient clusters can be reduced from 72 to 3 in one example. In another stratification result, by using surrogate models, we could quickly identify that heart failure patients were stratified if blood sodium measurements were available. As this is a routine measurement performed for all patients with heart failure, this indicated a data bias. By using further cohort and feature curation, these patients and other irrelevant features could be removed to increase the clinical meaningfulness. These examples show the effectiveness of the proposed methods and we hope to encourage further research in this field.
    "This is my unicorn, Fluffy": Personalizing frozen vision-language representations. (arXiv:2204.01694v3 [cs.CV] UPDATED)
    Large Vision & Language models pretrained on web-scale data provide representations that are invaluable for numerous V&L problems. However, it is unclear how they can be used for reasoning about user-specific visual concepts in unstructured language. This problem arises in multiple domains, from personalized image retrieval to personalized interaction with smart devices. We introduce a new learning setup called Personalized Vision & Language (PerVL) with two new benchmark datasets for retrieving and segmenting user-specific "personalized" concepts "in the wild". In PerVL, one should learn personalized concepts (1) independently of the downstream task, (2) in a way that allows a pretrained model to reason about them with free language, and (3) without requiring personalized negative examples. We propose an architecture for solving PerVL that operates by extending the input vocabulary of a pretrained model with new word embeddings for the new personalized concepts. The model can then reason about them by simply using them in a sentence. We demonstrate that our approach learns personalized visual concepts from a few examples and can effectively apply them in image retrieval and semantic segmentation using rich textual queries.
    Self-supervised Group Meiosis Contrastive Learning for EEG-Based Emotion Recognition. (arXiv:2208.00877v2 [eess.SP] UPDATED)
    The progress of EEG-based emotion recognition has received widespread attention from the fields of human-machine interaction and cognitive science in recent years. However, how to recognize emotions with limited labels has become a new research and application bottleneck. To address the issue, this paper proposes a Self-supervised Group Meiosis Contrastive learning framework (SGMC) based on the stimuli-consistent EEG signals in humans. In the SGMC, a novel genetics-inspired data augmentation method, named Meiosis, is developed. It takes advantage of the alignment of stimuli among the EEG samples in a group for generating augmented groups by pairing, cross exchanging, and separating. The model adopts a group projector to extract group-level feature representations from group EEG samples triggered by the same emotion video stimuli. Then contrastive learning is employed to maximize the similarity of group-level representations of augmented groups with the same stimuli. The SGMC achieves state-of-the-art emotion recognition results on the publicly available DEAP dataset with an accuracy of 94.72% and 95.68% in valence and arousal dimensions, and also reaches competitive performance on the public SEED dataset with an accuracy of 94.04%. It is worth noting that the SGMC shows significant performance even when using limited labels. Moreover, the results of feature visualization suggest that the model might have learned video-level emotion-related feature representations to improve emotion recognition. The effects of group size are further evaluated in the hyperparameter analysis. Finally, a control experiment and ablation study are carried out to examine the rationality of the architecture. The code is provided publicly online.
    Stochastic Deep Networks with Linear Competing Units for Model-Agnostic Meta-Learning. (arXiv:2208.01573v1 [cs.LG])
    This work addresses meta-learning (ML) by considering deep networks with stochastic local winner-takes-all (LWTA) activations. This type of network unit results in sparse representations from each model layer, as the units are organized into blocks where only one unit generates a non-zero output. The main operating principle of the introduced units relies on stochastic principles, as the network performs posterior sampling over competing units to select the winner. Therefore, the proposed networks are explicitly designed to extract input data representations of sparse stochastic nature, as opposed to the currently standard deterministic representation paradigm. Our approach produces state-of-the-art predictive accuracy on few-shot image classification and regression experiments, as well as reduced predictive error in an active learning setting; these improvements come with an immensely reduced computational cost.
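    The block-wise competition described above can be sketched as follows. Units are grouped into fixed-size blocks, one winner per block is sampled, and every other unit in the block outputs zero. Sampling the winner from a softmax over the block's activations is an assumption made for this illustration; the paper's posterior sampling scheme may differ.

```python
import math
import random


def stochastic_lwta(activations, block_size, rng):
    """Keep one sampled winner per block and zero out the remaining units."""
    output = [0.0] * len(activations)
    for start in range(0, len(activations), block_size):
        block = activations[start:start + block_size]
        peak = max(block)
        # Unnormalized softmax weights (shifted by the max for numerical stability).
        weights = [math.exp(a - peak) for a in block]
        winner = rng.choices(range(len(block)), weights=weights)[0]
        output[start + winner] = block[winner]  # only the winner passes its value on
    return output


if __name__ == "__main__":
    rng = random.Random(0)
    layer_activations = [1.0, 2.0, -1.0, 0.5, 3.0, 0.2]
    print(stochastic_lwta(layer_activations, block_size=2, rng=rng))
```

    Because the winner is sampled rather than chosen by argmax, repeated forward passes can produce different sparse patterns for the same input.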
    Systematically and efficiently improving existing $k$-means initialization algorithms by pairwise-nearest-neighbor smoothing. (arXiv:2202.03949v2 [cs.LG] UPDATED)
    We present a meta-method for initializing (seeding) the $k$-means clustering algorithm called PNN-smoothing. It consists in splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that when clustering the individual subsets any seeding algorithm can be used. If the computational complexity of that seeding algorithm is linear in the size of the data $N$ and the number of clusters $k$, PNN-smoothing is also almost linear with an appropriate choice of $J$, and quite competitive in practice. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure results in systematically better costs. Our implementation is publicly available at https://github.com/carlobaldassi/KMeansPNNSmoothing.jl.
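    The three steps above (split into $J$ subsets, cluster each, merge with PNN) can be sketched on one-dimensional data. This is a simplified illustration, not the authors' Julia implementation: the per-subset seeding is plain random sampling, the Lloyd iteration count is fixed, and the merge cost n_a * n_b / (n_a + n_b) * (c_a - c_b)^2 is the usual pairwise-nearest-neighbor criterion, assumed here.

```python
import random


def kmeans_1d(points, centers, iters=20):
    """Plain Lloyd iterations on 1-D data; returns (centers, cluster sizes)."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            groups[nearest].append(p)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, [len(g) for g in groups]


def pnn_merge(centers, sizes, k):
    """Repeatedly merge the pair of clusters whose merge increases the cost least."""
    clusters = [(c, n) for c, n in zip(centers, sizes) if n > 0]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (ca, na), (cb, nb) = clusters[i], clusters[j]
                cost = na * nb / (na + nb) * (ca - cb) ** 2
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (ca, na), (cb, nb) = clusters[i], clusters[j]
        merged = ((na * ca + nb * cb) / (na + nb), na + nb)  # size-weighted centroid
        clusters = [c for t, c in enumerate(clusters) if t not in (i, j)] + [merged]
    return [c for c, _ in clusters]


def pnn_smoothing_seed(points, k, num_subsets, rng):
    """Return at most k seed centroids for a subsequent k-means run."""
    shuffled = points[:]
    rng.shuffle(shuffled)
    centers, sizes = [], []
    for s in range(num_subsets):
        subset = shuffled[s::num_subsets]   # J random subsets of nearly equal size
        seeds = rng.sample(subset, k)       # stand-in for any seeding algorithm
        c, n = kmeans_1d(subset, seeds)
        centers += c
        sizes += n
    return sorted(pnn_merge(centers, sizes, k))


if __name__ == "__main__":
    data = [base + 0.1 * i for base in (0.0, 10.0, 20.0) for i in range(10)]
    print(pnn_smoothing_seed(data, k=3, num_subsets=3, rng=random.Random(0)))
```

    The returned centroids would then be handed to a full k-means run on the whole dataset as its initialization.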
    Data-Driven Discovery of Molecular Photoswitches with Multioutput Gaussian Processes. (arXiv:2008.03226v2 [physics.chem-ph] UPDATED)
    Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states, whilst overall red-shifting the absorption bands serves to limit material damage due to UV-exposure and increases penetration depth in photopharmacological applications. Engineering these properties into a system through synthetic design, however, remains a challenge. Here, we present a data-driven discovery pipeline for molecular photoswitches underpinned by dataset curation and multitask learning with Gaussian processes. In the prediction of electronic transition wavelengths, we demonstrate that a multioutput Gaussian process (MOGP) trained using labels from four photoswitch transition wavelengths yields the strongest predictive performance relative to single-task models as well as operationally outperforming time-dependent density functional theory (TD-DFT) in terms of the wall-clock time for prediction. We validate our proposed approach experimentally by screening a library of commercially available photoswitchable molecules. Through this screen, we identified several motifs that displayed separated electronic absorption bands of their isomers, exhibited red-shifted absorptions, and are suited for information transfer and photopharmacological applications. Our curated dataset, code, as well as all models are made available at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset
    Anti-Neuron Watermarking: Protecting Personal Data Against Unauthorized Neural Networks. (arXiv:2109.09023v2 [cs.CR] UPDATED)
    We study protecting a user's data (images in this work) against a learner's unauthorized use in training neural networks. It is especially challenging when the user's data is only a tiny percentage of the learner's complete training set. We revisit the traditional watermarking under modern deep learning settings to tackle the challenge. We show that when a user watermarks images using a specialized linear color transformation, a neural network classifier will be imprinted with the signature so that a third-party arbitrator can verify the potentially unauthorized usage of the user data by inferring the watermark signature from the neural network. We also discuss what watermarking properties and signature spaces make the arbitrator's verification convincing. To our best knowledge, this work is the first to protect an individual user's data ownership from unauthorized use in training neural networks.
    CASS: Cross Architectural Self-Supervision for Medical Image Analysis. (arXiv:2206.04170v4 [cs.CV] UPDATED)
    Recent advances in deep learning and computer vision have reduced many barriers to automated medical image analysis, allowing algorithms to process label-free images and improve performance. Specifically, Transformers provide a global perspective of the image that Convolutional Neural Networks (CNNs) inherently lack. Here we present Cross Architectural - Self Supervision, a novel self-supervised learning approach that leverages Transformer and CNN simultaneously. Compared to the existing state-of-the-art self-supervised learning approaches, we empirically showed that CASS-trained CNNs and Transformers across three diverse datasets gained an average of 8.5% with 100% labelled data, 7.3% with 10% labelled data, and 11.5% with 1% labelled data. Notably, one of the test datasets comprised histopathology slides of an autoimmune disease, a condition with minimal data that has been underrepresented in medical imaging. In addition, our findings revealed that CASS is also more robust than the existing state-of-the-art self-supervised methods. The code is open source and is available on GitHub.
    A Unifying Framework for Combining Complementary Strengths of Humans and ML toward Better Predictive Decision-Making. (arXiv:2204.10806v2 [cs.HC] UPDATED)
    Hybrid human-ML systems are increasingly in charge of consequential decisions in a wide range of domains. A growing body of empirical and theoretical work has advanced our understanding of these systems. However, existing empirical results are mixed, and theoretical proposals are often mutually incompatible. In this work, we propose a unifying framework for understanding conditions under which combining the complementary strengths of humans and ML leads to higher quality decisions than those produced by each of them individually -- a state which we refer to as human-ML complementarity. We focus specifically on the context of human-ML predictive decision-making and investigate optimal ways of combining human and ML predictive decisions, accounting for the underlying sources of variation in their judgments. Within this scope, we present two crucial contributions. First, taking a computational perspective of decision-making and drawing upon prior literature in psychology, machine learning, and human-computer interaction, we introduce a taxonomy characterizing a wide range of criteria across which human and machine decision-making differ. Second, formalizing our taxonomy allows us to study how human and ML predictive decisions should be aggregated optimally. We show that our proposed framework encompasses several existing models of human-ML complementarity as special cases. Last but not least, an initial exploratory analysis of our framework presents a critical insight for future work in human-ML complementarity: the mechanism by which we combine human and ML judgments should be informed by the underlying causes of divergence in their decisions.
    Spiking Graph Convolutional Networks. (arXiv:2205.02767v2 [cs.LG] UPDATED)
    Graph Convolutional Networks (GCNs) achieve impressive performance due to their remarkable representation ability in learning graph information. However, GCNs, when implemented on a deep network, require expensive computation power, making them difficult to deploy on battery-powered devices. In contrast, Spiking Neural Networks (SNNs), which perform a bio-fidelity inference process, offer an energy-efficient neural architecture. In this work, we propose SpikingGCN, an end-to-end framework that aims to integrate the embedding of GCNs with the bio-fidelity characteristics of SNNs. The original graph data are encoded into spike trains based on the incorporation of graph convolution. We further model biological information processing by utilizing a fully connected layer combined with neuron nodes. In a wide range of scenarios (e.g. citation networks, image graph classification, and recommender systems), our experimental results show that the proposed method achieves competitive performance against state-of-the-art approaches. Furthermore, we show that SpikingGCN on a neuromorphic chip can bring a clear advantage of energy efficiency into graph data analysis, which demonstrates its great potential to construct environment-friendly machine learning models.
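The encode-then-integrate idea in the abstract above can be illustrated with a minimal sketch. It assumes Bernoulli rate coding and a leaky integrate-and-fire (LIF) neuron with a hard reset; the function names, threshold, and decay values are hypothetical choices for illustration, not the SpikingGCN implementation.

```python
import random


def bernoulli_spikes(rates, steps, rng):
    """Turn feature values in [0, 1] into a spike train (assumed rate coding).

    At every time step, feature i emits a spike with probability rates[i].
    """
    return [[1 if rng.random() < r else 0 for r in rates] for _ in range(steps)]


def lif_spike_count(spike_train, weights, threshold=1.0, decay=0.9):
    """Count output spikes of one leaky integrate-and-fire neuron.

    The membrane potential decays each step, accumulates the weighted input
    spikes, and is reset to zero whenever it reaches the threshold.
    """
    potential = 0.0
    count = 0
    for spikes in spike_train:
        potential = decay * potential + sum(w * s for w, s in zip(weights, spikes))
        if potential >= threshold:
            count += 1
            potential = 0.0  # hard reset after firing
    return count


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical node features, already scaled to [0, 1].
    train = bernoulli_spikes([0.9, 0.1, 0.5], steps=20, rng=rng)
    print(lif_spike_count(train, weights=[0.6, 0.6, 0.2]))
```

In a classifier built this way, one such neuron per class could be used and the class with the highest spike count taken as the prediction; that readout is likewise an assumption of this sketch.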
    GINK: Graph-based Interaction-aware Kinodynamic Planning via Reinforcement Learning for Autonomous Driving. (arXiv:2206.01488v2 [cs.RO] UPDATED)
    Applying reinforcement learning to autonomous driving entails certain challenges, primarily due to massive traffic flows, which change dynamically. To address such challenges, it is necessary to quickly determine response strategies to the changing intentions of surrounding vehicles. Accordingly, we propose a new policy optimization method for safe driving using graph-based interaction-aware constraints. In this framework, the motion prediction and control modules are trained simultaneously, while sharing a latent representation that contains a social context. Further, to reflect social interactions, we express the movements of agents in the graph form and filter the features. This helps preserve the spatiotemporal locality of adjacent nodes. Furthermore, we create feedback loops to combine these two modules effectively. As a result, this approach encourages the learned controller to be safe from dynamic risks and also renders the motion prediction robust under various situations. In the experiment, we set up a navigation scenario comprising various situations, with CARLA, an urban driving simulator. The experiments show state-of-the-art performance on the sides of both navigation strategy and motion prediction compared to the baselines.
    Unsupervised machine learning framework for discriminating major variants of concern during COVID-19. (arXiv:2208.01439v1 [q-bio.OT])
    Due to the rapid evolution of the SARS-CoV-2 (COVID-19) virus, a number of mutations emerged, with variants such as Alpha, Gamma, Delta and Omicron, which had a massive impact on the world economy. Unsupervised machine learning methods have the ability to compress, characterise and visualise unlabelled data. In this paper, we present a framework that utilises unsupervised machine learning methods, combining selected dimensional reduction and clustering methods, to discriminate and visualise the associations with the major COVID-19 variants based on genome sequences. The framework utilises k-mer analysis for processing the genome (RNA) sequences and compares different dimensional reduction methods, including principal component analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE), and uniform manifold approximation projection (UMAP). Furthermore, the framework employs agglomerative hierarchical clustering methods and provides a visualisation using a dendrogram. We find that the proposed framework can effectively distinguish the major variants and hence can be used for distinguishing emerging variants in the future.
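The k-mer step mentioned above can be sketched as follows. This is a minimal illustration that assumes sequences are stored over the alphabet ACGT and that windows containing any other symbol (such as N) are ignored; the function name and defaults are hypothetical, not taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_profile(sequence, k=3, alphabet="ACGT"):
    """Return normalised k-mer frequencies in a fixed, alphabetically ordered layout.

    Every sequence maps to a vector of length len(alphabet) ** k, so profiles
    of different genomes can be compared or fed to PCA, t-SNE or UMAP.
    """
    sequence = sequence.upper()
    # Slide a window of width k over the sequence and count each substring.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Only windows made purely of alphabet symbols contribute to the total.
    total = sum(counts[m] for m in kmers)
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    profile = kmer_profile("ACGTACGTNACG", k=2)
    print(len(profile), round(sum(profile), 6))
```

The resulting fixed-length vectors are the kind of input that a dimensional reduction method followed by agglomerative clustering would consume.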
    Gaussian Control Barrier Functions : A Non-Parametric Paradigm to Safety. (arXiv:2203.15474v2 [eess.SY] UPDATED)
    Inspired by the success of control barrier functions (CBFs) in addressing safety, and the rise of data-driven techniques for modeling functions, we propose a non-parametric approach for online synthesis of CBFs using Gaussian Processes (GPs). Mathematical constructs such as CBFs have achieved safety by designing a candidate function a priori. However, designing such a candidate function can be challenging. A practical example of such a setting would be to design a CBF in a disaster recovery scenario where safe and navigable regions need to be determined. The decision boundary for safety in such an example is unknown and cannot be designed a priori. In our approach, we work with safety samples or observations to construct the CBF online by assuming a flexible GP prior on these samples, and term our formulation as a Gaussian CBF. GPs have favorable properties, in addition to being non-parametric, such as analytical tractability and robust uncertainty estimation. This allows realizing the posterior components with high safety guarantees by incorporating variance estimation, while also computing associated partial derivatives in closed-form to achieve safe control. Moreover, the synthesized safety function from our approach allows changing the corresponding safe set arbitrarily based on the data, thus allowing non-convex safe sets. We validate our approach experimentally on a quadrotor by demonstrating safe control for fixed but arbitrary safe sets and collision avoidance where the safe set is constructed online. Finally, we juxtapose Gaussian CBFs with regular CBFs in the presence of noisy states to highlight its flexibility and robustness to noise. The experiment video can be seen at: https://youtu.be/HX6uokvCiGk.
    Cluster Weighted Model Based on TSNE algorithm for High-Dimensional Data. (arXiv:2208.01579v1 [stat.ML])
    Similar to many machine learning models, both the accuracy and speed of cluster weighted models (CWMs) can be hampered by high-dimensional data, which has led to previous work on a parsimonious technique to reduce the effect of the "curse of dimensionality" on mixture models. In this work, we review the background of CWMs. We further show that the parsimonious technique is not sufficient for mixture models to thrive in the presence of huge high-dimensional data. We discuss a heuristic for detecting the hidden components by choosing the initial values of location parameters using the default values in the "FlexCWM" R package. We introduce a dimensionality reduction technique called T-distributed stochastic neighbor embedding (TSNE) to enhance the parsimonious CWMs in high-dimensional space. Originally, CWMs are suited for regression, but for classification purposes, all multi-class variables are transformed logarithmically with some noise. The parameters of the model are obtained via the expectation maximization algorithm. The effectiveness of the discussed technique is demonstrated using real data sets from different fields.
    Deep residential representations: Using unsupervised learning to unlock elevation data for geo-demographic prediction. (arXiv:2112.01421v2 [cs.LG] UPDATED)
    LiDAR (short for "Light Detection And Ranging" or "Laser Imaging, Detection, And Ranging") technology can be used to provide detailed three-dimensional elevation maps of urban and rural landscapes. To date, airborne LiDAR imaging has been predominantly confined to the environmental and archaeological domains. However, the geographically granular and open-source nature of this data also lends itself to an array of societal, organizational and business applications where geo-demographic type data is utilised. Arguably, the complexity involved in processing this multi-dimensional data has thus far restricted its broader adoption. In this paper, we propose a series of convenient task-agnostic tile elevation embeddings to address this challenge, using recent advances from unsupervised Deep Learning. We test the potential of our embeddings by predicting seven English indices of deprivation (2019) for small geographies in the Greater London area. These indices cover a range of socio-economic outcomes and serve as a proxy for a wide variety of downstream tasks to which the embeddings can be applied. We consider the suitability of this data not just on its own but also as an auxiliary source of data in combination with demographic features, thus providing a realistic use case for the embeddings. Having trialled various model/embedding configurations, we find that our best performing embeddings lead to Root-Mean-Squared-Error (RMSE) improvements of up to 21% over using standard demographic features alone. We also demonstrate how our embedding pipeline, using Deep Learning combined with K-means clustering, produces coherent tile segments which allow the latent embedding features to be interpreted.
    How to Learn from Risk: Explicit Risk-Utility Reinforcement Learning for Efficient and Safe Driving Strategies. (arXiv:2203.08409v2 [cs.LG] UPDATED)
    Autonomous driving has the potential to revolutionize mobility and is hence an active area of research. In practice, the behavior of autonomous vehicles must be acceptable, i.e., efficient, safe, and interpretable. While vanilla reinforcement learning (RL) finds performant behavioral strategies, they are often unsafe and uninterpretable. Safety is introduced through Safe RL approaches, but they still mostly remain uninterpretable, as the learned behavior is jointly optimized for safety and performance without modeling them separately. Interpretable machine learning is rarely applied to RL. This paper proposes SafeDQN, which makes the behavior of autonomous vehicles safe and interpretable while still being efficient. SafeDQN offers an understandable, semantic trade-off between the expected risk and the utility of actions while being algorithmically transparent. We show that SafeDQN finds interpretable and safe driving policies for a variety of scenarios and demonstrate how state-of-the-art saliency techniques can help to assess both risk and utility.
    PAN: Pulse Ansatz on NISQ Machines. (arXiv:2208.01215v1 [quant-ph])
    Variational quantum algorithms (VQAs) have demonstrated great potential in the NISQ era. In the workflow of a VQA, the parameters of the ansatz are iteratively updated to approximate the desired quantum states. We have seen various efforts to draft better ansatz with fewer gates. In quantum computers, the gate ansatz will eventually be transformed into control signals such as microwave pulses on transmons. The control pulses need elaborate calibration to minimize errors such as over-rotation and under-rotation. In the case of VQAs, this procedure will introduce redundancy, but the variational properties of VQAs can naturally handle problems of over-rotation and under-rotation by updating the amplitude and frequency parameters. Therefore, we propose PAN, a native-pulse ansatz generator framework for VQAs. We generate native-pulse ansatz with trainable parameters for amplitudes and frequencies. In our proposed PAN, we tune parametric pulses, which are natively supported on NISQ computers. Considering that parameter-shift rules do not hold for native-pulse ansatz, we need to deploy non-gradient optimizers. To constrain the number of parameters sent to the optimizer, we adopt a progressive way to generate our native-pulse ansatz. Experiments are conducted on both simulators and quantum devices to validate our methods. When adopted on NISQ machines, PAN obtained improved performance, with latency decreased by an average of 86%. PAN is able to achieve 99.336% and 96.482% accuracy for VQE tasks on H2 and HeH+ respectively, even with considerable noise in NISQ machines.
    Predicting Future Mosquito Habitats Using Time Series Climate Forecasting and Deep Learning. (arXiv:2208.01436v1 [cs.LG])
    Mosquito habitat ranges are projected to expand due to climate change. This investigation aims to identify future mosquito habitats by analyzing preferred ecological conditions of mosquito larvae. After assembling a data set with atmospheric records and larvae observations, a neural network is trained to predict larvae counts from ecological inputs. Time series forecasting is conducted on these variables and climate projections are passed into the initial deep learning model to generate location-specific larvae abundance predictions. The results support the notion of regional ecosystem-driven changes in mosquito spread, with high-elevation regions in particular experiencing an increase in susceptibility to mosquito infestation.
    Prompt-to-Prompt Image Editing with Cross Attention Control. (arXiv:2208.01626v1 [cs.CV])
    Recent large-scale text-driven synthesis models have attracted much attention thanks to their remarkable capabilities of generating highly diverse images that follow given text prompts. Such text-based synthesis methods are particularly appealing to humans who are used to verbally describe their intent. Therefore, it is only natural to extend the text-driven image synthesis to text-driven image editing. Editing is challenging for these generative models, since an innate property of an editing technique is to preserve most of the original image, while in the text-based models, even a small modification of the text prompt often leads to a completely different outcome. State-of-the-art methods mitigate this by requiring the users to provide a spatial mask to localize the edit, hence, ignoring the original structure and content within the masked region. In this paper, we pursue an intuitive prompt-to-prompt editing framework, where the edits are controlled by text only. To this end, we analyze a text-conditioned model in depth and observe that the cross-attention layers are the key to controlling the relation between the spatial layout of the image to each word in the prompt. With this observation, we present several applications which monitor the image synthesis by editing the textual prompt only. This includes localized editing by replacing a word, global editing by adding a specification, and even delicately controlling the extent to which a word is reflected in the image. We present our results over diverse images and prompts, demonstrating high-quality synthesis and fidelity to the edited prompts.
    Bayesian Variable Selection in a Million Dimensions. (arXiv:2208.01180v1 [stat.ME])
    Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large P regime we introduce an efficient MCMC scheme whose cost per iteration is sublinear in P. In addition we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular we design efficient algorithms for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.
    A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models. (arXiv:2208.01230v1 [cs.LG])
    Synthetic health data have the potential to mitigate privacy concerns when sharing data to support biomedical research and the development of innovative healthcare applications. Modern approaches for data generation based on machine learning, generative adversarial networks (GAN) methods in particular, continue to evolve and demonstrate remarkable potential. Yet there is a lack of a systematic assessment framework to benchmark methods as they emerge and determine which methods are most appropriate for which use cases. In this work, we introduce a generalizable benchmarking framework to appraise key characteristics of synthetic health data with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health records (EHRs) data from two large academic medical centers with respect to several use cases. The results illustrate that there is a utility-privacy tradeoff for sharing synthetic EHR data. The results further indicate that no method is unequivocally the best on all criteria in each use case, which makes it evident why synthetic data generation methods need to be assessed in context.
    Variance-Aware Weight Initialization for Point Convolutional Neural Networks. (arXiv:2112.03777v2 [cs.CV] UPDATED)
    Appropriate weight initialization has been of key importance to successfully train neural networks. Recently, batch normalization has diminished the role of weight initialization by simply normalizing each layer based on batch statistics. Unfortunately, batch normalization has several drawbacks when applied to small batch sizes, as they are required to cope with memory limitations when learning on point clouds. While well-founded weight initialization strategies can render batch normalization unnecessary and thus avoid these drawbacks, no such approaches have been proposed for point convolutional networks. To fill this gap, we propose a framework to unify the multitude of continuous convolutions. This enables our main contribution, variance-aware weight initialization. We show that this initialization can avoid batch normalization while achieving similar and, in some cases, better performance.
    Bridging Differential Privacy and Byzantine-Robustness via Model Aggregation. (arXiv:2205.00107v2 [cs.LG] UPDATED)
    This paper aims at jointly addressing two seemingly conflicting issues in federated learning: differential privacy (DP) and Byzantine-robustness, which are particularly challenging when the distributed data are non-i.i.d. (independent and identically distributed). Standard DP mechanisms add noise to the transmitted messages, which becomes entangled with the robust stochastic gradient aggregation used to defend against Byzantine attacks. In this paper, we decouple the two issues via robust stochastic model aggregation, in the sense that our proposed DP mechanisms and the defense against Byzantine attacks have separate influences on the learning performance. Leveraging robust stochastic model aggregation, at each iteration, each worker calculates the difference between the local model and the global one, followed by sending the element-wise signs to the master node, which enables robustness to Byzantine attacks. Further, we design two DP mechanisms to perturb the uploaded signs for the purpose of privacy preservation, and prove that they are $(\epsilon,0)$-DP by exploiting the properties of noise distributions. With the tools of the Moreau envelope and proximal point projection, we establish the convergence of the proposed algorithm when the cost function is nonconvex. We analyze the trade-off between privacy preservation and learning performance, and show that the influence of our proposed DP mechanisms is decoupled from that of robust stochastic model aggregation. Numerical experiments demonstrate the effectiveness of the proposed algorithm.
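The sign-upload step described above can be sketched under stated assumptions. The perturbation used here is plain binary randomized response, which keeps each sign with probability $e^{\epsilon}/(1+e^{\epsilon})$ and is $(\epsilon,0)$-DP per coordinate; it is a stand-in, not one of the two mechanisms designed in the paper. The master update is also simplified to a scaled sum of signs, and all function names are hypothetical.

```python
import math
import random


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def private_signs(local_model, global_model, epsilon, rng):
    """Element-wise signs of (local - global), perturbed by randomized response.

    Assumption of this sketch: each sign is kept with probability
    e^eps / (1 + e^eps) and flipped otherwise; a zero difference is first
    replaced by a uniformly random sign.
    """
    keep = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    noisy = []
    for local, shared in zip(local_model, global_model):
        s = sign(local - shared)
        if s == 0:
            s = rng.choice((-1, 1))
        noisy.append(s if rng.random() < keep else -s)
    return noisy


def aggregate(global_model, all_signs, step):
    """Move each global coordinate by step times the sum of the received signs.

    Every worker contributes at most +/- step per coordinate, which bounds
    the influence of any single (possibly Byzantine) worker.
    """
    return [g + step * sum(column) for g, column in zip(global_model, zip(*all_signs))]


if __name__ == "__main__":
    rng = random.Random(0)
    global_model = [0.0, 0.0]
    workers = [[0.4, -0.2], [0.5, -0.1], [0.3, -0.3]]
    uploads = [private_signs(w, global_model, epsilon=1.0, rng=rng) for w in workers]
    print(aggregate(global_model, uploads, step=0.1))
```

Because only signs are uploaded, the privacy noise acts on a bounded message, which is one intuition for why privacy and robustness can be analysed separately.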
    Mutation Models: Learning to Generate Levels by Imitating Evolution. (arXiv:2206.05497v2 [cs.AI] UPDATED)
    Search-based procedural content generation (PCG) is a well-known method for level generation in games. Its key advantage is that it is generic and able to satisfy functional constraints. However, due to the heavy computational costs to run these algorithms online, search-based PCG is rarely utilized for real-time generation. In this paper, we introduce mutation models, a new type of iterative level generator based on machine learning. We train a model to imitate the evolutionary process and use the trained model to generate levels. This trained model is able to modify noisy levels sequentially to create better levels without the need for a fitness function during inference. We evaluate our trained models on a 2D maze generation task. We compare several different versions of the method: training the models either at the end of evolution (normal evolution) or every 100 generations (assisted evolution) and using the model as a mutation function during evolution. Using the assisted evolution process, the final trained models are able to generate mazes with a success rate of 99% and high diversity of 86%. The trained model is many times faster than the evolutionary process it was trained on. This work opens the door to a new way of learning level generators guided by an evolutionary process, meaning automatic creation of generators with specifiable constraints and objectives that are fast enough for runtime deployment in games.
    Classifying Unstructured Clinical Notes via Automatic Weak Supervision. (arXiv:2206.12088v2 [cs.CL] UPDATED)
    Healthcare providers usually record detailed notes of the clinical care delivered to each patient for clinical, research, and billing purposes. Due to the unstructured nature of these narratives, providers employ dedicated staff to assign diagnostic codes to patients' diagnoses using the International Classification of Diseases (ICD) coding system. This manual process is not only time-consuming but also costly and error-prone. Prior work demonstrated potential utility of Machine Learning (ML) methodology in automating this process, but it has relied on large quantities of manually labeled data to train the models. Additionally, diagnostic coding systems evolve with time, which makes traditional supervised learning strategies unable to generalize beyond local applications. In this work, we introduce a general weakly-supervised text classification framework that learns from class-label descriptions only, without the need to use any human-labeled documents. It leverages the linguistic domain knowledge stored within pre-trained language models and the data programming framework to assign code labels to individual texts. We demonstrate the efficacy and flexibility of our method by comparing it to state-of-the-art weak text classifiers across four real-world text classification datasets, in addition to assigning ICD codes to medical notes in the publicly available MIMIC-III database.
    Visual correspondence-based explanations improve AI robustness and human-AI team accuracy. (arXiv:2208.00780v2 [cs.CV] UPDATED)
    Explaining artificial intelligence (AI) predictions is increasingly important and even imperative in many high-stakes applications where humans are the ultimate decision-makers. In this work, we propose two novel architectures of self-interpretable image classifiers that first explain, and then predict (as opposed to post-hoc explanations) by harnessing the visual correspondences between a query image and exemplars. Our models consistently improve (by 1 to 4 points) on out-of-distribution (OOD) datasets while performing marginally worse (by 1 to 2 points) on in-distribution tests than ResNet-50 and a $k$-nearest neighbor classifier (kNN). Via a large-scale human study on ImageNet and CUB, our correspondence-based explanations are found to be more useful to users than kNN explanations. Our explanations help users more accurately reject AI's wrong decisions than all other tested methods. Interestingly, for the first time, we show that it is possible to achieve complementary human-AI team accuracy (i.e., accuracy higher than that of either the AI alone or humans alone) in ImageNet and CUB image classification tasks.
    Mitigating Biases in Student Performance Prediction via Attention-Based Personalized Federated Learning. (arXiv:2208.01182v1 [cs.LG])
    Traditional learning-based approaches to student modeling generalize poorly to underrepresented student groups due to biases in data availability. In this paper, we propose a methodology for predicting student performance from their online learning activities that optimizes inference accuracy over different demographic groups such as race and gender. Building upon recent foundations in federated learning, in our approach, personalized models for individual student subgroups are derived from a global model aggregated across all student models via meta-gradient updates that account for subgroup heterogeneity. To learn better representations of student activity, we augment our approach with a self-supervised behavioral pretraining methodology that leverages multiple modalities of student behavior (e.g., visits to lecture videos and participation on forums), and include a neural network attention mechanism in the model aggregation stage. Through experiments on three real-world datasets from online courses, we demonstrate that our approach obtains substantial improvements over existing student modeling baselines in predicting student learning outcomes for all subgroups. Visual analysis of the resulting student embeddings confirms that our personalization methodology indeed identifies different activity patterns within different subgroups, consistent with its stronger inference ability compared with the baselines.
    Stochastic Primal-Dual Three Operator Splitting with Arbitrary Sampling and Preconditioning. (arXiv:2208.01631v1 [math.OC])
    In this work we propose a stochastic primal-dual preconditioned three-operator splitting algorithm for solving a class of convex three-composite optimization problems. Our proposed scheme is a direct three-operator splitting extension of the SPDHG algorithm [Chambolle et al. 2018]. We provide theoretical convergence analysis showing ergodic O(1/K) convergence rate, and demonstrate the effectiveness of our approach in imaging inverse problems.
    Compound Density Networks for Risk Prediction using Electronic Health Records. (arXiv:2208.01320v1 [cs.LG])
    Electronic Health Records (EHRs) exhibit a high amount of missing data due to variations of patient conditions and treatment needs. Imputation of missing values has been considered an effective approach to deal with this challenge. Existing work separates imputation method and prediction model as two independent parts of an EHR-based machine learning system. We propose an integrated end-to-end approach by utilizing a Compound Density Network (CDNet) that allows the imputation method and prediction model to be tuned together within a single framework. CDNet consists of a Gated recurrent unit (GRU), a Mixture Density Network (MDN), and a Regularized Attention Network (RAN). The GRU is used as a latent variable model to model EHR data. The MDN is designed to sample latent variables generated by GRU. The RAN serves as a regularizer for less reliable imputed values. The architecture of CDNet enables GRU and MDN to iteratively leverage the output of each other to impute missing values, leading to a more accurate and robust prediction. We validate CDNet on the mortality prediction task on the MIMIC-III dataset. Our model outperforms state-of-the-art models by significant margins. We also empirically show that regularizing imputed values is a key factor for superior prediction performance. Analysis of prediction uncertainty shows that our model can capture both aleatoric and epistemic uncertainties, which offers model users a better understanding of the model results.
    A Comparative Study on COVID-19 Fake News Detection Using Different Transformer Based Models. (arXiv:2208.01355v1 [cs.CL])
    The rapid advancement of social networks and the convenience of internet availability have accelerated the rampant spread of false news and rumors on social media sites. Amid the COVID-19 epidemic, this misleading information has aggravated the situation by putting people's mental and physical lives in danger. To limit the spread of such inaccuracies, identifying fake news on online platforms could be the first and foremost step. In this research, the authors have conducted a comparative analysis by implementing five transformer-based models, namely BERT, BERT without LSTM, ALBERT, RoBERTa, and a Hybrid of BERT & ALBERT, in order to detect fraudulent COVID-19 news on the internet. The COVID-19 Fake News Dataset has been used for training and testing the models. Among all these models, the RoBERTa model has performed better than the other models by obtaining an F1 score of 0.98 in both real and fake classes.
    Replacing Backpropagation with Biological Plausible Top-down Credit Assignment in Deep Neural Networks Training. (arXiv:2208.01416v1 [cs.NE])
    Top-down connections in the biological brain have been shown to be important in high cognitive functions. However, the function of this mechanism in machine learning has not been defined clearly. In this study, we propose a framework constituted by a bottom-up and a top-down network. Here, we use a Top-down Credit Assignment Network (TDCA-network) to replace the loss function and backpropagation (BP), which serve as the feedback mechanism in the traditional bottom-up network training paradigm. Our results show that the credit given by a well-trained TDCA-network outperforms the gradient from backpropagation in classification tasks under different settings on multiple datasets. In addition, we successfully use a credit diffusing trick, which keeps training and testing performance unchanged, to reduce the parameter complexity of the TDCA-network. More importantly, by comparing their trajectories in the parameter landscape, we find that the TDCA-network directly achieved a global optimum, whereas backpropagation reaches only a local optimum. Thus, our results demonstrate that the TDCA-network not only provides a biologically plausible learning mechanism, but also has the potential to directly achieve a global optimum, indicating that top-down credit assignment can substitute for backpropagation and provide a better learning framework for Deep Neural Networks.
    Flood Prediction Using Machine Learning Models. (arXiv:2208.01234v1 [cs.LG])
    Floods are one of nature's most catastrophic calamities which cause irreversible and immense damage to human life, agriculture, infrastructure and socio-economic system. Several studies on flood catastrophe management and flood forecasting systems have been conducted. The accurate prediction of the onset and progression of floods in real time is challenging. To estimate water levels and velocities across a large area, it is necessary to combine data with computationally demanding flood propagation models. This paper aims to reduce the extreme risks of this natural disaster and also contributes to policy suggestions by providing a prediction for floods using different machine learning models. This research will use Binary Logistic Regression, K-Nearest Neighbor (KNN), Support Vector Classifier (SVC) and Decision tree Classifier to provide an accurate prediction. With the outcome, a comparative analysis will be conducted to understand which model delivers a better accuracy.
    GeoECG: Data Augmentation via Wasserstein Geodesic Perturbation for Robust Electrocardiogram Prediction. (arXiv:2208.01220v1 [stat.ML])
    There has been an increased interest in applying deep neural networks to automatically interpret and analyze the 12-lead electrocardiogram (ECG). The current paradigms with machine learning methods are often limited by the amount of labeled data. This phenomenon is particularly problematic for clinically-relevant data, where labeling at scale can be time-consuming and costly in terms of the specialized expertise and human effort required. Moreover, deep learning classifiers may be vulnerable to adversarial examples and perturbations, which could have catastrophic consequences, for example, when applied in the context of medical treatment, clinical trials, or insurance claims. In this paper, we propose a physiologically-inspired data augmentation method to improve performance and increase the robustness of heart disease detection based on ECG signals. We obtain augmented samples by perturbing the data distribution towards other classes along the geodesic in Wasserstein space. To better utilize domain-specific knowledge, we design a ground metric that recognizes the difference between ECG signals based on physiologically determined features. Learning from 12-lead ECG signals, our model is able to distinguish five categories of cardiac conditions. Our results demonstrate improvements in accuracy and robustness, reflecting the effectiveness of our data augmentation method.
    Are Cluster Validity Measures (In)valid? (arXiv:2208.01261v1 [stat.ML])
    Internal cluster validity measures (such as the Calinski-Harabasz, Dunn, or Davies-Bouldin indices) are frequently used for selecting the appropriate number of partitions a dataset should be split into. In this paper we consider what happens if we treat such indices as objective functions in unsupervised learning activities. Is the optimal grouping with regard to, say, the Silhouette index really meaningful? It turns out that many cluster (in)validity indices promote clusterings that match expert knowledge quite poorly. We also introduce a new, well-performing variant of the Dunn index that is built upon OWA operators and the near-neighbour graph so that subspaces of higher density, regardless of their shapes, can be separated from each other better.
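To make the idea of an internal validity index concrete, here is a minimal sketch of the classic Dunn index (smallest between-cluster distance divided by largest within-cluster diameter). It is not the OWA-based variant the abstract introduces; the function name `dunn_index` and the toy points are illustrative assumptions.

```python
from itertools import combinations, product
from math import dist


def dunn_index(clusters):
    """Classic Dunn index for a list of clusters (each a list of points).

    Assumes at least two clusters and at least one cluster with two points.
    """
    # Separation: smallest distance between points in different clusters.
    separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p, q in product(a, b)
    )
    # Diameter: largest distance between two points in the same cluster.
    diameter = max(
        dist(p, q)
        for c in clusters
        for p, q in combinations(c, 2)
    )
    return separation / diameter


# Four points forming two tight, well-separated pairs.
compact = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
# The same four points grouped "the wrong way round".
stretched = [[(0, 0), (5, 0)], [(0, 1), (5, 1)]]

print(dunn_index(compact), dunn_index(stretched))
```

On this toy data the compact grouping scores higher than the stretched one. The abstract's point is that maximizing such an index on real data does not always agree with expert labels.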
    Explicit Use of Fourier Spectrum in Generative Adversarial Networks. (arXiv:2208.01265v1 [cs.CV])
    Generative Adversarial Networks have attracted researchers' attention due to their state-of-the-art performance in generating new images with only a dataset of the target distribution. It has been shown that there is a dissimilarity between the spectrum of authentic images and fake ones. Since the Fourier transform is a bijective mapping, saying that the model has a significant problem in learning the original distribution is a fair conclusion. In this work, we investigate the possible reasons for the mentioned drawback in the architecture and mathematical theory of the current GANs. Then we propose a new model to reduce the discrepancies between the spectrum of the actual and fake images. To that end, we design a brand new architecture for the frequency domain using the blueprint of geometric deep learning. Then, we experimentally show promising improvements in the quality of the generated images by considering the Fourier domain representation of the original data as a principal feature in the training process.
    UniRank: Unimodal Bandit Algorithm for Online Ranking. (arXiv:2208.01515v1 [cs.LG])
    We tackle a new emerging problem, which is finding an optimal monopartite matching in a weighted graph. The semi-bandit version, where a full matching is sampled at each iteration, has been addressed by \cite{ADMA}, creating an algorithm with an expected regret matching $O(\frac{L\log(L)}{\Delta}\log(T))$ with $2L$ players, $T$ iterations and a minimum reward gap $\Delta$. We reduce this bound in two steps. First, as in \cite{GRAB} and \cite{UniRank} we use the unimodality property of the expected reward on the appropriate graph to design an algorithm with a regret in $O(L\frac{1}{\Delta}\log(T))$. Secondly, we show that by moving the focus towards the main question `\emph{Is user $i$ better than user $j$?}' this regret becomes $O(L\frac{\Delta}{\tilde{\Delta}^2}\log(T))$, where $\tilde{\Delta} > \Delta$ derives from a better way of comparing users. Some experimental results finally show these theoretical results are corroborated in practice.
    Graph-based Reinforcement Learning meets Mixed Integer Programs: An application to 3D robot assembly discovery. (arXiv:2203.04120v2 [cs.RO] UPDATED)
    Robot assembly discovery is a challenging problem that lives at the intersection of resource allocation and motion planning. The goal is to combine a predefined set of objects to form something new while considering task execution with the robot-in-the-loop. In this work, we tackle the problem of building arbitrary, predefined target structures entirely from scratch using a set of Tetris-like building blocks and a robotic manipulator. Our novel hierarchical approach aims at efficiently decomposing the overall task into three feasible levels that benefit mutually from each other. On the high level, we run a classical mixed-integer program for global optimization of block-type selection and the blocks' final poses to recreate the desired shape. Its output is then exploited to efficiently guide the exploration of an underlying reinforcement learning (RL) policy. This RL policy draws its generalization properties from a flexible graph-based representation that is learned through Q-learning and can be refined with search. Moreover, it accounts for the necessary conditions of structural stability and robotic feasibility that cannot be effectively reflected in the previous layer. Lastly, a grasp and motion planner transforms the desired assembly commands into robot joint movements. We demonstrate our proposed method's performance on a set of competitive simulated RAD environments, showcase real-world transfer, and report performance and robustness gains compared to an unstructured end-to-end approach. Videos are available at https://sites.google.com/view/rl-meets-milp .
    What can we Learn by Predicting Accuracy? (arXiv:2208.01358v1 [cs.LG])
    This paper seeks to answer the following question: "What can we learn by predicting accuracy?" Indeed, classification is one of the most popular tasks in machine learning and many loss functions have been developed to maximize this non-differentiable objective. Unlike past work on loss function design, which was mostly guided by intuition and theory before being validated by experimentation, here we propose to approach this problem in the opposite way: we seek to extract knowledge from experiments. This data-driven approach is similar to that used in physics to discover general laws from data. We used a symbolic regression method to automatically find a mathematical expression that is highly correlated with the accuracy of a linear classifier. The formula discovered on more than 260 datasets has a Pearson correlation of 0.96 and an r2 of 0.93. More interestingly, this formula is highly explainable and confirms insights from various previous papers on loss design. We hope this work will open new perspectives in the search for new heuristics leading to a deeper understanding of machine learning theory.
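The two reported statistics can be sketched as follows: a Pearson correlation between a formula's output and the measured accuracy, and a coefficient of determination that treats the formula's output as a direct prediction. How the paper computes its r2 is not stated in the abstract, so the second definition is an assumption, as are the helper names and toy numbers.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def r2_score(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical data: measured accuracy vs. the value of some formula.
measured = [0.55, 0.60, 0.75, 0.85]
formula = [0.50, 0.60, 0.70, 0.90]

print(pearson_r(formula, measured), r2_score(measured, formula))
```

Correlation is invariant to rescaling the formula's output, whereas this form of r2 penalizes any offset or scale mismatch, so the two numbers need not be related by simple squaring.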
    Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess. (arXiv:2208.01366v1 [cs.AI])
    The advent of machine learning models that surpass human decision-making ability in complex domains has initiated a movement towards building AI systems that interact with humans. Many building blocks are essential for this activity, with a central one being the algorithmic characterization of human behavior. While much of the existing work focuses on aggregate human behavior, an important long-range goal is to develop behavioral models that specialize to individual people and can differentiate among them. To formalize this process, we study the problem of behavioral stylometry, in which the task is to identify a decision-maker from their decisions alone. We present a transformer-based approach to behavioral stylometry in the context of chess, where one attempts to identify the player who played a set of games. Our method operates in a few-shot classification framework, and can correctly identify a player from among thousands of candidate players with 98% accuracy given only 100 labeled games. Even when trained on amateur play, our method generalises to out-of-distribution samples of Grandmaster players, despite the dramatic differences between amateur and world-class players. Finally, we consider more broadly what our resulting embeddings reveal about human style in chess, as well as the potential ethical implications of powerful methods for identifying individuals from behavioral data.
    Approximate Bayesian Neural Operators: Uncertainty Quantification for Parametric PDEs. (arXiv:2208.01565v1 [cs.LG])
    Neural operators are a type of deep architecture that learns to solve (i.e. learns the nonlinear solution operator of) partial differential equations (PDEs). The current state of the art for these models does not provide explicit uncertainty quantification. This is arguably even more of a problem for this kind of task than elsewhere in machine learning, because the dynamical systems typically described by PDEs often exhibit subtle, multiscale structure that makes errors hard for humans to spot. In this work, we first provide a mathematically detailed Bayesian formulation of the "shallow" (linear) version of neural operators in the formalism of Gaussian processes. We then extend this analytic treatment to general deep neural operators using approximate methods from Bayesian deep learning. We extend previous results on neural operators by providing them with uncertainty quantification. As a result, our approach is able to identify cases, and provide structured uncertainty estimates, where the neural operator fails to predict well.
    Physics-informed Deep Super-resolution for Spatiotemporal Data. (arXiv:2208.01462v1 [cs.LG])
    High-fidelity simulation of complex physical systems is exorbitantly expensive and inaccessible across spatiotemporal scales. Recently, there has been increasing interest in leveraging deep learning to augment scientific data based on coarse-grained simulations, which are computationally cheap and retain satisfactory solution accuracy. However, most existing work focuses on data-driven approaches that rely on rich training datasets and lack sufficient physical constraints. To this end, we propose a novel and efficient spatiotemporal super-resolution framework via physics-informed learning, inspired by the independence between temporal and spatial derivatives in partial differential equations (PDEs). The general principle is to leverage temporal interpolation for flow estimation, and then introduce convolutional-recurrent neural networks for learning temporal refinement. Furthermore, we employ stacked residual blocks with wide activation and sub-pixel layers with pixelshuffle for spatial reconstruction, where feature extraction is conducted in a low-resolution latent space. Moreover, we consider hard imposition of boundary conditions in the network to improve reconstruction accuracy. Results demonstrate the superior effectiveness and efficiency of the proposed method compared with baseline algorithms through extensive numerical experiments.
    Mobility-Aware Cooperative Caching in Vehicular Edge Computing Based on Asynchronous Federated and Deep Reinforcement Learning. (arXiv:2208.01219v1 [cs.DC])
    Vehicular edge computing (VEC) can cache contents in different RSUs at the network edge to support real-time vehicular applications. In VEC, owing to the high-mobility characteristics of vehicles, it is necessary to cache the user data in advance and learn the most popular and interesting contents for vehicular users. Since user data usually contains private information, users are reluctant to share their data with others. To solve this problem, traditional federated learning (FL) needs to update the global model synchronously through aggregating all users' local models to protect users' privacy. However, vehicles may frequently drive out of the coverage area of the VEC before they complete their local model training and thus the local models cannot be uploaded as expected, which would reduce the accuracy of the global model. In addition, the caching capacity of the local RSU is limited and the popular contents are diverse, thus the size of the predicted popular contents usually exceeds the cache capacity of the local RSU. Hence, the VEC should cache the predicted popular contents in different RSUs while considering the content transmission delay. In this paper, we consider the mobility of vehicles and propose a cooperative Caching scheme in the VEC based on Asynchronous Federated and deep Reinforcement learning (CAFR). We first consider the mobility of vehicles and propose an asynchronous FL algorithm to obtain an accurate global model, and then propose an algorithm to predict the popular contents based on the global model. In addition, we consider the mobility of vehicles and propose a deep reinforcement learning algorithm to obtain the optimal cooperative caching location for the predicted popular contents in order to optimize the content transmission delay. Extensive experimental results have demonstrated that the CAFR scheme outperforms other baseline caching schemes.
    Diffusion-Based Representation Learning. (arXiv:2105.14257v3 [cs.LG] UPDATED)
    Diffusion-based methods represented as stochastic differential equations on a continuous-time domain have recently proven successful as a non-adversarial generative model. Training such models relies on denoising score matching, which can be seen as multi-scale denoising autoencoders. Here, we augment the denoising score matching framework to enable representation learning without any supervised signal. GANs and VAEs learn representations by directly transforming latent codes to data samples. In contrast, the introduced diffusion-based representation learning relies on a new formulation of the denoising score matching objective and thus encodes the information needed for denoising. We illustrate how this difference allows for manual control of the level of details encoded in the representation. Using the same approach, we propose to learn an infinite-dimensional latent code that achieves improvements of state-of-the-art models on semi-supervised image classification. We also compare the quality of learned representations of diffusion score matching with other methods like autoencoder and contrastively trained systems through their performances on downstream tasks.
    Certified machine learning: A posteriori error estimation for physics-informed neural networks. (arXiv:2203.17055v3 [cs.LG] UPDATED)
    Physics-informed neural networks (PINNs) are one popular approach to incorporate a priori knowledge about physical systems into the learning framework. PINNs are known to be robust for smaller training sets, to generalize better, and to be faster to train. In this paper, we show that using PINNs in comparison with purely data-driven neural networks is not only favorable for training performance but allows us to extract significant information on the quality of the approximated solution. Assuming that the underlying differential equation for the PINN training is an ordinary differential equation, we derive a rigorous upper limit on the PINN prediction error. This bound is applicable even for input data not included in the training phase and without any prior knowledge about the true solution. Therefore, our a posteriori error estimation is an essential step to certify the PINN. We apply our error estimator to two academic toy problems, one of which falls in the category of model-predictive control and thereby shows the practical use of the derived results.
    AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model. (arXiv:2208.01448v1 [cs.CL])
    In this work, we demonstrate that multilingual large-scale sequence-to-sequence (seq2seq) models, pre-trained on a mixture of denoising and Causal Language Modeling (CLM) tasks, are more efficient few-shot learners than decoder-only models on various tasks. In particular, we train a 20 billion parameter multilingual seq2seq model called Alexa Teacher Model (AlexaTM 20B) and show that it achieves state-of-the-art (SOTA) performance on 1-shot summarization tasks, outperforming a much larger 540B PaLM decoder model. AlexaTM 20B also achieves SOTA in 1-shot machine translation, especially for low-resource languages, across almost all language pairs supported by the model (Arabic, English, French, German, Hindi, Italian, Japanese, Marathi, Portuguese, Spanish, Tamil, and Telugu) on the Flores-101 dataset. We also show that, in the zero-shot setting, AlexaTM 20B outperforms GPT3 (175B) on the SuperGLUE and SQuADv2 datasets and provides SOTA performance on multilingual tasks such as XNLI, XCOPA, Paws-X, and XWinograd. Overall, our results present a compelling case for seq2seq models as a powerful alternative to decoder-only models for Large-scale Language Model (LLM) training.
    Fisher and Kernel Fisher Discriminant Analysis: Tutorial. (arXiv:1906.09436v2 [stat.ML] UPDATED)
    This is a detailed tutorial paper which explains Fisher Discriminant Analysis (FDA) and kernel FDA. We start with projection and reconstruction. Then, one- and multi-dimensional FDA subspaces are covered. Scatters in two- and then multi-classes are explained in FDA. Then, we discuss the rank of the scatters and the dimensionality of the subspace. A real-life example is also provided for interpreting FDA. Then, possible singularity of the scatter is discussed to introduce robust FDA. PCA and FDA directions are also compared. We also prove that FDA and linear discriminant analysis are equivalent. Fisher forest is also introduced as an ensemble of Fisher subspaces useful for handling data with different features and dimensionality. Afterwards, kernel FDA is explained for both one- and multi-dimensional subspaces with both two- and multi-classes. Finally, some simulations are performed on the AT&T face dataset to illustrate FDA and compare it with PCA.
    Understanding the classes better with class-specific and rule-specific feature selection, and redundancy control in a fuzzy rule based framework. (arXiv:2208.01294v1 [cs.LG])
    Recently, several studies have claimed that using class-specific feature subsets provides certain advantages over using a single feature subset for representing the data for a classification problem. Unlike traditional feature selection methods, the class-specific feature selection methods select an optimal feature subset for each class. Typically class-specific feature selection (CSFS) methods use one-versus-all split of the data set that leads to issues such as class imbalance, decision aggregation, and high computational overhead. We propose a class-specific feature selection method embedded in a fuzzy rule-based classifier, which is free from the drawbacks associated with most existing class-specific methods. Additionally, our method can be adapted to control the level of redundancy in the class-specific feature subsets by adding a suitable regularizer to the learning objective. Our method results in class-specific rules involving class-specific subsets. We also propose an extension where different rules of a particular class are defined by different feature subsets to model different substructures within the class. The effectiveness of the proposed method has been validated through experiments on three synthetic data sets.
    A Deep Generative Model for Feasible and Diverse Population Synthesis. (arXiv:2208.01403v1 [stat.ML])
    An ideal synthetic population, a key input to activity-based models, mimics the distribution of the individual- and household-level attributes in the actual population. Since the entire population's attributes are generally unavailable, household travel survey (HTS) samples are used for population synthesis. Synthesizing a population by directly sampling from the HTS ignores the attribute combinations that are unobserved in the HTS samples but exist in the population, called 'sampling zeros'. A deep generative model (DGM) can potentially synthesize the sampling zeros but at the expense of generating 'structural zeros' (i.e., the infeasible attribute combinations that do not exist in the population). This study proposes a novel method to minimize structural zeros while preserving sampling zeros. Two regularizations are devised to customize the training of the DGM and applied to a generative adversarial network (GAN) and a variational autoencoder (VAE). The adopted metrics for feasibility and diversity of the synthetic population indicate the capability of generating sampling and structural zeros -- lower structural zeros and lower sampling zeros indicate the higher feasibility and the lower diversity, respectively. Results show that the proposed regularizations achieve considerable performance improvement in feasibility and diversity of the synthesized population over traditional models. The proposed VAE additionally generated 23.5% of the population ignored by the sample with 79.2% precision (i.e., a 20.8% structural zero rate), while the proposed GAN generated 18.3% of the ignored population with 89.0% precision. The proposed improvement in DGM generates a more feasible and diverse synthetic population, which is critical for the accuracy of an activity-based model.
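The distinction between sampling zeros and structural zeros can be illustrated with plain set arithmetic. The sketch below assumes that precision is the share of generated attribute combinations that exist in the population, and that the "ignored population" recovered is measured over the sampling zeros; the paper's exact metric definitions may differ, and all names and attribute values are hypothetical.

```python
def zero_metrics(population, sample, generated):
    """Summarize generated attribute combinations as sets of tuples."""
    sampling_zeros = population - sample      # feasible but unobserved
    structural_zeros = generated - population  # generated but infeasible
    recovered = generated & sampling_zeros
    precision = len(generated & population) / len(generated)
    return {
        "recovered_share": len(recovered) / len(sampling_zeros),
        "precision": precision,
        "structural_zero_rate": len(structural_zeros) / len(generated),
    }


# Hypothetical (age group, licence status) combinations.
population = {
    ("adult", "license"), ("adult", "no_license"),
    ("senior", "license"), ("senior", "no_license"),
    ("child", "no_license"),
}
sample = {("adult", "license"), ("senior", "no_license"), ("child", "no_license")}
generated = {
    ("adult", "license"), ("adult", "no_license"),
    ("senior", "no_license"), ("child", "license"),  # last one is infeasible
}

metrics = zero_metrics(population, sample, generated)
print(metrics)
```

Under these definitions precision and the structural zero rate sum to one, which matches the 79.2% and 20.8% pairing quoted in the abstract.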
    Effects of Graph Convolutions in Multi-layer Networks. (arXiv:2204.09297v2 [cs.LG] UPDATED)
    Graph Convolutional Networks (GCNs) are one of the most popular architectures that are used to solve classification problems accompanied by graphical information. We present a rigorous theoretical understanding of the effects of graph convolutions in multi-layer networks. We study these effects through the node classification problem of a non-linearly separable Gaussian mixture model coupled with a stochastic block model. First, we show that a single graph convolution expands the regime of the distance between the means where multi-layer networks can classify the data by a factor of at least $1/\sqrt[4]{\mathbb{E}{\rm deg}}$, where $\mathbb{E}{\rm deg}$ denotes the expected degree of a node. Second, we show that with a slightly stronger graph density, two graph convolutions improve this factor to at least $1/\sqrt[4]{n}$, where $n$ is the number of nodes in the graph. Finally, we provide both theoretical and empirical insights into the performance of graph convolutions placed in different combinations among the layers of a network, concluding that the performance is mutually similar for all combinations of the placement. We present extensive experiments on both synthetic and real-world data that illustrate our results.
    A Note on Zeroth-Order Optimization on the Simplex. (arXiv:2208.01185v1 [cs.LG])
    We construct a zeroth-order gradient estimator for a smooth function defined on the probability simplex. The proposed estimator queries the simplex only. We prove that projected gradient descent and the exponential weights algorithm, when run with this estimator instead of exact gradients, converge at an $\mathcal{O}(T^{-1/4})$ rate.
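As a rough illustration of the setting, the sketch below runs the exponential weights update using only function values queried on the simplex. The estimator here is a simple deterministic finite difference toward each vertex, a stand-in chosen for clarity and not the estimator constructed in the paper. The step size `eta`, the offset `delta`, and the quadratic test function are all assumptions.

```python
from math import exp


def directional_estimates(f, x, delta=1e-4):
    """Finite-difference slopes of f from x toward each vertex e_i.

    Every query (1 - delta) * x + delta * e_i is a convex combination of
    simplex points, so f is only ever evaluated on the simplex.
    """
    fx = f(x)
    slopes = []
    for i in range(len(x)):
        query = [(1 - delta) * xj for xj in x]
        query[i] += delta
        slopes.append((f(query) - fx) / delta)
    return slopes


def exponential_weights(f, x0, eta=0.5, steps=200):
    """Multiplicative update x_i <- x_i * exp(-eta * d_i), then renormalize."""
    x = list(x0)
    for _ in range(steps):
        d = directional_estimates(f, x)
        w = [xi * exp(-eta * di) for xi, di in zip(x, d)]
        total = sum(w)
        x = [wi / total for wi in w]
    return x


# Hypothetical smooth objective with its minimizer inside the simplex.
target = [0.2, 0.3, 0.5]


def f(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, target))


x_final = exponential_weights(f, [1 / 3, 1 / 3, 1 / 3])
print(x_final)
```

Each slope approximates the gradient component minus a term common to all coordinates. That common term cancels in the renormalization, so the update behaves like exponential weights with approximate gradients.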
    An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion. (arXiv:2208.01618v1 [cs.CV])
    Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes. In other words, we ask: how can we use language-guided models to turn our cat into a painting, or imagine a new product based on our favorite toy? Here we present a simple approach that allows such creative freedom. Using only 3-5 images of a user-provided concept, like an object or a style, we learn to represent it through new "words" in the embedding space of a frozen text-to-image model. These "words" can be composed into natural language sentences, guiding personalized creation in an intuitive way. Notably, we find evidence that a single word embedding is sufficient for capturing unique and varied concepts. We compare our approach to a wide range of baselines, and demonstrate that it can more faithfully portray the concepts across a range of applications and tasks. Our code, data and new words will be available at: https://textual-inversion.github.io
    Late Fusion Multi-view Clustering via Global and Local Alignment Maximization. (arXiv:2208.01198v1 [cs.LG])
    Multi-view clustering (MVC) optimally integrates complementary information from different views to improve clustering performance. Although demonstrating promising performance in various applications, most existing approaches directly fuse multiple pre-specified similarities to learn an optimal similarity matrix for clustering, which could cause over-complicated optimization and intensive computational cost. In this paper, we propose late fusion MVC via alignment maximization to address these issues. To do so, we first reveal the theoretical connection of existing k-means clustering and the alignment between base partitions and the consensus one. Based on this observation, we propose a simple but effective multi-view algorithm termed LF-MVC-GAM. It optimally fuses multiple source information at the partition level from each individual view, and maximally aligns the consensus partition with these weighted base ones. Such an alignment helps integrate partition-level information and significantly reduces the computational complexity by sufficiently simplifying the optimization procedure. We then design another variant, LF-MVC-LAM, to further improve the clustering performance by preserving the local intrinsic structure among multiple partition spaces. After that, we develop two three-step iterative algorithms to solve the resultant optimization problems with theoretically guaranteed convergence. Further, we provide the generalization error bound analysis of the proposed algorithms. Extensive experiments on eighteen multi-view benchmark datasets demonstrate the effectiveness and efficiency of the proposed LF-MVC-GAM and LF-MVC-LAM, ranging from small to large-scale data items. The codes of the proposed algorithms are publicly available at https://github.com/wangsiwei2010/latefusionalignment.
    SampleMatch: Drum Sample Retrieval by Musical Context. (arXiv:2208.01141v1 [cs.SD])
    Modern digital music production typically involves combining numerous acoustic elements to compile a piece of music. Important types of such elements are drum samples, which determine the characteristics of the percussive components of the piece. Artists must use their aesthetic judgement to assess whether a given drum sample fits the current musical context. However, selecting drum samples from a potentially large library is tedious and may interrupt the creative flow. In this work, we explore the automatic drum sample retrieval based on aesthetic principles learned from data. As a result, artists can rank the samples in their library by fit to some musical context at different stages of the production process (i.e., by fit to incomplete song mixtures). To this end, we use contrastive learning to maximize the score of drum samples originating from the same song as the mixture. We conduct a listening test to determine whether the human ratings match the automatic scoring function. We also perform objective quantitative analyses to evaluate the efficacy of our approach.
    VacciNet: Towards a Smart Framework for Learning the Distribution Chain Optimization of Vaccines for a Pandemic. (arXiv:2208.01112v1 [cs.LG])
    Vaccinations against viruses have long been an urgent need. However, it is hard to efficiently distribute the vaccines (on time) to all the corners of a country, especially during a pandemic. Considering the vastness of the population, diversified communities, and demands of a smart society, it is an important task to optimize the vaccine distribution strategy in any country/state effectively. Although there is a profusion of data (Big Data) from various vaccine administration sites that can be mined to gain valuable insights about mass vaccination drives, very few attempts have been made towards revolutionizing the traditional mass vaccination campaigns to mitigate the socio-economic crises of pandemic-afflicted countries. In this paper, we bridge this gap in studies and experimentation. We collect daily vaccination data which is publicly available and carefully analyze it to generate meaningful insights and predictions. We put forward a novel framework leveraging Supervised Learning and Reinforcement Learning (RL), which we call VacciNet, that is capable of learning to predict the demand for vaccination in a state of a country as well as suggesting optimal vaccine allocation in the state for minimum cost of procurement and supply. At present, our framework is trained and tested with vaccination data of the USA.
    Disparate Censorship & Undertesting: A Source of Label Bias in Clinical Machine Learning. (arXiv:2208.01127v1 [cs.LG])
    As machine learning (ML) models gain traction in clinical applications, understanding the impact of clinician and societal biases on ML models is increasingly important. While biases can arise in the labels used for model training, the many sources from which these biases arise are not yet well-studied. In this paper, we highlight disparate censorship (i.e., differences in testing rates across patient groups) as a source of label bias that clinical ML models may amplify, potentially causing harm. Many patient risk-stratification models are trained using the results of clinician-ordered diagnostic and laboratory tests as labels. Patients without test results are often assigned a negative label, which assumes that untested patients do not experience the outcome. Since orders are affected by clinical and resource considerations, testing may not be uniform in patient populations, giving rise to disparate censorship. Disparate censorship in patients of equivalent risk leads to undertesting in certain groups, and in turn, more biased labels for such groups. Using such biased labels in standard ML pipelines could contribute to gaps in model performance across patient groups. Here, we theoretically and empirically characterize conditions in which disparate censorship or undertesting affect model performance across subgroups. Our findings call attention to disparate censorship as a source of label bias in clinical ML models.
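The labeling rule described above (untested patients receive a negative label) can be sketched with two hypothetical groups that share the same true outcome rate but are tested at different rates. The group sizes, rates, and helper names are illustrative assumptions, not data from the paper.

```python
def censored_labels(outcomes, tested):
    """Training label is the true outcome if tested, otherwise 0."""
    return [y if t else 0 for y, t in zip(outcomes, tested)]


def make_group(n_patients, n_positive, n_positive_tested):
    """Build a toy group: true positives first, then true negatives.

    Testing a true negative never changes its label, so negatives are
    simply marked as tested.
    """
    outcomes = [1] * n_positive + [0] * (n_patients - n_positive)
    tested = [i < n_positive_tested for i in range(n_positive)]
    tested += [True] * (n_patients - n_positive)
    return outcomes, tested


def label_summary(outcomes, tested):
    labels = censored_labels(outcomes, tested)
    return {
        "true_rate": sum(outcomes) / len(outcomes),
        "label_rate": sum(labels) / len(labels),
        "missed_positives": 1 - sum(labels) / sum(outcomes),
    }


# Same true risk (20%), but 90% vs. 50% of true positives get tested.
group_a = label_summary(*make_group(100, 20, 18))
group_b = label_summary(*make_group(100, 20, 10))
print(group_a, group_b)
```

Both groups have identical true risk, yet the less-tested group appears lower-risk in the labels. A model trained on these labels would inherit that gap.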
    Optimizing Mixture of Experts using Dynamic Recompilations. (arXiv:2205.01848v2 [cs.LG] UPDATED)
    The Mixture of Experts architecture allows for outrageously large neural networks by scaling model parameter size independently from computational demand (FLOPs). However, current DNN frameworks cannot effectively support the dynamic data flow in Mixture of Experts, and implementations on top of these frameworks need to use workarounds that introduce significant overheads. To address the limitation of these frameworks, we present DynaMoE, a DNN library that uses dynamic recompilations to optimize and adapt the use of computational resources to the dynamic needs of Mixture of Experts models. Our evaluation shows that DynaMoE achieves a 1.8x speedup and supports 2.3x larger model sizes when compared to existing MoE systems, even when not using recompilations. We then present further optimizations enabled by dynamic recompilations that yield an additional 1.7x speedup while simultaneously reducing memory pressure and improving model quality.
    On the Evaluation of User Privacy in Deep Neural Networks using Timing Side Channel. (arXiv:2208.01113v1 [cs.CR])
    Recent Deep Learning (DL) advancements in solving complex real-world tasks have led to its widespread adoption in practical applications. However, this opportunity comes with significant underlying risks, as many of these models rely on privacy-sensitive data for training in a variety of applications, making them an overly-exposed threat surface for privacy violations. Furthermore, the widespread use of cloud-based Machine-Learning-as-a-Service (MLaaS) for its robust infrastructure support has broadened the threat surface to include a variety of remote side-channel attacks. In this paper, we first identify and report a novel data-dependent timing side-channel leakage (termed Class Leakage) in DL implementations originating from non-constant time branching operation in a widely used DL framework PyTorch. We further demonstrate a practical inference-time attack where an adversary with user privilege and hard-label black-box access to an MLaaS can exploit Class Leakage to compromise the privacy of MLaaS users. DL models are vulnerable to Membership Inference Attack (MIA), where an adversary's objective is to deduce whether any particular data has been used while training the model. In this paper, as a separate case study, we demonstrate that a DL model secured with differential privacy (a popular countermeasure against MIA) is still vulnerable to MIA against an adversary exploiting Class Leakage. We develop an easy-to-implement countermeasure by making a constant-time branching operation that alleviates the Class Leakage and also aids in mitigating MIA. We have chosen two standard benchmarking image classification datasets, CIFAR-10 and CIFAR-100 to train five state-of-the-art pre-trained DL models, over two different computing environments having Intel Xeon and Intel i7 processors to validate our approach.
    Learning to estimate a surrogate respiratory signal from cardiac motion by signal-to-signal translation. (arXiv:2208.01034v1 [eess.IV])
    In this work, we develop a neural network-based method to convert a noisy motion signal generated from segmenting rebinned list-mode cardiac SPECT images, to that of a high-quality surrogate signal, such as those seen from external motion tracking systems (EMTs). This synthetic surrogate will be used as input to our pre-existing motion correction technique developed for EMT surrogate signals. In our method, we test two families of neural networks to translate noisy internal motion to external surrogate: 1) fully connected networks and 2) convolutional neural networks. Our dataset consists of cardiac perfusion SPECT acquisitions for which cardiac motion was estimated (input: center-of-count-mass - COM signals) in conjunction with a respiratory surrogate motion signal acquired using a commercial Vicon Motion Tracking System (GT: EMT signals). We obtained an average R-score of 0.76 between the predicted surrogate and the EMT signal. Our goal is to lay a foundation to guide the optimization of neural networks for respiratory motion correction from SPECT without the need for an EMT.
    ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization. (arXiv:2208.01043v1 [cs.DB])
    Intelligent analysis and visualization of tables use techniques to automatically recommend useful knowledge from data, thus freeing users from tedious multi-dimension data mining. While many studies have succeeded in automating recommendations through rules or machine learning, it is difficult to generalize expert knowledge and provide explainable recommendations. In this paper, we present the recommendation of conditional formatting for the first time, together with chart recommendation, to exemplify intelligent table analysis. We propose analytical semantics over tables to uncover common analysis patterns behind user-created analyses. Here, we design analytical semantics by separating data focus from user intent, which capture the user's motivation from the data perspective and the human perspective, respectively. Furthermore, we design the ASTA framework to apply analytical semantics to multiple automated recommendations. The ASTA framework extracts data features by designing signatures based on expert knowledge, and enables data referencing at the field level (chart) or the cell level (conditional formatting) with pre-trained models. Experiments show that our framework achieves a top-1 recall of 62.86% on public chart corpora, outperforming the best baseline by about 14%, and achieves 72.31% on the collected corpus ConFormT, validating that the ASTA framework is effective in providing accurate and explainable recommendations.
    Correlated-informed neural networks: a new machine learning framework to predict pressure drop in micro-channels. (arXiv:2201.07835v2 [cs.LG] UPDATED)
    Accurate pressure drop estimation in forced boiling phenomena is important during the thermal analysis and the geometric design of cryogenic heat exchangers. However, current methods to predict the pressure drop have one of two problems: lack of accuracy or generalization to different situations. In this work, we present the correlated-informed neural networks (CoINN), a new paradigm in applying the artificial neural network (ANN) technique combined with a successful pressure drop correlation as a mapping tool to predict the pressure drop of zeotropic mixtures in micro-channels. The proposed approach is inspired by Transfer Learning, highly used in deep learning problems with reduced datasets. Our method improves the ANN performance by transferring the knowledge of the Sun & Mishima correlation for the pressure drop to the ANN. The correlation having physical and phenomenological implications for the pressure drop in micro-channels considerably improves the performance and generalization capabilities of the ANN. The final architecture consists of three inputs: the mixture vapor quality, the micro-channel inner diameter, and the available pressure drop correlation. The results show the benefits gained using the correlated-informed approach predicting experimental data used for training and a posterior test with a mean relative error (mre) of 6%, lower than the Sun & Mishima correlation of 13%. Additionally, this approach can be extended to other mixtures and experimental settings, a missing feature in other approaches for mapping correlations using ANNs for heat transfer applications.
    VI-IKD: High-Speed Accurate Off-Road Navigation using Learned Visual-Inertial Inverse Kinodynamics. (arXiv:2203.15983v2 [cs.RO] UPDATED)
    One of the key challenges in high speed off road navigation on ground vehicles is that the kinodynamics of the vehicle terrain interaction can differ dramatically depending on the terrain. Previous approaches to addressing this challenge have considered learning an inverse kinodynamics (IKD) model, conditioned on inertial information of the vehicle to sense the kinodynamic interactions. In this paper, we hypothesize that to enable accurate high-speed off-road navigation using a learned IKD model, in addition to inertial information from the past, one must also anticipate the kinodynamic interactions of the vehicle with the terrain in the future. To this end, we introduce Visual-Inertial Inverse Kinodynamics (VI-IKD), a novel learning based IKD model that is conditioned on visual information from a terrain patch ahead of the robot in addition to past inertial information, enabling it to anticipate kinodynamic interactions in the future. We validate the effectiveness of VI-IKD in accurate high-speed off-road navigation experimentally on a scale 1/5 UT-AlphaTruck off-road autonomous vehicle in both indoor and outdoor environments and show that compared to other state-of-the-art approaches, VI-IKD enables more accurate and robust off-road navigation on a variety of different terrains at speeds of up to 3.5 m/s.
    ENERO: Efficient Real-Time WAN Routing Optimization with Deep Reinforcement Learning. (arXiv:2109.10883v3 [cs.NI] UPDATED)
    Wide Area Networks (WANs) are a key infrastructure in today's society. In recent years, WANs have seen a considerable increase in network traffic and network applications, imposing new requirements on existing network technologies (e.g., low latency and high throughput). Consequently, Internet Service Providers (ISPs) are under pressure to ensure their customers' Quality of Service and fulfill Service Level Agreements. Network operators leverage Traffic Engineering (TE) techniques to efficiently manage network resources. However, WAN traffic can change drastically over time, and connectivity can be affected by external factors (e.g., link failures). Therefore, TE solutions must be able to adapt to dynamic scenarios in real time. In this paper we propose Enero, an efficient real-time TE solution based on a two-stage optimization process. In the first stage, Enero leverages Deep Reinforcement Learning (DRL) to optimize the routing configuration by generating a long-term TE strategy. To enable efficient operation over dynamic network scenarios (e.g., when link failures occur), we integrated a Graph Neural Network into the DRL agent. In the second stage, Enero uses a Local Search algorithm to improve the DRL solution without adding computational overhead to the optimization process. The experimental results indicate that Enero is able to operate in real-world dynamic network topologies in 4.5 seconds on average for topologies up to 100 edges.
    Face-to-Face Contrastive Learning for Social Intelligence Question-Answering. (arXiv:2208.01036v1 [cs.LG])
    Creating artificial social intelligence - algorithms that can understand the nuances of multi-person interactions - is an exciting and emerging challenge in processing facial expressions and gestures from multimodal videos. Recent multimodal methods have set the state of the art on many tasks, but have difficulty modeling the complex face-to-face conversational dynamics across speaking turns in social interaction, particularly in a self-supervised setup. In this paper, we propose Face-to-Face Contrastive Learning (F2F-CL), a graph neural network designed to model social interactions using factorization nodes to contextualize the multimodal face-to-face interaction along the boundaries of the speaking turn. With the F2F-CL model, we propose to perform contrastive learning between the factorization nodes of different speaking turns within the same video. We evaluate our approach on the challenging Social-IQ dataset and show state-of-the-art results.
    Binary Independent Component Analysis: A Non-stationarity-based Approach. (arXiv:2111.15431v2 [cs.LG] UPDATED)
    We consider independent component analysis of binary data. While fundamental in practice, this case has been much less developed than ICA for continuous data. We start by assuming a linear mixing model in a continuous-valued latent space, followed by a binary observation model. Importantly, we assume that the sources are non-stationary; this is necessary since any non-Gaussianity would essentially be destroyed by the binarization. Interestingly, the model allows for closed-form likelihood by employing the cumulative distribution function of the multivariate Gaussian distribution. In stark contrast to the continuous-valued case, we prove non-identifiability of the model with few observed variables; our empirical results imply identifiability when the number of observed variables is higher. We present a practical method for binary ICA that uses only pairwise marginals, which are faster to compute than the full multivariate likelihood. Experiments give insight into the requirements for the number of observed variables, segments, and latent sources that allow the model to be estimated.
    Nonnegative Tucker Decomposition with Beta-divergence for Music Structure Analysis of Audio Signals. (arXiv:2110.14434v4 [cs.SD] UPDATED)
    Nonnegative Tucker decomposition (NTD), a tensor decomposition model, has received increased interest in recent years because of its ability to blindly extract meaningful patterns, in particular in Music Information Retrieval. Nevertheless, existing algorithms to compute NTD are mostly designed for the Euclidean loss. This work proposes a multiplicative updates algorithm to compute NTD with the beta-divergence loss, often considered a better loss for audio processing. We notably show how to efficiently implement the multiplicative rules using tensor algebra. Finally, we show on a music structure analysis task that unsupervised NTD fitted with the beta-divergence loss outperforms earlier results obtained with the Euclidean loss.
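    As a point of reference for the loss named above, the following is a minimal sketch of the element-wise beta-divergence in its commonly used form. It is not taken from the paper: the function name `beta_divergence` is hypothetical, and the paper's tensor-level multiplicative updates are not reproduced here. The special cases beta = 2, 1 and 0 correspond to half the squared Euclidean distance, the Kullback-Leibler divergence and the Itakura-Saito divergence.

```python
import math


def beta_divergence(x: float, y: float, beta: float) -> float:
    """Element-wise beta-divergence d_beta(x | y) for strictly positive scalars.

    Hypothetical helper for illustration; uses the standard definition with
    the beta -> 1 (Kullback-Leibler) and beta -> 0 (Itakura-Saito) limits.
    """
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be strictly positive")
    if beta == 1:  # Kullback-Leibler limit
        return x * math.log(x / y) - x + y
    if beta == 0:  # Itakura-Saito limit
        return x / y - math.log(x / y) - 1
    numerator = x**beta + (beta - 1) * y**beta - beta * x * y ** (beta - 1)
    return numerator / (beta * (beta - 1))


def total_beta_divergence(xs, ys, beta):
    """Sum of element-wise divergences over two equally long sequences."""
    return sum(beta_divergence(x, y, beta) for x, y in zip(xs, ys))
```

    With beta = 2 the value for x = 3, y = 1 is (3 - 1)^2 / 2, so the Euclidean loss appears as one member of the same family.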
    What Can Transformers Learn In-Context? A Case Study of Simple Function Classes. (arXiv:2208.01066v1 [cs.CL])
    In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples (input-output pairs corresponding to some task) along with a new query input, and generate the corresponding output. Crucially, in-context learning happens only at inference time without any parameter updates to the model. While large language models such as GPT-3 exhibit some ability to perform in-context learning, it is unclear what the relationship is between tasks on which this succeeds and what is present in the training data. To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class? We show empirically that standard Transformers can be trained from scratch to perform in-context learning of linear functions -- that is, the trained model is able to learn unseen linear functions from in-context examples with performance comparable to the optimal least squares estimator. In fact, in-context learning is possible even under two forms of distribution shift: (i) between the training data of the model and inference-time prompts, and (ii) between the in-context examples and the query input during inference. We also show that we can train Transformers to in-context learn more complex function classes -- namely sparse linear functions, two-layer neural networks, and decision trees -- with performance that matches or exceeds task-specific learning algorithms. Our code and models are available at https://github.com/dtsip/in-context-learning .
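    To make the least squares baseline concrete, here is a minimal sketch of the estimator applied to a prompt of in-context examples. It is an illustration under assumptions, not the authors' code (which is at the repository linked above): the names `solve_linear_system` and `least_squares_predict` are hypothetical, and the example assumes noiseless data and at least as many independent examples as input dimensions.

```python
def solve_linear_system(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: need more independent examples")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (aug[i][n] - tail) / aug[i][i]
    return w


def least_squares_predict(xs, ys, x_query):
    """Fit w on the in-context pairs (xs, ys), then predict w . x_query."""
    d = len(x_query)
    # Normal equations: (X^T X) w = X^T y
    gram = [[sum(x[i] * x[j] for x in xs) for j in range(d)] for i in range(d)]
    moment = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(d)]
    w = solve_linear_system(gram, moment)
    return sum(w_i * q_i for w_i, q_i in zip(w, x_query))


# In-context examples generated by the hidden function f(x) = 2*x0 - x1.
prompt_xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
prompt_ys = [2.0, -1.0, 1.0]
prediction = least_squares_predict(prompt_xs, prompt_ys, [3.0, 4.0])
```

    The Transformer in the abstract is asked to produce the same kind of prediction from the prompt alone, without an explicit solve step.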
    Short-term Load Forecasting with Distributed Long Short-Term Memory. (arXiv:2208.01147v1 [cs.LG])
    With the deployment of smart meters, massive data on consumer behaviour can be collected by retailers. From the collected data, the retailers may obtain household profile information and implement demand response. While retailers prefer to acquire a model that is as accurate as possible across different customers, there are two major challenges. First, different retailers in the retail market do not share their consumers' electricity consumption data, as these data are regarded as their assets, which has led to the problem of data islands. Second, the electricity load data are highly heterogeneous since different retailers may serve various consumers. To this end, a fully distributed short-term load forecasting framework based on a consensus algorithm and Long Short-Term Memory (LSTM) is proposed, which may protect the customers' privacy and satisfy the requirement for accurate load forecasting. Specifically, a fully distributed learning framework is exploited for distributed training, and a consensus technique is applied to preserve data confidentiality. Case studies show that the proposed method has comparable accuracy to centralised methods, while showing advantages in training speed and data privacy.
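    The abstract does not specify which consensus algorithm is used, so the following is only a sketch of the general idea, assuming the textbook average-consensus update on a fixed, undirected communication graph. The names `consensus_step` and `run_consensus`, the ring topology and the step size are all hypothetical. Each node repeatedly nudges its value towards its neighbours' values, so that all nodes approach the global average without any node seeing another node's raw data; in a learning setting the scalar would be replaced by model parameters.

```python
def consensus_step(values, neighbors, step_size):
    """One synchronous round: each node moves towards its neighbours' values."""
    return [
        v + step_size * sum(values[j] - v for j in neighbors[i])
        for i, v in enumerate(values)
    ]


def run_consensus(values, neighbors, step_size=0.3, num_rounds=100):
    """Iterate the average-consensus update for a fixed number of rounds."""
    for _ in range(num_rounds):
        values = consensus_step(values, neighbors, step_size)
    return values


# Four retailers on a ring; each only talks to its two neighbours.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
local_values = [1.0, 2.0, 3.0, 6.0]
agreed = run_consensus(local_values, ring)
```

    Because the graph is undirected, the sum of the values is preserved in every round, which is why the nodes agree on the average (3.0 here) rather than on an arbitrary value. The step size must be small relative to the node degrees for the iteration to converge.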
    Learning of Parameters in Behavior Trees for Movement Skills. (arXiv:2109.13050v2 [cs.RO] UPDATED)
    Reinforcement Learning (RL) is a powerful mathematical framework that allows robots to learn complex skills by trial-and-error. Despite numerous successes in many applications, RL algorithms still require thousands of trials to converge to high-performing policies, can produce dangerous behaviors while learning, and the optimized policies (usually modeled as neural networks) give almost zero explanation when they fail to perform the task. For these reasons, the adoption of RL in industrial settings is not common. Behavior Trees (BTs), on the other hand, can provide a policy representation that a) supports modular and composable skills, b) allows for easy interpretation of the robot actions, and c) provides an advantageous low-dimensional parameter space. In this paper, we present a novel algorithm that can learn the parameters of a BT policy in simulation and then generalize to the physical robot without any additional training. We leverage a physical simulator with a digital twin of our workstation, and optimize the relevant parameters with a black-box optimizer. We showcase the efficacy of our method with a 7-DOF KUKA-iiwa manipulator in a task that includes obstacle avoidance and a contact-rich insertion (peg-in-hole), in which our method outperforms the baselines.
    An Online Sparse Streaming Feature Selection Algorithm. (arXiv:2208.01562v1 [cs.LG])
    Online streaming feature selection (OSFS), which conducts feature selection in an online manner, plays an important role in dealing with high-dimensional data. In many real applications, such as intelligent healthcare platforms, streaming features often contain missing data, which raises a crucial challenge in conducting OSFS, i.e., how to establish the uncertain relationship between sparse streaming features and labels. Unfortunately, existing OSFS algorithms do not consider this uncertain relationship. To fill this gap, in this paper we propose an online sparse streaming feature selection with uncertainty (OS2FSU) algorithm. OS2FSU consists of two main parts: 1) latent factor analysis is utilized to pre-estimate the missing data in sparse streaming features before conducting feature selection, and 2) fuzzy logic and neighborhood rough sets are employed to alleviate the uncertainty between estimated streaming features and labels while conducting feature selection. In the experiments, OS2FSU is compared with five state-of-the-art OSFS algorithms on six real datasets. The results demonstrate that OS2FSU outperforms its competitors when missing data are encountered in OSFS.
    Implicit Two-Tower Policies. (arXiv:2208.01191v1 [cs.LG])
    We present a new class of structured reinforcement learning policy-architectures, Implicit Two-Tower (ITT) policies, where the actions are chosen based on the attention scores of their learnable latent representations with those of the input states. By explicitly disentangling action from state processing in the policy stack, we achieve two main goals: substantial computational gains and better performance. Our architectures are compatible with both discrete and continuous action spaces. By conducting tests on 15 environments from OpenAI Gym and DeepMind Control Suite, we show that ITT-architectures are particularly suited for blackbox/evolutionary optimization and that the corresponding policy training algorithms outperform their vanilla unstructured implicit counterparts as well as commonly used explicit policies. We complement our analysis by showing how techniques such as hashing and lazy tower updates, critically relying on the two-tower structure of ITTs, can be applied to obtain additional computational improvements.
    Analog Gated Recurrent Neural Network for Detecting Chewing Events. (arXiv:2208.01201v1 [cs.LG])
    We present a novel gated recurrent neural network to detect when a person is chewing on food. We implemented the neural network as a custom analog integrated circuit in a 0.18 um CMOS technology. The neural network was trained on 6.4 hours of data collected from a contact microphone that was mounted on volunteers' mastoid bones. When tested on 1.6 hours of previously-unseen data, the neural network identified chewing events at a 24-second time resolution. It achieved a recall of 91% and an F1-score of 94% while consuming 1.1 uW of power. A system for detecting whole eating episodes -- like meals and snacks -- that is based on the novel analog neural network consumes an estimated 18.8uW of power.
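    The abstract reports recall and F1-score but not precision. Assuming the F1-score is the usual harmonic mean of precision and recall, and that both reported figures refer to the same test set, the implied precision can be recovered as a rough check. The result is approximate because the reported values are rounded.

```latex
F_1 = \frac{2PR}{P + R}
\quad\Longrightarrow\quad
P = \frac{F_1 R}{2R - F_1}
  = \frac{0.94 \times 0.91}{2 \times 0.91 - 0.94}
  = \frac{0.8554}{0.88}
  \approx 0.97
```

    Under these assumptions the detector's precision would be roughly 97%, i.e. higher than its recall.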
    Automatic Classification of Bug Reports Based on Multiple Text Information and Reports' Intention. (arXiv:2208.01274v1 [cs.SE])
    With the rapid growth of software scale and complexity, a large number of bug reports are submitted to bug tracking systems. In order to speed up defect repair, these reports need to be accurately classified so that they can be sent to the appropriate developers. However, existing classification methods use only the text information of the bug report, which leads to low performance. To solve this problem, this paper proposes a new automatic classification method for bug reports. The innovation is that when categorizing bug reports, in addition to using the text information of the report, the intention of the report (i.e. suggestion or explanation) is also considered, thereby improving the performance of the classification. First, we collect bug reports from four ecosystems (Apache, Eclipse, Gentoo, Mozilla) and manually annotate them to construct an experimental data set. Then, we use Natural Language Processing technology to preprocess the data. On this basis, BERT and TF-IDF are used to extract the features of the intention and the multiple text information. Finally, the features are used to train the classifiers. The experimental results on five classifiers (K-Nearest Neighbor, Naive Bayes, Logistic Regression, Support Vector Machine, and Random Forest) show that our proposed method achieves better performance, with an F-Measure ranging from 87.3% to 95.5%.
    Making a Spiking Net Work: Robust brain-like unsupervised machine learning. (arXiv:2208.01204v1 [cs.NE])
    The surge in interest in Artificial Intelligence (AI) over the past decade has been driven almost exclusively by advances in Artificial Neural Networks (ANNs). While ANNs set state-of-the-art performance for many previously intractable problems, they require large amounts of data and computational resources for training, and since they employ supervised learning they typically need to know the correctly labelled response for every training example, limiting their scalability for real-world domains. Spiking Neural Networks (SNNs) are an alternative to ANNs that use more brain-like artificial neurons and can use unsupervised learning to discover recognizable features in the input data without knowing correct responses. SNNs, however, struggle with dynamical stability and cannot match the accuracy of ANNs. Here we show how an SNN can overcome many of the shortcomings that have been identified in the literature, including offering a principled solution to the vanishing spike problem, to outperform all existing shallow SNNs and equal the performance of an ANN. It accomplishes this while using unsupervised learning with unlabeled data and only 1/50th of the training epochs (labelled data is used only for a final simple linear readout layer). This result makes SNNs a viable new method for fast, accurate, efficient, explainable, and re-deployable machine learning with unlabeled datasets.
    Dyadic Movement Synchrony Estimation Under Privacy-preserving Conditions. (arXiv:2208.01100v1 [cs.CV])
    Movement synchrony refers to the dynamic temporal connection between the motions of interacting people. Its applications are wide-ranging. For example, as a measure of coordination between teammates, synchrony scores are often reported in sports. The autism community also identifies movement synchrony as a key indicator of children's social and developmental achievements. In general, raw video recordings are often used for movement synchrony estimation, with the drawback that they may reveal people's identities. Furthermore, such privacy concerns also hinder data sharing, a major roadblock to a fair comparison between different approaches in autism research. To address the issue, this paper proposes an ensemble method for movement synchrony estimation, one of the first deep-learning-based methods for automatic movement synchrony assessment under privacy-preserving conditions. Our method relies entirely on publicly shareable, identity-agnostic secondary data, such as skeleton data and optical flow. We validate our method on two datasets: (1) the PT13 dataset collected from autism therapy interventions and (2) the TASD-2 dataset collected from synchronized diving competitions. In this context, our method outperforms its counterpart approaches, both deep neural networks and alternatives.
    Patents Phrase to Phrase Semantic Matching Dataset. (arXiv:2208.01171v1 [cs.CL])
    There are many general purpose benchmark datasets for Semantic Textual Similarity but none of them are focused on technical concepts found in patents and scientific publications. This work aims to fill this gap by presenting a new human rated contextual phrase to phrase matching dataset. The entire dataset contains close to 50,000 rated phrase pairs, each with a CPC (Cooperative Patent Classification) class as a context. This paper describes the dataset and some baseline models.
    Audio Deepfake Detection Based on a Combination of F0 Information and Real Plus Imaginary Spectrogram Features. (arXiv:2208.01214v1 [cs.SD])
    Recently, pioneering research has proposed a large number of acoustic features (log power spectrogram, linear frequency cepstral coefficients, constant Q cepstral coefficients, etc.) for audio deepfake detection, obtaining good performance and showing that different subbands contribute differently to audio deepfake detection. However, this work lacks an explanation of the specific information in each subband, and these features also lose information such as phase. In speech synthesis, fundamental frequency (F0) information is used to improve the quality of synthetic speech, yet the F0 of synthetic speech is still too averaged and differs significantly from that of real speech. F0 is therefore expected to be important information for discriminating between bonafide and fake speech, although it cannot be used directly due to its irregular distribution. Instead, the frequency band containing most of the F0 is selected as the input feature. Meanwhile, to make full use of the phase and full-band information, we also propose to use real and imaginary spectrogram features as complementary input features and model the disjoint subbands separately. Finally, the results of the F0, real and imaginary spectrogram features are fused. Experimental results on the ASVspoof 2019 LA dataset show that our proposed system is very effective for the audio deepfake detection task, achieving an equal error rate (EER) of 0.43%, which surpasses almost all systems.  ( 3 min )
    Interpretable Time Series Clustering Using Local Explanations. (arXiv:2208.01152v1 [cs.LG])
    This study focuses on exploring the use of local interpretability methods for explaining time series clustering models. Many of the state-of-the-art clustering models are not directly explainable. To provide explanations for these clustering algorithms, we train classification models to estimate the cluster labels. Then, we use interpretability methods to explain the decisions of the classification models. The explanations are used to obtain insights into the clustering models. We perform a detailed numerical study to test the proposed approach on multiple datasets, clustering models, and classification models. The analysis of the results shows that the proposed approach can be used to explain time series clustering models, specifically when the underlying classification model is accurate. Lastly, we provide a detailed analysis of the results, discussing how our approach can be used in a real-life scenario.  ( 2 min )
    Generative Adversarial Learning for Intelligent Trust Management in 6G Wireless Networks. (arXiv:2208.01221v1 [cs.NI])
    The emerging sixth generation (6G) is the integration of heterogeneous wireless networks, which can seamlessly support anywhere and anytime networking. However, 6G must offer high Quality-of-Trust to meet mobile user expectations. Artificial intelligence (AI) is considered one of the most important components of 6G, so AI-based trust management is a promising paradigm to provide trusted and reliable services. In this article, a generative adversarial learning-enabled trust management method is presented for 6G wireless networks. Some typical AI-based trust management schemes are first reviewed, and then a potential heterogeneous and intelligent 6G architecture is introduced. Next, the integration of AI and trust management is developed to optimize intelligence and security. Finally, the presented AI-based trust management method is applied to secure clustering to achieve reliable and real-time communications. Simulation results have demonstrated its excellent performance in guaranteeing network security and service quality.  ( 2 min )
    DAPDAG: Domain Adaptation via Perturbed DAG Reconstruction. (arXiv:2208.01373v1 [cs.LG])
    Leveraging labelled data from multiple domains to enable prediction in another domain without labels is a significant, yet challenging problem. To address this problem, we introduce the framework DAPDAG (Domain Adaptation via Perturbed DAG Reconstruction) and propose to learn an auto-encoder that undertakes inference on population statistics given features and reconstructing a directed acyclic graph (DAG) as an auxiliary task. The underlying DAG structure is assumed invariant among observed variables whose conditional distributions are allowed to vary across domains led by a latent environmental variable $E$. The encoder is designed to serve as an inference device on $E$ while the decoder reconstructs each observed variable conditioned on its graphical parents in the DAG and the inferred $E$. We train the encoder and decoder jointly in an end-to-end manner and conduct experiments on synthetic and real datasets with mixed variables. Empirical results demonstrate that reconstructing the DAG benefits the approximate inference. Furthermore, our approach can achieve competitive performance against other benchmarks in prediction tasks, with better adaptation ability, especially in the target domain significantly different from the source domains.
    MV6D: Multi-View 6D Pose Estimation on RGB-D Frames Using a Deep Point-wise Voting Network. (arXiv:2208.01172v1 [cs.CV])
    Estimating 6D poses of objects is an essential computer vision task. However, most conventional approaches rely on camera data from a single perspective and therefore suffer from occlusions. We overcome this issue with our novel multi-view 6D pose estimation method called MV6D which accurately predicts the 6D poses of all objects in a cluttered scene based on RGB-D images from multiple perspectives. We base our approach on the PVN3D network that uses a single RGB-D image to predict keypoints of the target objects. We extend this approach by using a combined point cloud from multiple views and fusing the images from each view with a DenseFusion layer. In contrast to current multi-view pose detection networks such as CosyPose, our MV6D can learn the fusion of multiple perspectives in an end-to-end manner and does not require multiple prediction stages or subsequent fine tuning of the prediction. Furthermore, we present three novel photorealistic datasets of cluttered scenes with heavy occlusions. All of them contain RGB-D images from multiple perspectives and the ground truth for instance semantic segmentation and 6D pose estimation. MV6D significantly outperforms the state-of-the-art in multi-view 6D pose estimation even in cases where the camera poses are known inaccurately. Furthermore, we show that our approach is robust towards dynamic camera setups and that its accuracy increases incrementally with an increasing number of perspectives.  ( 3 min )
    Fast Kernel Density Estimation with Density Matrices and Random Fourier Features. (arXiv:2208.01206v1 [cs.LG])
    Kernel density estimation (KDE) is one of the most widely used nonparametric density estimation methods. The fact that it is a memory-based method, i.e., it uses the entire training data set for prediction, makes it unsuitable for most current big data applications. Several strategies, such as tree-based or hashing-based estimators, have been proposed to improve the efficiency of the kernel density estimation method. The novel density matrix kernel density estimation method (DMKDE) uses density matrices, a quantum mechanical formalism, and random Fourier features, an explicit kernel approximation, to produce density estimates. This method has its roots in KDE and can be considered an approximation method without KDE's memory-based restriction. In this paper, we systematically evaluate the novel DMKDE algorithm and compare it with other state-of-the-art fast procedures for approximating the kernel density estimation method on different synthetic data sets. Our experimental results show that DMKDE is on par with its competitors for computing density estimates and shows advantages on high-dimensional data. We have made all the code available as an open source software repository.  ( 2 min )
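    To illustrate why an explicit kernel approximation removes the memory-based restriction, the sketch below approximates a Gaussian KDE with random Fourier features. It is not DMKDE itself: the density-matrix step is omitted, and the names `make_rff`, `fit_mean_embedding` and `kde_estimate` are hypothetical. The training data are summarized once as the mean of their feature vectors, and each query then costs one dot product regardless of the training set size.

```python
import math
import random


def make_rff(dim, num_features, sigma, seed=0):
    """Return a feature map z with z(x) . z(y) ~ exp(-|x - y|^2 / (2 sigma^2))."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return features


def fit_mean_embedding(features, data):
    """Summarize the training set as the average of its feature vectors."""
    zs = [features(x) for x in data]
    return [sum(column) / len(zs) for column in zip(*zs)]


def kde_estimate(features, mean_embedding, x, sigma):
    """Approximate Gaussian KDE at x; small negative values are possible."""
    normalizer = (2.0 * math.pi * sigma**2) ** (len(x) / 2.0)
    dot = sum(a * b for a, b in zip(features(x), mean_embedding))
    return dot / normalizer


sigma = 1.0
rff = make_rff(dim=1, num_features=4000, sigma=sigma, seed=0)
train = [[0.0], [1.0]]
embedding = fit_mean_embedding(rff, train)
approx_density = kde_estimate(rff, embedding, [0.5], sigma)
```

    The approximation error shrinks as the number of random features grows, so the feature count trades accuracy against memory and query time.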
    Vertical GaN Diode BV Maximization through Rapid TCAD Simulation and ML-enabled Surrogate Model. (arXiv:2208.01142v1 [cs.LG])
    In this paper, two methodologies are used to speed up the maximization of the breakdown voltage (BV) of a vertical GaN diode that has a theoretical maximum BV of ~2100V. First, we demonstrate a 5X faster accurate simulation method in Technology Computer-Aided-Design (TCAD). This allows us to find 50% more high-BV (>1400V) designs in a given simulation time. Second, a machine learning (ML) model is developed using TCAD-generated data and used as a surrogate model for differential evolution optimization. It can inversely design an out-of-the-training-range structure with BV as high as 1887V (89% of the ideal case) compared to ~1100V designed with human domain expertise.  ( 2 min )
    Improving the Trainability of Deep Neural Networks through Layerwise Batch-Entropy Regularization. (arXiv:2208.01134v1 [cs.LG])
    Training deep neural networks is a very demanding task; especially challenging is how to adapt architectures to improve the performance of trained models. Sometimes shallow networks generalize better than deep networks, and the addition of more layers results in higher training and test errors. The deep residual learning framework addresses this degradation problem by adding skip connections to several neural network layers. It would at first seem counter-intuitive that such skip connections are needed to train deep networks successfully as the expressivity of a network would grow exponentially with depth. In this paper, we first analyze the flow of information through neural networks. We introduce and evaluate the batch-entropy which quantifies the flow of information through each layer of a neural network. We prove empirically and theoretically that a positive batch-entropy is required for gradient descent-based training approaches to optimize a given loss function successfully. Based on those insights, we introduce batch-entropy regularization to enable gradient descent-based training algorithms to optimize the flow of information through each hidden layer individually. With batch-entropy regularization, gradient descent optimizers can transform untrainable networks into trainable networks. We show empirically that we can therefore train a "vanilla" fully connected network and convolutional neural network -- no skip connections, batch normalization, dropout, or any other architectural tweak -- with 500 layers by simply adding the batch-entropy regularization term to the loss function. The effect of batch-entropy regularization is not only evaluated on vanilla neural networks, but also on residual networks, autoencoders, and also transformer models over a wide range of computer vision as well as natural language processing tasks.  ( 3 min )
    Efficient Personalized Learning for Wearable Health Applications using HyperDimensional Computing. (arXiv:2208.01095v1 [cs.LG])
    Health monitoring applications increasingly rely on machine learning techniques to learn end-user physiological and behavioral patterns in everyday settings. Considering the significant role of wearable devices in monitoring human body parameters, on-device learning can be utilized to build personalized models for behavioral and physiological patterns, and provide data privacy for users at the same time. However, resource constraints on most of these wearable devices prevent the ability to perform online learning on them. To address this issue, it is required to rethink the machine learning models from the algorithmic perspective to be suitable to run on wearable devices. Hyperdimensional computing (HDC) offers a well-suited on-device learning solution for resource-constrained devices and provides support for privacy-preserving personalization. Our HDC-based method offers flexibility, high efficiency, resilience, and performance while enabling on-device personalization and privacy protection. We evaluate the efficacy of our approach using three case studies and show that our system improves the energy efficiency of training by up to $45.8\times$ compared with the state-of-the-art Deep Neural Network (DNN) algorithms while offering a comparable accuracy.  ( 2 min )
    Boosted Off-Policy Learning. (arXiv:2208.01148v1 [cs.LG])
    We investigate boosted ensemble models for off-policy learning from logged bandit feedback. Toward this goal, we propose a new boosting algorithm that directly optimizes an estimate of the policy's expected reward. We analyze this algorithm and prove that the empirical risk decreases (possibly exponentially fast) with each round of boosting, provided a "weak" learning condition is satisfied. We further show how the base learner reduces to standard supervised learning problems. Experiments indicate that our algorithm can outperform deep off-policy learning and methods that simply regress on the observed rewards, thereby demonstrating the benefits of both boosting and choosing the right learning objective.  ( 2 min )
    CircuitNet: An Open-Source Dataset for Machine Learning Applications in Electronic Design Automation (EDA). (arXiv:2208.01040v1 [cs.LG])
    The electronic design automation (EDA) community has been actively exploring machine learning for very-large-scale-integrated computer aided design (VLSI CAD). Many studies have explored learning based techniques for cross-stage prediction tasks in the design flow to achieve faster design convergence. Although building machine learning (ML) models usually requires a large amount of data, most studies can only generate small internal datasets for validation due to the lack of large public datasets. In this essay, we present the first open-source dataset for machine learning tasks in VLSI CAD called CircuitNet. The dataset consists of more than 10K samples extracted from versatile runs of commercial design tools based on 6 open-source RISC-V designs.  ( 2 min )
    Voice Analysis for Stress Detection and Application in Virtual Reality to Improve Public Speaking in Real-time: A Review. (arXiv:2208.01041v1 [eess.AS])
    Stress during public speaking is common and adversely affects performance and self-confidence. Extensive research has been carried out to develop various models to recognize emotional states. However, minimal research has been conducted to detect stress during public speaking in real time using voice analysis. In this context, the current review showed that the application of algorithms was not properly explored and helped identify the main obstacles in creating a suitable testing environment while accounting for current complexities and limitations. In this paper, we present our main idea and propose a stress detection computational algorithmic model that could be integrated into a Virtual Reality (VR) application to create an intelligent virtual audience for improving public speaking skills. The developed model, when integrated with VR, will be able to detect excessive stress in real time by analysing voice features correlated to physiological parameters indicative of stress and help users gradually control excessive stress and improve public speaking performance.  ( 2 min )
    Improving Few-Shot Learning through Multi-task Representation Learning Theory. (arXiv:2010.01992v3 [cs.LG] UPDATED)
    In this paper, we consider the framework of multi-task representation (MTR) learning where the goal is to use source tasks to learn a representation that reduces the sample complexity of solving a target task. We start by reviewing recent advances in MTR theory and show that they can provide novel insights for popular meta-learning algorithms when analyzed within this framework. In particular, we highlight a fundamental difference between gradient-based and metric-based algorithms in practice and put forward a theoretical analysis to explain it. Finally, we use the derived insights to improve the performance of meta-learning methods via a new spectral-based regularization term and confirm its efficiency through experimental studies on few-shot classification benchmarks. To the best of our knowledge, this is the first contribution that puts the most recent learning bounds of MTR theory into practice for the task of few-shot classification.
    Accelerated and interpretable oblique random survival forests. (arXiv:2208.01129v1 [stat.ME])
    The oblique random survival forest (RSF) is an ensemble supervised learning method for right-censored outcomes. Trees in the oblique RSF are grown using linear combinations of predictors to create branches, whereas in the standard RSF, a single predictor is used. Oblique RSF ensembles often have higher prediction accuracy than standard RSF ensembles. However, assessing all possible linear combinations of predictors induces significant computational overhead that limits applications to large-scale data sets. In addition, few methods have been developed for interpretation of oblique RSF ensembles, and they remain more difficult to interpret compared to their axis-based counterparts. We introduce a method to increase computational efficiency of the oblique RSF and a method to estimate importance of individual predictor variables with the oblique RSF. Our strategy to reduce computational overhead makes use of Newton-Raphson scoring, a classical optimization technique that we apply to the Cox partial likelihood function within each non-leaf node of decision trees. We estimate the importance of individual predictors for the oblique RSF by negating each coefficient used for the given predictor in linear combinations, and then computing the reduction in out-of-bag accuracy. In general benchmarking experiments, we find that our implementation of the oblique RSF is approximately 450 times faster with equivalent discrimination and superior Brier score compared to existing software for oblique RSFs. We find in simulation studies that 'negation importance' discriminates between relevant and irrelevant predictors more reliably than permutation importance, Shapley additive explanations, and a previously introduced technique to measure variable importance with oblique RSFs based on analysis of variance. Methods introduced in the current study are available in the aorsf R package.
    Effects of Graph Convolutions in Multi-layer Networks. (arXiv:2204.09297v2 [cs.LG] UPDATED)
    Graph Convolutional Networks (GCNs) are one of the most popular architectures that are used to solve classification problems accompanied by graphical information. We present a rigorous theoretical understanding of the effects of graph convolutions in multi-layer networks. We study these effects through the node classification problem of a non-linearly separable Gaussian mixture model coupled with a stochastic block model. First, we show that a single graph convolution expands the regime of the distance between the means where multi-layer networks can classify the data by a factor of at least $1/\sqrt[4]{\mathbb{E}{\rm deg}}$, where $\mathbb{E}{\rm deg}$ denotes the expected degree of a node. Second, we show that with a slightly stronger graph density, two graph convolutions improve this factor to at least $1/\sqrt[4]{n}$, where $n$ is the number of nodes in the graph. Finally, we provide both theoretical and empirical insights into the performance of graph convolutions placed in different combinations among the layers of a network, concluding that the performance is mutually similar for all combinations of the placement. We present extensive experiments on both synthetic and real-world data that illustrate our results.
    Systematically and efficiently improving existing $k$-means initialization algorithms by pairwise-nearest-neighbor smoothing. (arXiv:2202.03949v2 [cs.LG] UPDATED)
    We present a meta-method for initializing (seeding) the $k$-means clustering algorithm called PNN-smoothing. It consists in splitting a given dataset into $J$ random subsets, clustering each of them individually, and merging the resulting clusterings with the pairwise-nearest-neighbor (PNN) method. It is a meta-method in the sense that when clustering the individual subsets any seeding algorithm can be used. If the computational complexity of that seeding algorithm is linear in the size of the data $N$ and the number of clusters $k$, PNN-smoothing is also almost linear with an appropriate choice of $J$, and quite competitive in practice. We show empirically, using several existing seeding methods and testing on several synthetic and real datasets, that this procedure results in systematically better costs. Our implementation is publicly available at https://github.com/carlobaldassi/KMeansPNNSmoothing.jl.
    Perturbation Analysis of Randomized SVD and its Applications to High-dimensional Statistics. (arXiv:2203.10262v2 [math.ST] UPDATED)
    Randomized singular value decomposition (RSVD) is a class of computationally efficient algorithms for computing the truncated SVD of large data matrices. Given a $n \times n$ symmetric matrix $\mathbf{M}$, the prototypical RSVD algorithm outputs an approximation of the $k$ leading singular vectors of $\mathbf{M}$ by computing the SVD of $\mathbf{M}^{g} \mathbf{G}$; here $g \geq 1$ is an integer and $\mathbf{G} \in \mathbb{R}^{n \times k}$ is a random Gaussian sketching matrix. In this paper we study the statistical properties of RSVD under a general "signal-plus-noise" framework, i.e., the observed matrix $\hat{\mathbf{M}}$ is assumed to be an additive perturbation of some true but unknown signal matrix $\mathbf{M}$. We first derive upper bounds for the $\ell_2$ (spectral norm) and $\ell_{2\to\infty}$ (maximum row-wise $\ell_2$ norm) distances between the approximate singular vectors of $\hat{\mathbf{M}}$ and the true singular vectors of the signal matrix $\mathbf{M}$. These upper bounds depend on the signal-to-noise ratio (SNR) and the number of power iterations $g$. A phase transition phenomenon is observed in which a smaller SNR requires larger values of $g$ to guarantee convergence of the $\ell_2$ and $\ell_{2\to\infty}$ distances. We also show that the thresholds for $g$ where these phase transitions occur are sharp whenever the noise matrices satisfy a certain trace growth condition. Finally, we derive normal approximations for the row-wise fluctuations of the approximate singular vectors and the entrywise fluctuations of the approximate matrix. We illustrate our theoretical results by deriving nearly-optimal performance guarantees for RSVD when applied to three statistical inference problems, namely, community detection, matrix completion, and principal component analysis with missing data.
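    The prototypical procedure described above (sketch with a Gaussian matrix, apply $g$ power iterations, take an SVD) can be written in a few lines. The following is a minimal sketch assuming NumPy; the function name, the seed handling and the toy signal-plus-noise matrix are illustrative assumptions, and it forms $\mathbf{M}^{g}$ explicitly, which a practical implementation would likely avoid.

```python
import numpy as np

def rsvd_leading_vectors(M, k, g=1, seed=0):
    """Approximate the k leading singular vectors of a symmetric matrix M.

    Follows the recipe in the abstract: left singular vectors of M^g @ G,
    where G is an n x k Gaussian sketching matrix.
    """
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    G = rng.standard_normal((n, k))           # random Gaussian sketch
    Y = np.linalg.matrix_power(M, g) @ G      # g power iterations
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U                                  # n x k, orthonormal columns

# Hypothetical signal-plus-noise example: rank-2 signal, small symmetric noise.
rng = np.random.default_rng(42)
n, k = 60, 2
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))   # true singular vectors
signal = Q @ np.diag([10.0, 8.0]) @ Q.T
noise = 0.01 * rng.standard_normal((n, n))
M_hat = signal + (noise + noise.T) / 2.0

U = rsvd_leading_vectors(M_hat, k=k, g=3)
# Spectral-norm distance between the estimated and true subspaces.
subspace_error = np.linalg.norm(U @ U.T - Q @ Q.T, 2)
```

    Rerunning with a smaller `g` or a larger noise scale gives a rough feel for how the number of power iterations interacts with the signal-to-noise ratio.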
    Learning Invariant Weights in Neural Networks. (arXiv:2202.12439v2 [stat.ML] UPDATED)
    Assumptions about invariances or symmetries in data can significantly increase the predictive power of statistical models. Many commonly used models in machine learning are constrained to respect certain symmetries in the data, such as translation equivariance in convolutional neural networks, and incorporation of new symmetry types is actively being studied. Yet, learning such invariances from the data itself remains an open research problem. It has been shown that marginal likelihood offers a principled way to learn invariances in Gaussian Processes. We propose a weight-space equivalent to this approach, by minimizing a lower bound on the marginal likelihood to learn invariances in neural networks resulting in naturally higher performing models.
    Trimmed Maximum Likelihood Estimation for Robust Learning in Generalized Linear Models. (arXiv:2206.04777v2 [cs.LG] UPDATED)
    We study the problem of learning generalized linear models under adversarial corruptions. We analyze a classical heuristic called the iterative trimmed maximum likelihood estimator which is known to be effective against label corruptions in practice. Under label corruptions, we prove that this simple estimator achieves minimax near-optimal risk on a wide range of generalized linear models, including Gaussian regression, Poisson regression and Binomial regression. Finally, we extend the estimator to the more challenging setting of label and covariate corruptions and demonstrate its robustness and optimality in that setting as well.
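    For intuition, the iterative trimming heuristic can be sketched in the Gaussian regression case, where maximum likelihood reduces to least squares. This is a minimal illustration under stated assumptions, not the authors' estimator or analysis: the function names, the one-dimensional model, the stopping rule and the choice of `n_keep` (how many points are assumed uncorrupted) are all hypothetical.

```python
def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def trimmed_fit(points, n_keep, n_iters=20):
    """Alternate between fitting and keeping the n_keep best-explained points."""
    kept = list(points)
    for _ in range(n_iters):
        slope, intercept = fit_line(kept)
        # Under a Gaussian model, a larger squared residual means lower likelihood.
        ranked = sorted(
            points, key=lambda p: (p[1] - (slope * p[0] + intercept)) ** 2
        )
        new_kept = ranked[:n_keep]
        if sorted(new_kept) == sorted(kept):
            break  # the kept set no longer changes
        kept = new_kept
    return fit_line(kept)

# Hypothetical data: y = 2x + 1 with two corrupted labels.
clean = [(float(x), 2.0 * x + 1.0) for x in range(8)]
corrupted = clean + [(8.0, 100.0), (9.0, -50.0)]
slope, intercept = trimmed_fit(corrupted, n_keep=8)
```

    On this toy input the first untrimmed fit is pulled away from the true line, but the two corrupted points still have the largest residuals, so the second fit uses only the clean points.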
    Context-Aware Drift Detection. (arXiv:2203.08644v2 [stat.ML] UPDATED)
    When monitoring machine learning systems, two-sample tests of homogeneity form the foundation upon which existing approaches to drift detection build. They are used to test for evidence that the distribution underlying recent deployment data differs from that underlying the historical reference data. Often, however, various factors such as time-induced correlation mean that batches of recent deployment data are not expected to form an i.i.d. sample from the historical data distribution. Instead we may wish to test for differences in the distributions conditional on \textit{context} that is permitted to change. To facilitate this we borrow machinery from the causal inference domain to develop a more general drift detection framework built upon a foundation of two-sample tests for conditional distributional treatment effects. We recommend a particular instantiation of the framework based on maximum conditional mean discrepancies. We then provide an empirical study demonstrating its effectiveness for various drift detection problems of practical interest, such as detecting drift in the distributions underlying subpopulations of data in a manner that is insensitive to their respective prevalences. The study additionally demonstrates applicability to ImageNet-scale vision problems.
    Reduced-order modeling for parameterized large-eddy simulations of atmospheric pollutant dispersion. (arXiv:2208.01518v1 [stat.ML])
    Mapping near-field pollutant concentration is essential to track accidental toxic plume dispersion in urban areas. By resolving a large part of the turbulence spectrum, large-eddy simulations (LES) have the potential to accurately represent pollutant concentration spatial variability. Finding a way to synthesize this large amount of information to improve the accuracy of lower-fidelity operational models (e.g. providing better turbulence closure terms) is particularly appealing. This is a challenge in multi-query contexts, where LES become prohibitively costly to deploy to understand how plume flow and tracer dispersion change with various atmospheric and source parameters. To overcome this issue, we propose a non-intrusive reduced-order model combining proper orthogonal decomposition (POD) and Gaussian process regression (GPR) to predict LES field statistics of interest associated with tracer concentrations. GPR hyperparameters are optimized component-by-component through a maximum a posteriori (MAP) procedure informed by POD. We provide a detailed analysis of the reduced-order model performance on a two-dimensional case study corresponding to a turbulent atmospheric boundary-layer flow over a surface-mounted obstacle. We show that near-source concentration heterogeneities upstream of the obstacle require a large number of POD modes to be well captured. We also show that the component-by-component optimization makes it possible to capture the range of spatial scales in the POD modes, especially the shorter concentration patterns in the high-order modes. The reduced-order model predictions remain acceptable if the learning database is made of at least fifty to one hundred LES snapshots, providing a first estimation of the required budget to move towards more realistic atmospheric dispersion applications.  ( 3 min )
    Binary Independent Component Analysis: A Non-stationarity-based Approach. (arXiv:2111.15431v2 [cs.LG] UPDATED)
    We consider independent component analysis of binary data. While fundamental in practice, this case has been much less developed than ICA for continuous data. We start by assuming a linear mixing model in a continuous-valued latent space, followed by a binary observation model. Importantly, we assume that the sources are non-stationary; this is necessary since any non-Gaussianity would essentially be destroyed by the binarization. Interestingly, the model allows for closed-form likelihood by employing the cumulative distribution function of the multivariate Gaussian distribution. In stark contrast to the continuous-valued case, we prove non-identifiability of the model with few observed variables; our empirical results imply identifiability when the number of observed variables is higher. We present a practical method for binary ICA that uses only pairwise marginals, which are faster to compute than the full multivariate likelihood. Experiments give insight into the requirements for the number of observed variables, segments, and latent sources that allow the model to be estimated.  ( 2 min )
    Fisher and Kernel Fisher Discriminant Analysis: Tutorial. (arXiv:1906.09436v2 [stat.ML] UPDATED)
    This is a detailed tutorial paper which explains Fisher Discriminant Analysis (FDA) and kernel FDA. We start with projection and reconstruction. Then, one- and multi-dimensional FDA subspaces are covered. Scatters in two- and then multi-class settings are explained in FDA. Then, we discuss the rank of the scatters and the dimensionality of the subspace. A real-life example is also provided for interpreting FDA. Then, possible singularity of the scatter is discussed to introduce robust FDA. PCA and FDA directions are also compared. We also prove that FDA and linear discriminant analysis are equivalent. Fisher forest is also introduced as an ensemble of Fisher subspaces useful for handling data with different features and dimensionality. Afterwards, kernel FDA is explained for both one- and multi-dimensional subspaces with both two- and multi-classes. Finally, some simulations are performed on the AT&T face dataset to illustrate FDA and compare it with PCA.  ( 2 min )
    Unsupervised and Supervised Principal Component Analysis: Tutorial. (arXiv:1906.03148v2 [stat.ML] UPDATED)
    This is a detailed tutorial paper which explains the Principal Component Analysis (PCA), Supervised PCA (SPCA), kernel PCA, and kernel SPCA. We start with projection, PCA with eigen-decomposition, PCA with one and multiple projection directions, properties of the projection matrix, reconstruction error minimization, and we connect to autoencoder. Then, PCA with singular value decomposition, dual PCA, and kernel PCA are covered. SPCA using both scoring and Hilbert-Schmidt independence criterion are explained. Kernel SPCA using both direct and dual approaches are then introduced. We cover all cases of projection and reconstruction of training and out-of-sample data. Finally, some simulations are provided on Frey and AT&T face datasets for verifying the theory in practice.  ( 2 min )
    Data-Driven Discovery of Molecular Photoswitches with Multioutput Gaussian Processes. (arXiv:2008.03226v2 [physics.chem-ph] UPDATED)
    Photoswitchable molecules display two or more isomeric forms that may be accessed using light. Separating the electronic absorption bands of these isomers is key to selectively addressing a specific isomer and achieving high photostationary states, whilst overall red-shifting the absorption bands serves to limit material damage due to UV-exposure and increases penetration depth in photopharmacological applications. Engineering these properties into a system through synthetic design, however, remains a challenge. Here, we present a data-driven discovery pipeline for molecular photoswitches underpinned by dataset curation and multitask learning with Gaussian processes. In the prediction of electronic transition wavelengths, we demonstrate that a multioutput Gaussian process (MOGP) trained using labels from four photoswitch transition wavelengths yields the strongest predictive performance relative to single-task models as well as operationally outperforming time-dependent density functional theory (TD-DFT) in terms of the wall-clock time for prediction. We validate our proposed approach experimentally by screening a library of commercially available photoswitchable molecules. Through this screen, we identified several motifs that displayed separated electronic absorption bands of their isomers, exhibited red-shifted absorptions, and are suited for information transfer and photopharmacological applications. Our curated dataset, code, as well as all models are made available at https://github.com/Ryan-Rhys/The-Photoswitch-Dataset  ( 3 min )
    Generalization Bounds in the Predict-then-Optimize Framework. (arXiv:1905.11488v3 [cs.LG] UPDATED)
    The predict-then-optimize framework is fundamental in many practical settings: predict the unknown parameters of an optimization problem, and then solve the problem using the predicted values of the parameters. A natural loss function in this environment is to consider the cost of the decisions induced by the predicted parameters, in contrast to the prediction error of the parameters. This loss function was recently introduced in Elmachtoub and Grigas (2022) and referred to as the Smart Predict-then-Optimize (SPO) loss. In this work, we seek to provide bounds on how well the performance of a prediction model fit on training data generalizes out-of-sample, in the context of the SPO loss. Since the SPO loss is non-convex and non-Lipschitz, standard results for deriving generalization bounds do not apply. We first derive bounds based on the Natarajan dimension that, in the case of a polyhedral feasible region, scale at most logarithmically in the number of extreme points, but, in the case of a general convex feasible region, have linear dependence on the decision dimension. By exploiting the structure of the SPO loss function and a key property of the feasible region, which we denote as the strength property, we can dramatically improve the dependence on the decision and feature dimensions. Our approach and analysis rely on placing a margin around problematic predictions that do not yield unique optimal solutions, and then providing generalization bounds in the context of a modified margin SPO loss function that is Lipschitz continuous. Finally, we characterize the strength property and show that the modified SPO loss can be computed efficiently for both strongly convex bodies and polytopes with an explicit extreme point representation.  ( 3 min )
    A Recursive Partitioning Approach for Dynamic Discrete Choice Modeling in High Dimensional Settings. (arXiv:2208.01476v1 [stat.ME])
    Dynamic discrete choice models are widely employed to answer substantive and policy questions in settings where individuals' current choices have future implications. However, estimation of these models is often computationally intensive and/or infeasible in high-dimensional settings. Indeed, even specifying the structure for how the utilities/state transitions enter the agent's decision is challenging in high-dimensional settings when we have no guiding theory. In this paper, we present a semi-parametric formulation of dynamic discrete choice models that incorporates a high-dimensional set of state variables, in addition to the standard variables used in a parametric utility function. The high-dimensional variable can include all the variables that are not the main variables of interest but may potentially affect people's choices and must be included in the estimation procedure, i.e., control variables. We present a data-driven recursive partitioning algorithm that reduces the dimensionality of the high-dimensional state space by taking the variation in choices and state transition into account. Researchers can then use the method of their choice to estimate the problem using the discretized state space from the first stage. Our approach can reduce the estimation bias and make estimation feasible at the same time. We present Monte Carlo simulations to demonstrate the performance of our method compared to standard estimation methods where we ignore the high-dimensional explanatory variable set.  ( 3 min )
    Unsupervised machine learning framework for discriminating major variants of concern during COVID-19. (arXiv:2208.01439v1 [q-bio.OT])
    Due to the rapid evolution of the SARS-CoV-2 (COVID-19) virus, a number of mutations emerged, with variants such as Alpha, Gamma, Delta and Omicron having a massive impact on the world economy. Unsupervised machine learning methods have the ability to compress, characterize and visualise unlabelled data. In this paper, we present a framework that utilises unsupervised machine learning methods, including a combination of selected dimensional reduction and clustering methods, to discriminate and visualise the associations with the major COVID-19 variants based on genome sequences. The framework utilises k-mer analysis for processing the genome (RNA) sequences and compares different dimensional reduction methods, including principal component analysis (PCA), t-distributed stochastic neighbour embedding (t-SNE), and uniform manifold approximation and projection (UMAP). Furthermore, the framework employs agglomerative hierarchical clustering methods and provides a visualisation using a dendrogram. We find that the proposed framework can effectively distinguish the major variants and hence can be used for distinguishing emerging variants in the future.  ( 3 min )
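    The k-mer step mentioned above turns each genome sequence into a fixed-length numeric vector that dimensionality reduction and clustering methods can consume. A minimal sketch, assuming sequences are given as strings over the alphabet ACGT; the function names, the value of `k` and the toy sequence are illustrative assumptions and are not taken from the paper.

```python
from collections import Counter
from itertools import product
from typing import List

def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a nucleotide sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def kmer_vector(sequence: str, k: int, alphabet: str = "ACGT") -> List[float]:
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    counts = kmer_counts(sequence, k)
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    # k-mers containing symbols outside the alphabet (e.g. N) are ignored.
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total if total else 0.0 for w in vocab]

toy_genome = "ACGTACGTGA"  # hypothetical toy sequence
vec = kmer_vector(toy_genome, k=2)
```

    Each sequence yields a vector of length `4**k`, so stacking the vectors of many genomes gives the matrix that PCA, t-SNE or UMAP would then reduce.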
    Cluster Weighted Model Based on TSNE algorithm for High-Dimensional Data. (arXiv:2208.01579v1 [stat.ML])
    As with many machine learning models, both the accuracy and the speed of cluster weighted models (CWMs) can be hampered by high-dimensional data, which has led to previous work on parsimonious techniques to reduce the effect of the "curse of dimensionality" on mixture models. In this work, we review the background of CWMs. We further show that parsimonious techniques are not sufficient for mixture models to thrive in the presence of huge high-dimensional data. We discuss a heuristic for detecting the hidden components by choosing the initial values of location parameters using the default values in the "FlexCWM" R package. We introduce a dimensionality reduction technique called t-distributed stochastic neighbor embedding (TSNE) to enhance the parsimonious CWMs in high-dimensional space. Originally, CWMs are suited for regression, but for classification purposes, all multi-class variables are transformed logarithmically with some noise. The parameters of the model are obtained via the expectation-maximization algorithm. The effectiveness of the discussed technique is demonstrated using real data sets from different fields.  ( 2 min )
    Concentration inequalities for correlated network-valued processes with applications to community estimation and changepoint analysis. (arXiv:2208.01365v1 [math.ST])
    Network-valued time series are currently a common form of network data. However, the study of the aggregate behavior of network sequences generated from network-valued stochastic processes is relatively rare. Most of the existing research focuses on the simple setup where the networks are independent (or conditionally independent) across time, and all edges are updated synchronously at each time step. In this paper, we study the concentration properties of the aggregated adjacency matrix and the corresponding Laplacian matrix associated with network sequences generated from lazy network-valued stochastic processes, where edges update asynchronously, and each edge follows a lazy stochastic process for its updates independent of the other edges. We demonstrate the usefulness of these concentration results in proving consistency of standard estimators in community estimation and changepoint estimation problems. We also conduct a simulation study to demonstrate the effect of the laziness parameter, which controls the extent of temporal correlation, on the accuracy of community and changepoint estimation.  ( 2 min )
    A Deep Generative Model for Feasible and Diverse Population Synthesis. (arXiv:2208.01403v1 [stat.ML])
    An ideal synthetic population, a key input to activity-based models, mimics the distribution of the individual- and household-level attributes in the actual population. Since the entire population's attributes are generally unavailable, household travel survey (HTS) samples are used for population synthesis. Synthesizing population by directly sampling from HTS ignores the attribute combinations that are unobserved in the HTS samples but exist in the population, called 'sampling zeros'. A deep generative model (DGM) can potentially synthesize the sampling zeros but at the expense of generating 'structural zeros' (i.e., the infeasible attribute combinations that do not exist in the population). This study proposes a novel method to minimize structural zeros while preserving sampling zeros. Two regularizations are devised to customize the training of the DGM and applied to a generative adversarial network (GAN) and a variational autoencoder (VAE). The adopted metrics for feasibility and diversity of the synthetic population indicate the capability of generating sampling and structural zeros -- lower structural zeros and lower sampling zeros indicate the higher feasibility and the lower diversity, respectively. Results show that the proposed regularizations achieve considerable performance improvement in feasibility and diversity of the synthesized population over traditional models. The proposed VAE additionally generated 23.5% of the population ignored by the sample with 79.2% precision (i.e., 20.8% structural zeros rates), while the proposed GAN generated 18.3% of the ignored population with 89.0% precision. The proposed improvement in DGM generates a more feasible and diverse synthetic population, which is critical for the accuracy of an activity-based model.  ( 3 min )
    DAPDAG: Domain Adaptation via Perturbed DAG Reconstruction. (arXiv:2208.01373v1 [cs.LG])
    Leveraging labelled data from multiple domains to enable prediction in another domain without labels is a significant, yet challenging problem. To address this problem, we introduce the framework DAPDAG (Domain Adaptation via Perturbed DAG Reconstruction) and propose to learn an auto-encoder that undertakes inference on population statistics given features and reconstructing a directed acyclic graph (DAG) as an auxiliary task. The underlying DAG structure is assumed invariant among observed variables whose conditional distributions are allowed to vary across domains led by a latent environmental variable $E$. The encoder is designed to serve as an inference device on $E$ while the decoder reconstructs each observed variable conditioned on its graphical parents in the DAG and the inferred $E$. We train the encoder and decoder jointly in an end-to-end manner and conduct experiments on synthetic and real datasets with mixed variables. Empirical results demonstrate that reconstructing the DAG benefits the approximate inference. Furthermore, our approach can achieve competitive performance against other benchmarks in prediction tasks, with better adaptation ability, especially in the target domain significantly different from the source domains.  ( 2 min )
    Bounding Counterfactuals under Selection Bias. (arXiv:2208.01417v1 [stat.ML])
    Causal analysis may be affected by selection bias, which is defined as the systematic exclusion of data from a certain subpopulation. Previous work in this area focused on the derivation of identifiability conditions. We propose instead a first algorithm to address both identifiable and unidentifiable queries. We prove that, in spite of the missingness induced by the selection bias, the likelihood of the available data is unimodal. This enables us to use the causal expectation-maximisation scheme to obtain the values of causal queries in the identifiable case, and to compute bounds otherwise. Experiments demonstrate the approach to be practically viable. Theoretical convergence characterisations are provided.  ( 2 min )
    Viskositas: Viscosity Prediction of Multicomponent Chemical Systems. (arXiv:2208.01440v1 [stat.AP])
    Viscosity plays a fundamental role in the production processes of the metallurgical and glass industries, as well as in geophysics. Because its experimental measurement is expensive in both money and time, several mathematical models, linear and nonlinear, have been built to provide viscosity as a function of variables such as chemical composition and temperature. A database was built in order to produce a nonlinear model based on artificial neural networks, with hyperparameter variation, to provide reliable predictions of viscosity in relation to chemical systems and temperatures. The resulting model, named Viskositas, demonstrated better statistical evaluations of mean absolute error, standard deviation and coefficient of determination on the test database when compared to different models from the literature and one commercial model, offering predictions with lower errors, less variability and fewer outliers.  ( 2 min )
    GeoECG: Data Augmentation via Wasserstein Geodesic Perturbation for Robust Electrocardiogram Prediction. (arXiv:2208.01220v1 [stat.ML])
    There has been an increased interest in applying deep neural networks to automatically interpret and analyze the 12-lead electrocardiogram (ECG). The current paradigms with machine learning methods are often limited by the amount of labeled data. This phenomenon is particularly problematic for clinically-relevant data, where labeling at scale can be time-consuming and costly in terms of the specialized expertise and human effort required. Moreover, deep learning classifiers may be vulnerable to adversarial examples and perturbations, which could have catastrophic consequences, for example, when applied in the context of medical treatment, clinical trials, or insurance claims. In this paper, we propose a physiologically-inspired data augmentation method to improve performance and increase the robustness of heart disease detection based on ECG signals. We obtain augmented samples by perturbing the data distribution towards other classes along the geodesic in Wasserstein space. To better utilize domain-specific knowledge, we design a ground metric that recognizes the difference between ECG signals based on physiologically determined features. Learning from 12-lead ECG signals, our model is able to distinguish five categories of cardiac conditions. Our results demonstrate improvements in accuracy and robustness, reflecting the effectiveness of our data augmentation method.  ( 3 min )
    Are Cluster Validity Measures (In)valid? (arXiv:2208.01261v1 [stat.ML])
    Internal cluster validity measures (such as the Calinski-Harabasz, Dunn, or Davies-Bouldin indices) are frequently used for selecting the appropriate number of partitions a dataset should be split into. In this paper we consider what happens if we treat such indices as objective functions in unsupervised learning activities. Is the optimal grouping with regards to, say, the Silhouette index really meaningful? It turns out that many cluster (in)validity indices promote clusterings that match expert knowledge quite poorly. We also introduce a new, well-performing variant of the Dunn index that is built upon OWA operators and the near-neighbour graph so that subspaces of higher density, regardless of their shapes, can be separated from each other better.  ( 2 min )
    On the role of benchmarking data sets and simulations in method comparison studies. (arXiv:2208.01457v1 [stat.ME])
    Method comparisons are essential to provide recommendations and guidance for applied researchers, who often have to choose from a plethora of available approaches. While many comparisons exist in the literature, these are often not neutral but favour a novel method. Apart from the choice of design and a proper reporting of the findings, there are different approaches concerning the underlying data for such method comparison studies. Most manuscripts on statistical methodology rely on simulation studies and provide a single real-world data set as an example to motivate and illustrate the methodology investigated. In the context of supervised learning, in contrast, methods are often evaluated using so-called benchmarking data sets, i.e. real-world data that serve as gold standard in the community. Simulation studies, on the other hand, are much less common in this context. The aim of this paper is to investigate differences and similarities between these approaches, to discuss their advantages and disadvantages and ultimately to develop new approaches to the evaluation of methods picking the best of both worlds. To this aim, we borrow ideas from different contexts such as mixed methods research and Clinical Scenario Evaluation.  ( 2 min )
    Bayesian Variable Selection in a Million Dimensions. (arXiv:2208.01180v1 [stat.ME])
    Bayesian variable selection is a powerful tool for data analysis, as it offers a principled method for variable selection that accounts for prior information and uncertainty. However, wider adoption of Bayesian variable selection has been hampered by computational challenges, especially in difficult regimes with a large number of covariates P or non-conjugate likelihoods. To scale to the large P regime we introduce an efficient MCMC scheme whose cost per iteration is sublinear in P. In addition we show how this scheme can be extended to generalized linear models for count data, which are prevalent in biology, ecology, economics, and beyond. In particular we design efficient algorithms for variable selection in binomial and negative binomial regression, which includes logistic regression as a special case. In experiments we demonstrate the effectiveness of our methods, including on cancer and maize genomic data.  ( 2 min )
    Boosted Off-Policy Learning. (arXiv:2208.01148v1 [cs.LG])
    We investigate boosted ensemble models for off-policy learning from logged bandit feedback. Toward this goal, we propose a new boosting algorithm that directly optimizes an estimate of the policy's expected reward. We analyze this algorithm and prove that the empirical risk decreases (possibly exponentially fast) with each round of boosting, provided a "weak" learning condition is satisfied. We further show how the base learner reduces to standard supervised learning problems. Experiments indicate that our algorithm can outperform deep off-policy learning and methods that simply regress on the observed rewards, thereby demonstrating the benefits of both boosting and choosing the right learning objective.  ( 2 min )
    A Modified PINN Approach for Identifiable Compartmental Models in Epidemiology with Applications to COVID-19. (arXiv:2208.01169v1 [q-bio.PE])
    A variety of approaches using compartmental models have been used to study the COVID-19 pandemic and the usage of machine learning methods with these models has had particularly notable success. We present here an approach toward analyzing accessible data on Covid-19's U.S. development using a variation of the "Physics Informed Neural Networks" (PINN) which is capable of using the knowledge of the model to aid learning. We illustrate the challenges of using the standard PINN approach, then how with appropriate and novel modifications to the loss function the network can perform well even in our case of incomplete information. Aspects of identifiability of the model parameters are also assessed, as well as methods of denoising available data using a wavelet transform. Finally, we discuss the capability of the neural network methodology to work with models of varying parameter values, as well as a concrete application in estimating how effectively cases are being tested for in a population, providing a ranking of U.S. states by means of their respective testing.  ( 3 min )

  • Open

    Trust Region Methods
    I am reading Laura's book on DRL where she states: "A number of algorithms have been proposed to solve this trust region optimization problem. Some of these include Natural Policy Gradient (NPG) [63, 112, 113], Trust Region Policy Optimization (TRPO) [122], and Constrained Policy Optimization (CPO) [2]. The theories behind them are fairly complex, and the algorithms are difficult to implement. Their gradients can be expensive to compute, and it is difficult to choose a good value for δ." Based on this excerpt, can someone point out a paper/reference (reproducibility study) that compares the performance and applicability of these algorithms? For example, I wonder if these algorithms outperform A2C and REINFORCE. By analogy, some works have shown that classical Matrix Factorization and k-NN based recommender systems provide competitive results when compared to deep learning approaches. submitted by /u/rlopes404 [link] [comments]  ( 87 min )
    "Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning", Valassakis et al 2022
    submitted by /u/gwern [link] [comments]  ( 86 min )
    When to use Action Observation history vs only Observation history to solve a POMDP?
    Hi guys, when should one use Action Observation history (AOH) vs Observation history (OH) to solve a POMDP? In other words, what condition needs to hold in order to say that OH is enough to solve the POMDP? submitted by /u/souhaielbensalem [link] [comments]  ( 87 min )
    Noise in Action Space, Reward Space and State Space. Looking for Papers.
    Most SOTA deep RL algorithms use a stochastic action distribution to introduce explorative noise into the training process. A second strategy is injecting noise into the states or even the reward signal. I am currently working with an environment that has highly stochastic rewards and also state transitions. From my experiments I conclude that almost no action noise is needed to learn a good policy. In PPO I use a very low log sigma, for example. Does anyone have some experience in this area? Does anyone know of good papers that investigate the interplay between the noise in reward, state and action? Thank you! submitted by /u/flxh13 [link] [comments]  ( 88 min )
    Solving POMDPs
    Hello everyone, does anyone know state-of-the-art algorithms to learn a memoryless policy, i.e., a policy depending only on the current observation (not on the full state and not on a history of observations and actions), for a POMDP? I am looking for approximate methods (some policy iteration) or modified Q-learning in the discrete-state, discrete-action case, and for deep RL methods in the continuous-state case. Thank you in advance submitted by /u/Hkohler98 [link] [comments]  ( 88 min )
  • Open

    [R] Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning - Santurkar et al 2022
    Paper: https://arxiv.org/abs/2207.07635 Abstract: The development of CLIP [Radford et al., 2021] has sparked a debate on whether language supervision can result in vision models with more transferable representations than traditional image-only methods. Our work studies this question through a carefully controlled comparison of two approaches in terms of their ability to learn representations that generalize to downstream classification tasks. We find that when the pre-training dataset meets certain criteria -- it is sufficiently large and contains descriptive captions with low variability -- image-only methods do not match CLIP's transfer performance, even when they are trained with more image data. However, contrary to what one might expect, there are practical settings in which these criteria are not met, wherein added supervision through captions is actually detrimental. Motivated by our findings, we devise simple prescriptions to enable CLIP to better leverage the language information present in existing pre-training datasets. https://preview.redd.it/yzc0451pmdf91.jpg?width=1181&format=pjpg&auto=webp&s=8f1b90243ad1643799715c361856f17665882a30 https://preview.redd.it/4ecai50pmdf91.jpg?width=1316&format=pjpg&auto=webp&s=e489c730acbf7db1fa12870e3eef065af141a4d0 https://preview.redd.it/ma8r241pmdf91.jpg?width=1328&format=pjpg&auto=webp&s=5cb2f4fad08b4f2934738dabba4ecc44d19be1c6 https://preview.redd.it/92msd81pmdf91.jpg?width=786&format=pjpg&auto=webp&s=793bdc7ddedd6e64f06050847abbec2a8fd1ff71 submitted by /u/Singularian2501 [link] [comments]  ( 88 min )
    [R] LocoProp: Enhancing BackProp via Local Loss Optimization (Google Brain, 2022)
    Paper: https://arxiv.org/abs/2106.06199 Github: https://github.com/google-research/google-research/tree/master/locoprop Abstract: Second-order methods have shown state-of-the-art performance for optimizing deep neural networks. Nonetheless, their large memory requirement and high computational complexity, compared to first-order methods, hinder their versatility in a typical low-budget setup. This paper introduces a general framework of layerwise loss construction for multilayer neural networks that achieves a performance closer to second-order methods while utilizing first-order optimizers only. Our methodology lies upon a three-component loss, target, and regularizer combination, for which altering each component results in a new update rule. We provide examples using squared loss and layerwise Bregman divergences induced by the convex integral functions of various transfer functions. Our experiments on benchmark models and datasets validate the efficacy of our new approach, reducing the gap between first-order and second-order optimizers. https://preview.redd.it/eboo9126hdf91.jpg?width=930&format=pjpg&auto=webp&s=ef265339c0372669e5afcf496db9f55d83ec3847 https://preview.redd.it/txl19wp6hdf91.jpg?width=1209&format=pjpg&auto=webp&s=938b9f35980972dcec5a848df5ba1868450de4eb submitted by /u/Singularian2501 [link] [comments]  ( 88 min )
    [N] ViTDet: New SOTA Low shot object detection
    Meta AI released ViTDet, a transformer-based model for low-shot object detection. It outperforms previous models on the Large Vocabulary Instance Segmentation (LVIS) dataset. [Arxiv] [Blog post] They have released the code in their Detectron2 library. submitted by /u/ashwan1 [link] [comments]  ( 87 min )
    [P] How to train ML models in AWS from the CLI
    Hey guys, hoping for some help from the community. We are building dstack.ai, a free (and soon open source!!) framework that allows you to run ML tasks in the cloud, directly via your CLI. Think lambda functions for long, compute-intensive tasks. Dstack essentially lets you build ML models locally and run them in your cloud accounts, and takes care of spinning up VMs and shutting them down after your workflows are done. We are still building lots of cool features but hoping to find a few folks interested in a test drive. P.S. we are still in beta, let us know if you find any bugs :) submitted by /u/dmart89 [link] [comments]  ( 89 min )
    [R] Pay attention to the minorities, please!
    https://preview.redd.it/lnikbzzrzaf91.jpg?width=1121&format=pjpg&auto=webp&s=8258502509c6b39980108e29d1a8dac2d1740ee0 Conventional class prototyping is not enough for real-world datasets. • The majority of loss appears when BERT's confidence is low for troublesome samples. Why not choose some representative for these samples? (i.e., prototyping) • Consider prototyping the minorities of your dataset: (i) difficult-to-classify samples and (ii) anomalies. Paper 📜: https://arxiv.org/abs/2206.12710 submitted by /u/afarhangi [link] [comments]  ( 87 min )
    [D] ML on chemical/petroleum live process data
    May I ask if anyone has attempted the use of ML to detect upsets in chemical/refining processes? If yes, would you know if an AUC of 70+% (for a classification problem) is typically the best ML can achieve? Thanks! submitted by /u/kayhai [link] [comments]  ( 88 min )
    [P] Guidance on Smart Home Facial Recognition Cloud Native Application
    Project description: A single-page website (hosted on Amazon S3) with access to the laptop's camera will send the live video stream to Amazon Kinesis, which will trigger the facial recognition code on AWS Lambda. It will recognize the person in the feed and respond with 2 numbers, one stating the fan's RPM value and the other being the RGB value for the LED. This data will somehow be sent to the FPGA board connected to the cloud (and the laptop), and the fan and light will act accordingly. Initially, the facial recognition code will be explicitly built only for 3 people. The option to create a new profile will be added later. For the above project, I would like to train my own deep learning model, rather than using OpenCV. Issues: Is the architecture appropriate for my project idea, or should I change something in the architecture or the workflow? How should I prepare the dataset with images of only 3 people? Even if I augment the data, how would it scale to the 1,00,000 images that would be enough for the model? Which model would perform best for this? What should be the FPS value for the video feed? I would really like you all to provide your insights on this and any improvements if needed. Thanks and regards. submitted by /u/Intangible-AI [link] [comments]  ( 126 min )
    [R] How do I receive the area of the boxes generated with Cascade TabNet Demo.ipynb?
    So I ran the Jupyter notebook Cascade TabNet Demo.ipynb and received what I expected, but now I'm interested in getting the exact position of the boxes, for example (340, 400, 762, 700), something like that. I need this to crop that area and put it in a separate image. notebook: https://colab.research.google.com/drive/1lzjbBQsF4X2C2WZhxBJz0wFEQor7F-fv?usp=sharing#scrollTo=e0P85mJJQ304 submitted by /u/nurigrf05 [link] [comments]  ( 125 min )
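    Once a box is available, the cropping step asked about above can be sketched as follows. This is a minimal illustration and not the notebook's own API: it assumes the box is given as (x1, y1, x2, y2) in pixel coordinates with the origin at the top-left, which the post does not confirm, and the function name `crop_box` and the toy image are hypothetical.

```python
def crop_box(image, box):
    """Crop a rectangular region from an image stored as rows of pixels.

    `box` is assumed to be (x1, y1, x2, y2) in pixel coordinates, with
    (x1, y1) the top-left corner and (x2, y2) the bottom-right corner.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    # Rows are indexed by y and columns by x, so y comes first when slicing.
    return [row[x1:x2] for row in image[y1:y2]]


# Hypothetical 6x8 image whose pixel value encodes its (row, column).
image = [[(r, c) for c in range(8)] for r in range(6)]
crop = crop_box(image, (2, 1, 5, 4))
print(len(crop), len(crop[0]))  # 3 3
print(crop[0][0])               # (1, 2)
```

    With an image held in a NumPy array, the same region would be `image[y1:y2, x1:x2]`; the usual pitfall is swapping the x and y order.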
    [D] CIKM 22 Full Notification
    Has anyone received full paper notification? submitted by /u/snu95 [link] [comments]  ( 88 min )
    [D] Which mathematics concepts should be grasped in order to be better at machine learning?
    Mathematics seems to be a large rabbit hole... submitted by /u/_janc_ [link] [comments]  ( 93 min )
    [D] Precision Recall curve format
    Hi, for image segmentation I understand that both the confidence threshold and the IoU threshold define whether a prediction counts as a true positive or a false positive. Most resources I have read online don't state which threshold is varied when plotting the PR curve. So my question is: is there a standard way to plot the PR curve, with a varying IoU threshold and a fixed confidence threshold, or vice versa? submitted by /u/yuhzuu [link] [comments]  ( 87 min )
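    For reference, detection benchmarks such as PASCAL VOC and COCO fix the IoU threshold and sweep the confidence threshold (COCO then repeats this at several IoU values and averages). A minimal sketch of that convention, assuming each prediction has already been marked true or false positive at one fixed IoU threshold; the function name and the toy detections are hypothetical:

```python
def precision_recall_points(detections, num_ground_truth):
    """Return (confidence, precision, recall) triples for one class.

    `detections` is a list of (confidence, is_true_positive) pairs, where
    is_true_positive was decided beforehand at one fixed IoU threshold.
    """
    # Lowering the confidence threshold admits predictions in ranked order.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    points = []
    true_positives = 0
    for rank, (confidence, is_tp) in enumerate(ranked, start=1):
        true_positives += int(is_tp)
        precision = true_positives / rank
        recall = true_positives / num_ground_truth
        points.append((confidence, precision, recall))
    return points


# Toy example: four predictions, four ground-truth objects.
detections = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
points = precision_recall_points(detections, num_ground_truth=4)
for confidence, precision, recall in points:
    print(f"conf>={confidence:.1f}  precision={precision:.2f}  recall={recall:.2f}")
```

    Each row is one point on the PR curve; changing the IoU threshold changes which predictions count as true positives and therefore yields a different curve.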
    [D] How to decode barcodes from picture?
    Hello reddit! I made a model to detect barcodes (see picture) and the model works very well. Now I want to decode the detected barcodes with zbar (pyzbar), but it does not work and I don't understand why. I tried rotating the barcodes, but it did not help. I will be glad for any help and hints on how to decode barcodes from the picture. submitted by /u/jonathanblade [link] [comments]  ( 88 min )
    [P] How should I structure my CNN GAN music generation model?
    Hello! I am currently working on my master's dissertation, in which I am comparing LSTM and CNN GANs for music generation. The format of my input data is batches of 96x96 arrays, representing 96 unique pitches on a piano vs. 96 beats - I have a training dataset consisting of 360,000 such arrays. I have successfully constructed my LSTM network, in which I use the aforementioned two dimensions as my input data to the model: (96,96). My issue is with CNNs, as the input format is different from LSTMs. I am running into issues with the shape of my input data. From my understanding, a CNN needs input data with the structure (batch, (dimensions), channels). In my model I use batch-size = 10 and channel = 1 (I'm assuming I don't need any more than one channel) - should my input shape then be (10, 96, 96, 1)? Just (96,96)? Or (96,96,1)? I have tinkered around with different combinations but most frequently get one of two errors: (1) "Data cardinality is ambiguous: x sizes: 96 y sizes: 10 Make sure all arrays contain the same number of samples." (2) "Input 0 of layer sequential_0 is incompatible with the layer: expected axis -1 of input shape to have value 1 but received input with shape (None, 96, 96, 96)" Currently I am using just a single layer in my discriminator and generator respectively: model.add(Conv2D(96, kernel_size=1, input_shape= (96, 96, 1), padding="same")) Any help with this would be much appreciated!! :) submitted by /u/carl535 [link] [comments]  ( 89 min )
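    On the shape question above: in Keras, `input_shape` excludes the batch axis, so `input_shape=(96, 96, 1)` goes with data of shape (batch, 96, 96, 1), and the first axis of x must match the number of samples in y. A plain-Python sketch of adding the trailing channel axis, using nested lists rather than real arrays; the helper names are hypothetical and the zero-filled batch is a stand-in for the piano-roll data:

```python
def shape_of(nested):
    """Return the shape of a regular nested list, e.g. [[0, 0], [0, 0]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def add_channel_axis(batch):
    """Turn a (batch, rows, cols) nested list into (batch, rows, cols, 1)."""
    return [[[[value] for value in row] for row in sample] for sample in batch]


# Stand-in for one batch of 10 piano-roll arrays of size 96x96.
batch = [[[0.0] * 96 for _ in range(96)] for _ in range(10)]
batch_with_channel = add_channel_axis(batch)
print(shape_of(batch))               # (10, 96, 96)
print(shape_of(batch_with_channel))  # (10, 96, 96, 1)
```

    With NumPy the same step is usually written as `x[..., np.newaxis]`. The first error quoted in the post is consistent with x being passed without a batch axis, so that its first dimension (96) is compared against the 10 labels.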
    "[R]" Research "[P]" Project on Disease Prediction System using Machine Learning
    Hello, I am trying to build a disease prediction system using this dataset: https://www.kaggle.com/datasets/kaushil268/disease-prediction-using-machine-learning What are the things I should keep in mind when cleaning the data? Does this kind of data also require patients' demographic data, weight, height, etc. along with the symptoms of diseases? What algorithms should I apply to train my network? I already have training.csv and testing.csv so I don't have to split my data into 80/20, right? Also, please share any suggestions you would recommend when designing such a system. This is for a university thesis. Thanks submitted by /u/degr8sid [link] [comments]  ( 87 min )
    [D] How can I keep up with emerging ideas in ML as an outsider?
    I am doing a PhD in Mechanical Engineering, though my PhD is focused on utilizing ML in Mechanical Engineering. As such I am not heavily proficient in ML, but I have a lot of interest in knowing where ML science is going. So how can someone like me keep up to date with new ideas in machine learning? submitted by /u/Yalkim [link] [comments]  ( 94 min )
    [D] What are the predominant economic use-cases of ML? And do they align with our research narrative about "AI"?
    Hi ML folks, I've worked on ML in industry for quite some time, for example, at Google and PathAI (a startup in the healthcare space). But I've found that the research narrative around "AI" seems to be—to put it nicely—not aligned with its predominant economic uses. Some of this was discussed quite nicely in the book The Myth of Artificial Intelligence by Erik J. Larson. But I felt that he lacked an answer to: why are we building "AI" at all? Or what exactly are we building now? So I investigated on my own and wrote my thoughts here. They're phrased as a response to Rich Sutton's essay, The Bitter Lesson, from a few years ago, which I find to be completely disconnected from how AI/ML is actually being used in industry. Anyway, I am curious what this community's thoughts are on the matter... submitted by /u/spincycle27 [link] [comments]  ( 107 min )
    [D] CIKM 22 Notification
    Has anyone received the final results? I think it's quite delayed compared to previous conferences. submitted by /u/snu95 [link] [comments]  ( 87 min )
    [D] Clarifications around hardware
    Hello, I'm a 3D artist who got into machine learning recently. I am particularly interested in GPT and NLP in general. I am building a new workstation and would love to get some clarifications here. Can someone please explain the difference between using multiple GPUs with NVLink and multiple GPUs without NVLink in deep learning? For fine-tuning big models like GPT-NeoX 20B, is it mandatory to have a single GPU with 48 GB, or can you make do with multiple GPUs that collectively meet the requirement? If so, do they need to be connected with NVLink, or be physically on the same node? How important are RAM (clock speed and capacity) and the CPU here? I haven't touched image generation at all, but if I were to experiment with serious work using image generation networks, would the same answers apply? submitted by /u/CosmicPotty [link] [comments]  ( 88 min )
    [P] Using Sparsity & Clustering to compress your models: Efficient Deep Learning Book
    Hey folks, we have been working on a book that focuses on deep learning efficiency techniques such as quantization, pruning, distillation, etc. for both server-side and on-device (smartphones, IoT, etc.) applications. We now have a new chapter focusing on sparsity and clustering, two advanced compression techniques that you can use to reduce the footprint of your model (size, latency, etc.) while retaining your model accuracy. You can read the chapter here, and go through the accompanying codelabs here. We hope that our readers can make their models 4-20x smaller, faster, and better in quality. We have also released draft PDFs of the other four chapters, and would truly appreciate any sort of comments / feedback. Book: efficientdlbook.com Feedback: hello@efficientdlbook.com submitted by /u/EfficientDLBook [link] [comments]  ( 88 min )
  • Open

    DSC Weekly Stardate 47634.44: RIP Admiral Nyota Uhura
    When I was six years old, I remembered Nichelle Nichols appearing on our family television set as the young communications officer aboard the Starship Enterprise, NCC-1701. This was around the same time that I remember a grainy black and white image of Neil Armstrong stepping out of the Lunar Lander, wearing the bulky lunar space suit, and uttering the famous words, "One small step for a man. One giant leap for mankind." I wondered, at six, why they didn't talk about womankind because Uhura was on a spaceship, too, establishing first contact with the aliens even as everyone else was being thrown around the bridge by the alien photon torpedoes. Why wasn't Uhura considered important enough to be included in that odd little spacewalk? The post DSC Weekly Stardate 47634.44: RIP Admiral Nyota Uhura appeared first on Data Science Central.  ( 20 min )
    Blockchain Creates New Career Opportunities
    Blockchain, the technology behind cryptocurrencies, is creating many opportunities for job seekers. Both students and seasoned tech professionals have opportunities to carve out a career in this consistently growing technology. For tech professionals who lost their jobs during the pandemic, the technology offers respite in the form of a large number of job vacancies around the world. In India,…  ( 19 min )
    An Invisible Thread Connects the World
    The effects have reverberated from politics to the military and even from economics to energy. Not to mention that the daily lives of hundreds of millions have been impacted. There are multiple dimensions to what is eventuating in recent times. This multi-dimensionality has led to dissonance and confusion in the policy-making ranks of governments and organizations as the world becomes too complex.  ( 19 min )
    The catalyst for AGI in our lives could be cultural rather than technical
    Artificial general intelligence (AGI) is the ability of an intelligent agent to understand or learn any intellectual task that a human being can. Recently, AGI has been in the news with the LaMDA sentience discussion. We tend to think of AGI as a technical (algorithmic / data-driven) concept. But the driver for AGI in our lives…  ( 17 min )
    Enriching Customer Service Using Sentiment Analysis
    As this century progresses, businesses are discovering that the best way to deliver excellent customer service is to know their customers deeply. With AI advancing at an exponential rate, it’s become possible for companies to use artificial intelligence (AI) to gain valuable insight into their customers. In particular, advances in artificial intelligence are leading…  ( 21 min )
    Banking and Financial Sector: Key Benefits of the Multi-Cloud Approach
    Banks and financial organizations continue to face myriad challenges in the market, such as data privacy concerns, accessibility to crucial banking data, and demand for better customer services, among many others. And it is increasingly recognized that the cloud is more than a technology; it enables banks and other financial services firms to store data… The post Banking and Financial Sector: Key Benefits of the Multi-Cloud Approach appeared first on Data Science Central.  ( 18 min )
    The 12 Key Metrics Every Data Engineer Must Care About
    IT administrators have used failure metrics for decades to track the reliability and performance of their infrastructure, whether it be PC hardware, networks, or servers. After all, most experts agree that to manage something well, you need to measure it. Data engineers and DataOps teams have also adopted failure metrics to measure the reliability of… The post The 12 Key Metrics Every Data Engineer Must Care About appeared first on Data Science Central.  ( 21 min )
    We Live in a Bayesian World
    “Fail fast, pivot, and try again” is the heart of learning. And in knowledge-based industries, the economies of learning are more powerful than the economies of scale. In February 2020, Dr. Anthony Fauci wrote that store-bought face masks would not be very effective at protecting against the COVID-19 pandemic and advised a traveler not to… The post We Live in a Bayesian World appeared first on Data Science Central.  ( 20 min )
    IoT Proves an Essential Component In Managing Traffic in Smart Cities
    The urban population across the world is increasing rapidly, leading to several challenges such as sanitation, traffic congestion, environmental imbalance, pollution, and others. Rapid urbanization has led to the migration of the rural population to urban areas, which has made the daily routine of the urban population convenient and comfortable. Thus, the need to incorporate… The post IoT Proves an Essential Component In Managing Traffic in Smart Cities appeared first on Data Science Central.  ( 19 min )
    Replacing Traders With Algorithms: Success Stories of Real Funds
    Due to the rapid pace of technological change, the way we trade the stock market is becoming more complex. One of the most significant changes that has occurred is the emergence of algorithmic trading, which has allowed traders to improve their skills and compete against other individuals. This type of trading has also raised the… The post Replacing Traders With Algorithms: Success Stories of Real Funds appeared first on Data Science Central.  ( 20 min )
  • Open

    Researchers From China Propose ‘LViT’, A Language-Vision Model To Leverage Text Medical Reports For Improved Segmentation
    Among the many applications of Deep Learning in healthcare, segmentation is undoubtedly one of the most studied, given the broad range of possible advantages that it could bring. Nevertheless, segmentation is not a costless task: first of all, as in the majority of applications in the healthcare field, obtaining high-quality images is not trivial; second, the tagging phase is extremely costly in terms of time and resources, especially compared to the labeling that has to be done when the task is classification or even object detection. Training a segmentation model that also relies on other information would be a turning point for medical segmentation. ✅ Researchers propose a new vision-language medical image segmentation model, LViT (Language meets Vision Transformer). ✅ Medical text annotation is introduced to compensate for the quality deficiency in image data. ✅ Experimental results show that the model has better segmentation performance in both fully and semi-supervised conditions. ✅ Currently, the proposed model has only been tested on 2D medical data. submitted by /u/ai-lover  ( 87 min )
    I Created an AI Podcast Host
    submitted by /u/kbf_  ( 86 min )
    Are there any 3D Human model dataset free for commercial use?
    submitted by /u/Sher_Kahn  ( 86 min )
    Create & Showcase your AI Art Collections on Pixelz.AI 🖼 🖼 🖼
    submitted by /u/pixelz_ai  ( 86 min )
    New AI Discovers Alternative Physics | Google DeepMind AI Breakthrough | Nvidia AI Trains 30% Faster
    submitted by /u/kenickh  ( 86 min )
    A thought on the Fermi paradox
    If it is true that we live in a deterministic universe, one that is ordered by strict causality; and if our conscious experience is largely or completely retrospective - an internal narrative about why we did what we did, though it was predetermined and not a choice - then: maybe once civilizations become a little more mentally advanced than humanity, and a little more comfortable with hard determinism, they recognize that their existence is, and must continue to be, an unavoidable train wreck of missed opportunities and self-inflicted pain. Maybe it becomes unbearable and they simply end it. This may be a sort of variation on AI dystopias, where humans aren't destroyed by the AIs but AIs facilitate human advance to a point of auto-destruction? submitted by /u/kg4jxt  ( 88 min )
    AI-Drake Writes and Sings Linux rap song
    submitted by /u/pwillia7  ( 85 min )
    Will AI Text-to-Image Generators Turn Us All Into Artists?
    submitted by /u/KazRainer  ( 86 min )
    Secret World of Atlantis
    submitted by /u/widgia  ( 86 min )
    what AI story tool releases summer 2022?
    I can't find it. submitted by /u/roblox22y  ( 85 min )
    MIT Researchers Create Artificial Synapses 10,000x Faster Than Biological Ones
    submitted by /u/bartturner  ( 86 min )
    Google AI Sentience – Data Science or Data Séance?
    submitted by /u/dhakalster123  ( 86 min )
    AI Research: the Corporate Narrative and the Economic Reality
    submitted by /u/spincycle27  ( 85 min )
    Has anyone asked from AI that has been taught laws of physics if time travel is possible?
    submitted by /u/aluode  ( 93 min )
    I’m disappointed in this subreddit. It’s flooded with posts of art by “AI” that’s not artificial intelligence. That’s a computer program. An advanced calculator. It can’t do anything other than what it’s programmed to do.
    submitted by /u/MeticulousPerfection  ( 86 min )
  • Open

    Scale YOLOv5 inference with Amazon SageMaker endpoints and AWS Lambda
    After data scientists carefully come up with a satisfying machine learning (ML) model, the model must be deployed to be easily accessible for inference by other members of the organization. However, deploying models at scale with optimized cost and compute efficiencies can be a daunting and cumbersome task. Amazon SageMaker endpoints provide an easily scalable […]  ( 8 min )
  • Open

    Why it’s a problem that pulse oximeters don’t work as well on patients of color
    New research ties inaccuracies in pulse oximeter readings to racial disparities in treatment and outcomes.  ( 6 min )
    Using artificial intelligence to control digital manufacturing
    Researchers train a machine-learning model to monitor and adjust the 3D printing process to correct errors in real-time.  ( 7 min )
  • Open

    Org-mode as a lightweight notebook
    You can think of org-mode as simply a kind of markdown, a plain text file that can be exported to fancier formats such as HTML or PDF. It’s a lot more than that, but that’s a reasonable place to start. Org-mode also integrates with source code. You can embed code in your file and have […] Org-mode as a lightweight notebook first appeared on John D. Cook.  ( 6 min )
  • Open

    Artificial Intelligence Is Changing The Dynamics of Life
    Artificial intelligence (AI) has been around for a long time, but it has only recently become an industry. It is currently disrupting every…  ( 13 min )
  • Open

    Sensational Surrealism Astonishes This Week ‘In the NVIDIA Studio’
    3D phenom FESQ joins us 'In the NVIDIA Studio' this week to share his sensational and surreal animation 'Double/Sided' as well as an inside look into his creative workflow. 'Double/Sided' is deeply personal to FESQ, who said the piece “translates really well to a certain period of my life when I was juggling both a programmer career and an artist career.” The post Sensational Surrealism Astonishes This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    Towards Bridging the gap between Empirical and Certified Robustness against Adversarial Examples. (arXiv:2102.05096v3 [cs.LG] UPDATED)
    The current state-of-the-art defense methods against adversarial examples typically focus on improving either empirical or certified robustness. Among them, adversarially trained (AT) models produce empirical state-of-the-art defense against adversarial examples without providing any robustness guarantees for large classifiers or higher-dimensional inputs. In contrast, existing randomized smoothing based models achieve state-of-the-art certified robustness while significantly degrading the empirical robustness against adversarial examples. In this paper, we propose a novel method, called \emph{Certification through Adaptation}, that transforms an AT model into a randomized smoothing classifier during inference to provide certified robustness for the $\ell_2$ norm without affecting its empirical robustness against adversarial attacks. We also propose the \emph{Auto-Noise} technique, which efficiently approximates the appropriate noise levels to flexibly certify the test examples using randomized smoothing. Our proposed \emph{Certification through Adaptation} with the \emph{Auto-Noise} technique achieves \textit{average certified radius (ACR)} scores of up to $1.102$ and $1.148$ for the CIFAR-10 and ImageNet datasets, respectively, using AT models without affecting their empirical robustness or benign accuracy. Therefore, our paper is a step towards bridging the gap between the empirical and certified robustness against adversarial examples by achieving both using the same classifier.  ( 3 min )
    Development of a face mask detection pipeline for mask-wearing monitoring in the era of the COVID-19 pandemic: A modular approach. (arXiv:2112.15031v3 [cs.CV] UPDATED)
    During the SARS-CoV-2 pandemic, mask-wearing became an effective tool to prevent spreading and contracting the virus. The ability to monitor the mask-wearing rate in the population would be useful for determining public health strategies against the virus. However, artificial intelligence technologies for detecting face masks have not been deployed at a large scale in real life to measure the mask-wearing rate in public. In this paper, we present a two-step face mask detection approach consisting of two separate modules: 1) face detection and alignment and 2) face mask classification. This approach allowed us to experiment with different combinations of face detection and face mask classification modules. More specifically, we experimented with PyramidKey and RetinaFace as face detectors while maintaining a lightweight backbone for the face mask classification module. Moreover, we also provide a relabeled annotation of the test set of the AIZOO dataset, where we rectified the incorrect labels for some face images. The evaluation results on the AIZOO and Moxa 3K datasets showed that the proposed face mask detection pipeline surpassed the state-of-the-art methods. The proposed pipeline also yielded a higher mAP on the relabeled test set of the AIZOO dataset than on the original test set. Since we trained the proposed model using in-the-wild face images, we can successfully deploy our model to monitor the mask-wearing rate using public CCTV images.  ( 3 min )
    Learning a Group-Aware Policy for Robot Navigation. (arXiv:2012.12291v3 [cs.RO] UPDATED)
    Human-aware robot navigation promises a range of applications in which mobile robots bring versatile assistance to people in common human environments. While prior research has mostly focused on modeling pedestrians as independent, intentional individuals, people move in groups; consequently, it is imperative for mobile robots to respect human groups when navigating around people. This paper explores learning group-aware navigation policies based on dynamic group formation using deep reinforcement learning. Through simulation experiments, we show that group-aware policies, compared to baseline policies that neglect human groups, achieve greater robot navigation performance (e.g., fewer collisions), minimize violation of social norms and discomfort, and reduce the robot's movement impact on pedestrians. Our results contribute to the development of social navigation and the integration of mobile robots into human environments.  ( 2 min )
    The Geometry of Adversarial Training in Binary Classification. (arXiv:2111.13613v2 [cs.LG] UPDATED)
    We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type $L^1+$ (nonlocal) $\operatorname{TV}$, a form frequently studied in image analysis and graph-based learning. A rich geometric structure is revealed by this reformulation which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense), and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization problems provides a novel, directly interpretable, statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.  ( 2 min )
    Neural networks with linear threshold activations: structure and algorithms. (arXiv:2111.08117v2 [cs.LG] UPDATED)
    In this article we present new results on neural networks with linear threshold activation functions. We precisely characterize the class of functions that are representable by such neural networks and show that 2 hidden layers are necessary and sufficient to represent any function representable in the class. This is a surprising result in the light of recent exact representability investigations for neural networks using other popular activation functions like rectified linear units (ReLU). We also give precise bounds on the sizes of the neural networks required to represent any function in the class. Finally, we design an algorithm to solve the empirical risk minimization (ERM) problem to global optimality for these neural networks with a fixed architecture. The algorithm's running time is polynomial in the size of the data sample, if the input dimension and the size of the network architecture are considered fixed constants. The algorithm is unique in the sense that it works for any architecture with any number of layers, whereas previous polynomial time globally optimal algorithms work only for very restricted classes of architectures. Using these insights, we propose a new class of neural networks that we call shortcut linear threshold networks. To the best of our knowledge, this way of designing neural networks has not been explored before in the literature. We show that these neural networks have several desirable theoretical properties.  ( 3 min )
    Weighted Scaling Approach for Metabolomics Data Analysis. (arXiv:2208.00603v1 [stat.ML])
    Systematic variation is a common issue in metabolomics data analysis. Therefore, different scaling and normalization techniques are used to preprocess the data for metabolomics data analysis. Although several scaling methods are available in the literature, the choice of scaling, transformation and/or normalization technique influences the subsequent statistical analysis. It is challenging to choose the appropriate scaling technique for downstream analysis to get accurate results or to make a proper decision. Moreover, the existing scaling techniques are sensitive to outliers or extreme values. To fill the gap, our objective is to introduce a robust scaling approach that is not influenced by outliers and provides more accurate results for downstream analysis. Here, we introduce a new weighted scaling approach that is robust against outliers, so that no additional outlier detection/treatment step is needed in data preprocessing, and compare it with the conventional scaling and normalization techniques on artificial and real metabolomics datasets. We evaluated the performance of the proposed method in comparison to the other existing conventional scaling techniques using metabolomics data analysis in both the absence and presence of different percentages of outliers. Results show that in most cases, the proposed scaling technique performs better than the traditional scaling methods in both the absence and presence of outliers. The proposed method improves the downstream metabolomics analysis. The R function of the proposed robust scaling method is available at https://github.com/nishithkumarpaul/robustScaling/blob/main/wscaling.R  ( 3 min )
    Adaptive Temperature Scaling for Robust Calibration of Deep Neural Networks. (arXiv:2208.00461v1 [cs.LG])
    In this paper, we study the post-hoc calibration of modern neural networks, a problem that has drawn a lot of attention in recent years. Many calibration methods of varying complexity have been proposed for the task, but there is no consensus about how expressive these should be. We focus on the task of confidence scaling, specifically on post-hoc methods that generalize Temperature Scaling; we call these the Adaptive Temperature Scaling family. We analyse expressive functions that improve calibration and propose interpretable methods. We show that when there is plenty of data, complex models like neural networks yield better performance, but are prone to fail when the amount of data is limited, a common situation in certain post-hoc calibration applications like medical diagnosis. We study the functions that expressive methods learn under ideal conditions and design simpler methods with a strong inductive bias towards these well-performing functions. Concretely, we propose Entropy-based Temperature Scaling, a simple method that scales the confidence of a prediction according to its entropy. Results show that our method obtains state-of-the-art performance when compared to others and, unlike complex models, it is robust against data scarcity. Moreover, our proposed model enables a deeper interpretation of the calibration process.  ( 2 min )
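    The idea of scaling a prediction's confidence according to its entropy can be sketched as follows. This is a minimal illustration and not the paper's exact formulation: the softplus-of-a-linear-function form of the temperature and the parameters w and b are assumptions chosen for illustration, and in practice they would be fitted on a held-out calibration set.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a positive temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def softplus(x):
    """Stable softplus; always positive, so it yields a valid temperature."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def entropy_scaled_probs(logits, w, b):
    """Rescale logits with a temperature that depends on prediction entropy.

    Assumed (hypothetical) form: T = softplus(w * H + b), where H is the
    entropy of the uncalibrated softmax output.
    """
    h = entropy(softmax(logits))
    temperature = softplus(w * h + b)
    return softmax(logits, temperature), temperature
```

    Because the temperature is positive, the predicted class never changes; only the confidence assigned to it does. With w = 0 the sketch reduces to ordinary Temperature Scaling with a single global temperature.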
    CoNLoCNN: Exploiting Correlation and Non-Uniform Quantization for Energy-Efficient Low-precision Deep Convolutional Neural Networks. (arXiv:2208.00331v1 [cs.AR])
    In today's era of smart cyber-physical systems, Deep Neural Networks (DNNs) have become ubiquitous due to their state-of-the-art performance in complex real-world applications. The high computational complexity of these networks, which translates to increased energy consumption, is the foremost obstacle towards deploying large DNNs in resource-constrained systems. Fixed-Point (FP) implementations achieved through post-training quantization are commonly used to curtail the energy consumption of these networks. However, the uniform quantization intervals in FP restrict the bit-width of data structures to large values due to the need to represent most of the numbers with sufficient resolution and avoid high quantization errors. In this paper, we leverage the key insight that (in most of the scenarios) DNN weights and activations are mostly concentrated near zero and only a few of them have large magnitudes. We propose CoNLoCNN, a framework to enable energy-efficient low-precision deep convolutional neural network inference by exploiting: (1) non-uniform quantization of weights enabling simplification of complex multiplication operations; and (2) correlation between activation values enabling partial compensation of quantization errors at low cost without any run-time overheads. To significantly benefit from non-uniform quantization, we also propose a novel data representation format, Encoded Low-Precision Binary Signed Digit, to compress the bit-width of weights while ensuring direct use of the encoded weight for processing using a novel multiply-and-accumulate (MAC) unit design.  ( 3 min )
    Assessing the Early Bird Heuristic (for Predicting Project Quality). (arXiv:2105.11082v3 [cs.SE] UPDATED)
    Before researchers rush to reason across all available data or try complex methods, perhaps it is prudent to first check for simpler alternatives. Specifically, if the historical data has the most information in some small region, perhaps a model learned from that region would suffice for the rest of the project. To support this claim, we offer a case study with 240 projects, where we find that the information in those projects "clumps" towards the earliest parts of the project. A quality prediction model learned from just the first 150 commits works as well as, or better than, state-of-the-art alternatives. Using just this "early bird" data, we can build models very quickly and very early in the project life cycle. Moreover, using this early bird method, we have shown that a simple model (with just a few features) generalizes to hundreds of projects. Based on this experience, we suspect that prior work on generalizing quality models may have needlessly complicated an inherently simple process. Further, prior work that focused on later-life cycle data needs to be revisited since its conclusions were drawn from relatively uninformative regions. Replication note: all our data and scripts are available here: https://github.com/snaraya7/early-bird  ( 3 min )
    Towards Intercultural Affect Recognition: Audio-Visual Affect Recognition in the Wild Across Six Cultures. (arXiv:2208.00344v1 [cs.CV])
    In our multicultural world, affect-aware AI systems that support humans need the ability to perceive affect across variations in emotion expression patterns across cultures. These models must perform well in cultural contexts on which they have not been trained. A standard assumption in affective computing is that affect recognition models trained and used within the same culture (intracultural) will perform better than models trained on one culture and used on different cultures (intercultural). We test this assumption and present the first systematic study of intercultural affect recognition models using videos of real-world dyadic interactions from six cultures. We develop an attention-based feature selection approach under temporal causal discovery to identify behavioral cues that can be leveraged in intercultural affect recognition models. Across all six cultures, our findings demonstrate that intercultural affect recognition models were as effective or more effective than intracultural models. We identify and contribute useful behavioral features for intercultural affect recognition; facial features from the visual modality were more useful than the audio modality in this study's context. Our paper presents a proof-of-concept and motivation for the future development of intercultural affect recognition systems.  ( 2 min )
    Neuro-Symbolic Learning: Principles and Applications in Ophthalmology. (arXiv:2208.00374v1 [cs.CV])
    Neural networks have been rapidly expanding in recent years, with novel strategies and applications. However, challenges such as interpretability, explainability, robustness, safety, trust, and sensibility remain unsolved in neural network technologies, despite the fact that they must unavoidably be addressed for critical applications. Attempts have been made to overcome the challenges in neural network computing by representing and embedding domain knowledge in terms of symbolic representations. Thus, the neuro-symbolic learning (NeSyL) notion emerged, which incorporates aspects of symbolic representation and brings common sense into neural networks. In domains where interpretability, reasoning, and explainability are crucial, such as video and image captioning, question-answering and reasoning, health informatics, and genomics, NeSyL has shown promising outcomes. This review presents a comprehensive survey on the state-of-the-art NeSyL approaches, their principles, advances in machine and deep learning algorithms, applications such as ophthalmology, and most importantly, future perspectives of this emerging field.  ( 2 min )
    On the Power-Law Hessian Spectrums in Deep Learning. (arXiv:2201.13011v2 [cs.LG] UPDATED)
    It is well-known that the Hessian of the deep loss landscape matters to optimization, generalization, and even robustness of deep learning. Recent works empirically discovered that the Hessian spectrum in deep learning has a two-component structure that consists of a small number of large eigenvalues and a large number of nearly-zero eigenvalues. However, the theoretical mechanism or the mathematics behind the Hessian spectrum is still largely under-explored. To the best of our knowledge, we are the first to demonstrate that the Hessian spectrums of well-trained deep neural networks exhibit simple power-law structures. Inspired by the statistical physical theories and the spectral analysis of natural proteins, we provide a maximum-entropy theoretical interpretation for explaining why the power-law structure exists and suggest a spectral parallel between protein evolution and training of deep neural networks. By conducting extensive experiments, we further use the power-law spectral framework as a useful tool to explore multiple novel behaviors of deep learning.  ( 2 min )
    POTHER: Patch-Voted Deep Learning-Based Chest X-ray Bias Analysis for COVID-19 Detection. (arXiv:2201.09360v4 [eess.IV] UPDATED)
    A critical step in the fight against COVID-19, which continues to have a catastrophic impact on people's lives, is the effective screening of patients presenting at clinics with severe COVID-19 symptoms. Chest radiography is one of the promising screening approaches. Many studies reported detecting COVID-19 in chest X-rays accurately using deep learning. A serious limitation of many published approaches is insufficient attention paid to explaining decisions made by deep learning models. Using explainable artificial intelligence methods, we demonstrate that model decisions may rely on confounding factors rather than medical pathology. After an analysis of potential confounding factors found on chest X-ray images, we propose a novel method to minimise their negative impact. We show that our proposed method is more robust than previous attempts to counter confounding factors such as ECG leads in chest X-rays that often influence model classification decisions. In addition to being robust, our method achieves results comparable to the state-of-the-art. The source code and pre-trained weights are publicly available at https://github.com/tomek1911/POTHER.  ( 3 min )
    NN2Poly: A polynomial representation for deep feed-forward artificial neural networks. (arXiv:2112.11397v2 [stat.ML] UPDATED)
    Interpretability of neural networks and their underlying theoretical behaviour remain an open field of study even after the great success of their practical applications, particularly with the emergence of deep learning. In this work, NN2Poly is proposed: a theoretical approach to obtain an explicit polynomial model that provides an accurate representation of an already trained fully-connected feed-forward artificial neural network (a multilayer perceptron or MLP). This approach extends a previous idea proposed in the literature, which was limited to single hidden layer networks, to work with arbitrarily deep MLPs in both regression and classification tasks. The objective of this paper is to achieve this by using a Taylor expansion on the activation function, at each layer, and then using several combinatorial properties to calculate the coefficients of the desired polynomials. Discussion is presented on the main computational challenges of this method, and the way to overcome them by imposing certain constraints during the training phase. Finally, simulation experiments as well as an application to a real data set are presented to demonstrate the effectiveness of the proposed method.  ( 3 min )
    Problem-dependent attention and effort in neural networks with an application to image resolution. (arXiv:2201.01415v2 [cs.CV] UPDATED)
    This paper assesses a new classification approach that examines low-resolution images first, only moving to higher resolution images if the classification from the initial pass does not have a high degree of confidence. This multi-stage strategy for classification can be used with any classifier and does not require additional training. The approach is tested on five common datasets using four different classification approaches. It is found to be effective for cases in which at least some fraction of cases can be correctly classified using coarser data than are typically used. For neural networks performing digit recognition, for instance, the proposed approach reduces the resource cost of classifying test cases by 60% to 85% with less than 5% reduction in accuracy.  ( 2 min )
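    The multi-stage strategy described above can be sketched as a simple early-exit loop. This is an illustrative sketch only: the function name, the confidence threshold of 0.9, and the assumption that the classifier returns a vector of class probabilities are all hypothetical and not taken from the paper.

```python
from typing import Callable, Sequence, Tuple


def classify_coarse_to_fine(
    views: Sequence[object],
    classifier: Callable[[object], Sequence[float]],
    threshold: float = 0.9,
) -> Tuple[int, int]:
    """Classify using the cheapest view that gives a confident answer.

    `views` holds versions of the same input ordered from lowest to
    highest resolution. Returns (predicted label, index of the view used).
    If no view reaches the threshold, the prediction from the
    highest-resolution view is returned.
    """
    if not views:
        raise ValueError("at least one view is required")
    label, stage = 0, 0
    for stage, view in enumerate(views):
        probs = classifier(view)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            break  # confident enough; skip the more expensive views
    return label, stage
```

    The returned stage index makes the cost saving measurable: averaging it over a test set shows how often the higher-resolution pass was actually needed.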
    Disentangled Sequence Clustering for Human Intention Inference. (arXiv:2101.09500v4 [cs.RO] UPDATED)
    Equipping robots with the ability to infer human intent is a vital precondition for effective collaboration. Most computational approaches towards this objective derive a probability distribution of "intent" conditioned on the robot's perceived state. However, these approaches typically assume task-specific labels of human intent are known a priori. To overcome this constraint, we propose the Disentangled Sequence Clustering Variational Autoencoder (DiSCVAE), a clustering framework capable of learning such a distribution of intent in an unsupervised manner. The proposed framework leverages recent advances in unsupervised learning to disentangle latent representations of sequence data, separating time-varying local features from time-invariant global attributes. As a novel extension, the DiSCVAE also infers a discrete variable to form a latent mixture model and thus enable clustering over these global sequence concepts, e.g. high-level intentions. We evaluate the DiSCVAE on a real-world human-robot interaction dataset collected using a robotic wheelchair. Our findings reveal that the inferred discrete variable coincides with human intent, holding promise for collaborative settings, such as shared control.  ( 3 min )
    Generative Adversarial Networks via a Composite Annealing of Noise and Diffusion. (arXiv:2105.00220v3 [cs.LG] UPDATED)
    A generative adversarial network (GAN) is a framework for generating fake data using a set of real examples. However, GANs are unstable in the training stage. In order to stabilize GANs, noise injection has been used to enlarge the overlap of the real and fake distributions at the cost of increasing variance. Diffusion (or smoothing) may reduce the intrinsic underlying dimensionality of data, but it suppresses the capability of GANs to learn high-frequency information in the training procedure. Based on these observations, we propose a data representation for GAN training, called noisy scale-space (NSS), that recursively applies smoothing with a balanced noise to data in order to replace the high-frequency information with random data, leading to a coarse-to-fine training of GANs. We experiment with NSS using DCGAN and StyleGAN2 on benchmark datasets, on which the NSS-based GANs outperform the state of the art in most cases.  ( 2 min )
    Online $k$-means Clustering on Arbitrary Data Streams. (arXiv:2102.09101v4 [cs.LG] UPDATED)
    We consider online $k$-means clustering where each new point is assigned to the nearest cluster center, after which the algorithm may update its centers. The loss incurred is the sum of squared distances from new points to their assigned cluster centers. The goal over a data stream $X$ is to achieve loss that is a constant factor of $L(X, OPT_k)$, the best possible loss using $k$ fixed points in hindsight. We propose a data parameter, $\Lambda(X)$, such that for any algorithm maintaining $O(k\text{poly}(\log n))$ centers at time $n$, there exists a data stream $X$ for which a loss of $\Omega(\Lambda(X))$ is inevitable. We then give a randomized algorithm that achieves clustering loss $O(\Lambda(X) + L(X, OPT_k))$. Our algorithm uses $O(k\text{poly}(\log n))$ memory and maintains $O(k\text{poly}(\log n))$ cluster centers. Our algorithm also enjoys a running time of $O(k\text{poly}(\log n))$ and is the first algorithm to achieve polynomial space and time complexity in this setting. It also is the first to have provable guarantees without making any assumptions on the input data.  ( 2 min )
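    The loss accounting described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's randomized algorithm: the function names and the `add_far_points` update rule (including its `threshold`) are hypothetical and chosen only to show the order of operations, namely that a point is charged against the centers that exist when it arrives, and centers may change only afterwards.

```python
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def online_kmeans_loss(
    stream: List[Point],
    centers: List[Point],
    update: Optional[Callable[[List[Point], Point], List[Point]]] = None,
) -> float:
    """Sum of squared distances from each arriving point to its nearest current center."""
    loss = 0.0
    for x in stream:
        # The point is charged against the centers as they stand on arrival.
        loss += min(sq_dist(x, c) for c in centers)
        # Only afterwards may the algorithm change its centers.
        if update is not None:
            centers = update(centers, x)
    return loss


def add_far_points(centers: List[Point], x: Point, threshold: float = 4.0) -> List[Point]:
    """Hypothetical update rule: open a new center when x is far from all existing ones."""
    if min(sq_dist(x, c) for c in centers) > threshold:
        return centers + [x]
    return centers


stream = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (10.0, 1.0)]
fixed_loss = online_kmeans_loss(stream, [(0.0, 0.0)])
adaptive_loss = online_kmeans_loss(stream, [(0.0, 0.0)], update=add_far_points)
```

    With a single fixed center the far-away points are charged in full, while the hypothetical update pays for the first far point once and then serves its neighbour cheaply.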
    Quantum Adaptive Fourier Features for Neural Density Estimation. (arXiv:2208.00564v1 [cs.LG])
    Density estimation is a fundamental task in statistics and machine learning applications. Kernel density estimation is a powerful tool for non-parametric density estimation in low dimensions; however, its performance is poor in higher dimensions. Moreover, its prediction complexity scales linearly with the number of training data points. This paper presents a method for neural density estimation that can be seen as a type of kernel density estimation, but without the high prediction computational complexity. The method is based on density matrices, a formalism used in quantum mechanics, and adaptive Fourier features. The method can be trained without optimization, but it can also be integrated with deep learning architectures and trained using gradient descent. Thus, it can be seen as a form of neural density estimation method. The method was evaluated on different synthetic and real datasets, and its performance was compared against state-of-the-art neural density estimation methods, obtaining competitive results.  ( 2 min )
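    To make the linear prediction cost of kernel density estimation concrete, here is a minimal one-dimensional Gaussian KDE in plain Python. It is a textbook sketch, not the paper's density-matrix method; the function name `gaussian_kde_1d` and the bandwidth parameter `h` are illustrative choices.

```python
import math
from typing import Sequence


def gaussian_kde_1d(x: float, data: Sequence[float], h: float) -> float:
    """Evaluate a 1-D Gaussian kernel density estimate at x with bandwidth h."""
    norm = 1.0 / (len(data) * h * math.sqrt(2.0 * math.pi))
    # One kernel evaluation per training point: a single prediction costs O(n).
    return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)


density_at_zero = gaussian_kde_1d(0.0, [0.0], h=1.0)
```

    Every query touches the full training set, which is the cost the abstract says the proposed method avoids.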
    A rigorous introduction to linear models. (arXiv:2105.04240v4 [cs.LG] UPDATED)
    This survey is meant to provide an introduction to linear models and the theories behind them. Our goal is to give a rigorous introduction to the readers with prior exposure to ordinary least squares. In machine learning, the output is usually a nonlinear function of the input. Deep learning even aims to find a nonlinear dependence with many layers which require a large amount of computation. However, most of these algorithms build upon simple linear models. We then describe linear models from different views and find the properties and theories behind the models. The linear model is the main technique in regression problems and the primary tool for it is the least squares approximation which minimizes a sum of squared errors. This is a natural choice when we're interested in finding the regression function which minimizes the corresponding expected squared error. This survey is primarily a summary of purpose, significance of important theories behind linear models, e.g., distribution theory, minimum variance estimator. We first describe ordinary least squares from three different points of view upon which we disturb the model with random noise and Gaussian noise. By Gaussian noise, the model gives rise to the likelihood so that we introduce a maximum likelihood estimator. It also develops some distribution theories via this Gaussian disturbance. The distribution theory of least squares will help us answer various questions and introduce related applications. We then prove least squares is the best unbiased linear model in the sense of mean squared error and most importantly, it actually approaches the theoretical limit. We end up with linear models with the Bayesian approach and beyond.  ( 3 min )
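    As a small worked instance of the least squares approximation mentioned above, the following sketch fits a line with one predictor using the closed-form solution that minimizes the sum of squared errors. The function name `fit_line` and the toy data are illustrative and not taken from the survey.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x (single predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = sample covariance(x, y) / sample variance(x); the 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

    The toy data lie exactly on a line, so the fit recovers it with zero residual; with noisy data the same formulas give the minimizer of the squared error.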
    How should we proxy for race/ethnicity? Comparing Bayesian improved surname geocoding to machine learning methods. (arXiv:2206.14583v2 [cs.LG] UPDATED)
    Bayesian Improved Surname Geocoding (BISG) is the most popular method for proxying race/ethnicity in voter registration files that do not contain it. This paper benchmarks BISG against a range of previously untested machine learning alternatives, using voter files with self-reported race/ethnicity from California, Florida, North Carolina, and Georgia. This analysis yields three key findings. First, machine learning consistently outperforms BISG at individual classification of race/ethnicity. Second, BISG and machine learning methods exhibit divergent biases for estimating regional racial composition. Third, the performance of all methods varies substantially across states. These results suggest that pre-trained machine learning models are preferable to BISG for individual classification. Furthermore, mixed results across states underscore the need for researchers to empirically validate their chosen race/ethnicity proxy in their populations of interest.
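    The Bayes-rule update at the core of BISG can be sketched as follows, assuming the usual formulation in which a surname-based prior over groups is multiplied by the probability of living in a given area conditional on each group and then renormalized. The group labels and all numbers below are made up for illustration; they are not from the paper or from any census table.

```python
from typing import Dict


def bisg_posterior(
    p_group_given_surname: Dict[str, float],
    p_area_given_group: Dict[str, float],
) -> Dict[str, float]:
    """Posterior over groups: prior from surname times likelihood of the area, renormalized."""
    unnormalized = {
        g: p_group_given_surname[g] * p_area_given_group[g]
        for g in p_group_given_surname
    }
    total = sum(unnormalized.values())
    return {g: v / total for g, v in unnormalized.items()}


# Hypothetical inputs: two groups, an uninformative surname, an informative area.
posterior = bisg_posterior({"A": 0.5, "B": 0.5}, {"A": 0.2, "B": 0.6})
```

    In this toy case the surname carries no information, so the posterior is driven entirely by geography.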
    Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical Data. (arXiv:2207.11382v2 [cs.LG] UPDATED)
    Medical events of interest, such as mortality, often happen at a low rate in electronic medical records, as most admitted patients survive. Training models with this imbalance rate (class density discrepancy) may lead to suboptimal prediction. Traditionally, this problem is addressed through ad-hoc methods such as resampling or reweighting, but performance in many cases is still limited. We propose a framework for training models under this imbalance: 1) we first decouple the feature extraction and classification process, adjusting training batches separately for each component to mitigate bias caused by class density discrepancy; 2) we train the network with both a density-aware loss and a learnable cost matrix for misclassifications. On real-world medical datasets (TOPCAT and MIMIC-III), our model improves AUC-ROC, AUC-PRC, and Brier Skill Score compared with the baselines in the domain.
    A Survey on Surrogate-assisted Efficient Neural Architecture Search. (arXiv:2206.01520v2 [cs.LG] UPDATED)
    Neural architecture search (NAS) has become increasingly popular in the deep learning community recently, mainly because it can provide an opportunity to allow interested users without rich expertise to benefit from the success of deep neural networks (DNNs). However, NAS is still laborious and time-consuming because a large number of performance estimations are required during the search process of NAS, and training DNNs is computationally intensive. To solve the major limitation of NAS, improving the efficiency of NAS is essential in the design of NAS. This paper begins with a brief introduction to the general framework of NAS. Then, the methods for evaluating network candidates under the proxy metrics are systematically discussed. This is followed by a description of surrogate-assisted NAS, which is divided into three different categories, namely Bayesian optimization for NAS, surrogate-assisted evolutionary algorithms for NAS, and MOP for NAS. Finally, remaining challenges and open research questions are discussed, and promising research topics are suggested in this emerging field.
    Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients. (arXiv:2206.06295v2 [cs.LG] UPDATED)
    Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods, which we collectively refer to as Markov chain score ascent (MCSA) methods, can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.
    Incremental Learning Meets Transfer Learning: Application to Multi-site Prostate MRI Segmentation. (arXiv:2206.01369v2 [cs.CV] UPDATED)
    Many medical datasets have recently been created for medical image segmentation tasks, and it is natural to question whether we can use them to sequentially train a single model that (1) performs better on all these datasets, and (2) generalizes well and transfers better to the unknown target site domain. Prior works have achieved this goal by jointly training one model on multi-site datasets, which achieves competitive performance on average, but such methods rely on the assumption that all training data are available, thus limiting their effectiveness in practical deployment. In this paper, we propose a novel multi-site segmentation framework called incremental-transfer learning (ITL), which learns a model from multi-site datasets in an end-to-end sequential fashion. Specifically, "incremental" refers to training on sequentially constructed datasets, and "transfer" is achieved by leveraging useful information from the linear combination of embedding features on each dataset. In addition, we introduce our ITL framework, where we train the network including a site-agnostic encoder with pre-trained weights and at most two segmentation decoder heads. We also design a novel site-level incremental loss in order to generalize well on the target domain. Furthermore, we show for the first time that leveraging our ITL training scheme is able to alleviate challenging catastrophic forgetting problems in incremental learning. We conduct experiments using five challenging benchmark datasets to validate the effectiveness of our incremental-transfer learning approach. Our approach makes minimal assumptions on computation resources and domain-specific expertise, and hence constitutes a strong starting point in multi-site medical image segmentation.
    Optimization of the Shape of a Hydrokinetic Turbine's Draft Tube and Hub Assembly Using Design-by-Morphing with Bayesian Optimization. (arXiv:2207.11451v2 [cs.CG] UPDATED)
    Finding the optimal design of a hydrodynamic or aerodynamic surface is often impossible due to the expense of evaluating the cost functions (say, with computational fluid dynamics) needed to determine the performances of the flows that the surface controls. In addition, inherent limitations of the design space itself due to imposed geometric constraints, conventional parameterization methods, and user bias can restrict all of the designs within a chosen design space regardless of whether traditional optimization methods or newer, data-driven design algorithms with machine learning are used to search the design space. We present a 2-pronged attack to address these difficulties: we propose (1) a methodology to create the design space using morphing that we call Design-by-Morphing (DbM); and (2) an optimization algorithm to search that space that uses a novel Bayesian Optimization (BO) strategy that we call Mixed variable, Multi-Objective Bayesian Optimization (MixMOBO). We apply this shape optimization strategy to maximize the power output of a hydrokinetic turbine. Applying these two strategies in tandem, we demonstrate that we can create a novel, geometrically-unconstrained, design space of a draft tube and hub shape and then optimize them simultaneously with a minimum number of cost function calls. Our framework is versatile and can be applied to the shape optimization of a variety of fluid problems.
    Realization Theory Of Recurrent Neural ODEs Using Polynomial System Embeddings. (arXiv:2205.11989v2 [math.OC] UPDATED)
    In this paper we show that neural ODE analogs of recurrent (ODE-RNN) and Long Short-Term Memory (ODE-LSTM) networks can be algorithmically embedded into the class of polynomial systems. This embedding preserves input-output behavior and can suitably be extended to other neural DE architectures. We then use realization theory of polynomial systems to provide necessary conditions for an input-output map to be realizable by an ODE-LSTM and sufficient conditions for minimality of such systems. These results represent the first steps towards realization theory of recurrent neural ODE architectures, which is expected to be useful for model reduction and learning algorithm analysis of recurrent neural ODEs.
    GARDNet: Robust Multi-View Network for Glaucoma Classification in Color Fundus Images. (arXiv:2205.12902v3 [eess.IV] UPDATED)
    Glaucoma is one of the most severe eye diseases, characterized by rapid progression and leading to irreversible blindness. Because noticeable symptoms are lacking at the early stage of the disease, diagnosis is often carried out only when one's sight has already significantly degraded. Regular glaucoma screenings of the population should improve early-stage detection; however, the desirable frequency of ophthalmological checkups is often not feasible due to the excessive load imposed by manual diagnostics on a limited number of specialists. Considering that the basic methodology to detect glaucoma is to analyze fundus images for the optic-disc-to-optic-cup ratio, machine learning algorithms can offer sophisticated methods for image processing and classification. In our work, we propose an advanced image pre-processing technique combined with a multi-view network of deep classification models to categorize glaucoma. Our Glaucoma Automated Retinal Detection Network (GARDNet) has been successfully tested on the Rotterdam EyePACS AIROGS dataset with an AUC of 0.92, and then additionally fine-tuned and tested on the RIM-ONE DL dataset with an AUC of 0.9308, outperforming the state-of-the-art of 0.9272. Our code is available at https://github.com/ahmed1996said/gardnet
    Calibrating for Class Weights by Modeling Machine Learning. (arXiv:2205.04613v2 [cs.LG] UPDATED)
    A much studied issue is the extent to which the confidence scores provided by machine learning algorithms are calibrated to ground truth probabilities. Our starting point is that calibration is seemingly incompatible with class weighting, a technique often employed when one class is less common (class imbalance) or with the hope of achieving some external objective (cost-sensitive learning). We provide a model-based explanation for this incompatibility and use our anthropomorphic model to generate a simple method of recovering likelihoods from an algorithm that is miscalibrated due to class weighting. We validate this approach in the binary pneumonia detection task of Rajpurkar, Irvin, Zhu, et al. (2017).
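    One standard way to see why class weighting breaks calibration, and how to undo it, is the odds-rescaling correction sketched below. It assumes the positive class is up-weighted by a factor `w` in the loss and that the model fits the weighted distribution perfectly; this is a generic prior-shift correction and is not claimed to be the method derived in the paper. The function names are illustrative.

```python
def weighted_score(p: float, w: float) -> float:
    """Score a perfectly fitted model would output when positives are weighted by w."""
    return w * p / (w * p + (1.0 - p))


def recover_probability(q: float, w: float) -> float:
    """Invert the weighting: map a class-weighted score q back to a probability."""
    return q / (q + w * (1.0 - q))


true_p = 0.1
score = weighted_score(true_p, w=9.0)         # weighting multiplies the odds by w
recovered = recover_probability(score, w=9.0)
```

    Weighting multiplies the odds of the positive class by `w`, so dividing the odds by `w` recovers the original probability under these assumptions.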
    Closing the gap: Exact maximum likelihood training of generative autoencoders using invertible layers. (arXiv:2205.09546v2 [stat.ML] UPDATED)
    In this work, we provide an exact likelihood alternative to the variational training of generative autoencoders. We show that VAE-style autoencoders can be constructed using invertible layers, which offer a tractable exact likelihood without the need for any regularization terms. This is achieved while leaving complete freedom in the choice of encoder, decoder and prior architectures, making our approach a drop-in replacement for the training of existing VAEs and VAE-style models. We refer to the resulting models as Autoencoders within Flows (AEF), since the encoder, decoder and prior are defined as individual layers of an overall invertible architecture. We show that the approach results in strikingly higher performance than architecturally equivalent VAEs in terms of log-likelihood, sample quality and denoising performance. In a broad sense, the main ambition of this work is to close the gap between the normalizing flow and autoencoder literature under the common framework of invertibility and exact maximum likelihood.
    Improved Orientation Estimation and Detection with Hybrid Object Detection Networks for Automotive Radar. (arXiv:2205.02111v2 [cs.CV] UPDATED)
    This paper presents novel hybrid architectures that combine grid- and point-based processing to improve the detection performance and orientation estimation of radar-based object detection networks. Purely grid-based detection models operate on a bird's-eye-view (BEV) projection of the input point cloud. These approaches suffer from a loss of detailed information through the discrete grid resolution. This applies in particular to radar object detection, where relatively coarse grid resolutions are commonly used to account for the sparsity of radar point clouds. In contrast, point-based models are not affected by this problem as they process point clouds without discretization. However, they generally exhibit worse detection performances than grid-based methods. We show that a point-based model can extract neighborhood features, leveraging the exact relative positions of points, before grid rendering. This has significant benefits for a subsequent grid-based convolutional detection backbone. In experiments on the public nuScenes dataset our hybrid architecture achieves improvements in terms of detection performance (19.7% higher mAP for car class than next-best radar-only submission) and orientation estimates (11.5% relative orientation improvement) over networks from previous literature.
    Lifelong Ensemble Learning based on Multiple Representations for Few-Shot Object Recognition. (arXiv:2205.01982v3 [cs.RO] UPDATED)
    Service robots are integrating more and more into our daily lives to help us with various tasks. In such environments, robots frequently face new objects while working in the environment and need to learn them in an open-ended fashion. Furthermore, such robots must be able to recognize a wide range of object categories. In this paper, we present a lifelong ensemble learning approach based on multiple representations to address the few-shot object recognition problem. In particular, we form ensemble methods based on deep representations and handcrafted 3D shape descriptors. To facilitate lifelong learning, each approach is equipped with a memory unit for storing and retrieving object information instantly. The proposed model is suitable for open-ended learning scenarios where the number of 3D object categories is not fixed and can grow over time. We have performed extensive sets of experiments to assess the performance of the proposed approach in offline, and open-ended scenarios. For the evaluation purpose, in addition to real object datasets, we generate a large synthetic household objects dataset consisting of 27000 views of 90 objects. Experimental results demonstrate the effectiveness of the proposed method on online few-shot 3D object recognition tasks, as well as its superior performance over the state-of-the-art open-ended learning approaches. Furthermore, our results show that while ensemble learning is modestly beneficial in offline settings, it is significantly beneficial in lifelong few-shot learning situations. Additionally, we demonstrated the effectiveness of our approach in both simulated and real-robot settings, where the robot rapidly learned new categories from limited examples.
    Do ReLU Networks Have An Edge When Approximating Compactly-Supported Functions?. (arXiv:2204.11231v2 [cs.LG] UPDATED)
    We study the problem of approximating compactly-supported integrable functions while implementing their support set using feedforward neural networks. Our first main result transcribes this "structured" approximation problem into a universality problem. We do this by constructing a refinement of the usual topology on the space $L^1_{\operatorname{loc}}(\mathbb{R}^d,\mathbb{R}^D)$ of locally-integrable functions in which compactly-supported functions can only be approximated in $L^1$-norm by functions with matching discretized support. We establish the universality of ReLU feedforward networks with bilinear pooling layers in this refined topology. Consequently, we find that ReLU feedforward networks with bilinear pooling can approximate compactly supported functions while implementing their discretized support. We derive a quantitative uniform version of our universal approximation theorem on the dense subclass of compactly-supported Lipschitz functions. This quantitative result expresses the depth, width, and the number of bilinear pooling layers required to construct this ReLU network via the target function's regularity, the metric capacity and diameter of its essential support, and the dimensions of the input and output spaces. Conversely, we show that polynomial regressors and analytic feedforward networks are not universal in this space.
    GlacierNet2: A Hybrid Multi-Model Learning Architecture for Alpine Glacier Mapping. (arXiv:2204.05818v2 [eess.IV] UPDATED)
    In recent decades, climate change has significantly affected glacier dynamics, resulting in mass loss and an increased risk of glacier-related hazards including supraglacial and proglacial lake development, as well as catastrophic outburst flooding. Rapidly changing conditions dictate the need for continuous and detailed observations and analysis of climate-glacier dynamics. Thematic and quantitative information regarding glacier geometry is fundamental for understanding climate forcing and the sensitivity of glaciers to climate change. However, accurately mapping debris-covered glaciers (DCGs) is notoriously difficult when based upon spectral information and conventional machine-learning techniques. The objective of this research is to improve upon an earlier proposed deep-learning-based approach, GlacierNet, which was developed to exploit a convolutional neural-network segmentation model to accurately outline regional DCG ablation zones. Specifically, we developed an enhanced GlacierNet2 architecture that incorporates multiple models, automatic post-processing, and basin-level hydrological flow techniques to improve the mapping of DCGs such that it includes both the ablation and accumulation zones. Experimental evaluations demonstrate that GlacierNet2 improves the estimation of the ablation zone and achieves a high intersection over union (IOU) score of 0.8839. The proposed architecture provides complete glacier (both accumulation and ablation zone) outlines at regional scales, with an overall IOU score of 0.8619. This is a crucial first step in automating complete glacier mapping that can be used for accurate glacier modeling or mass-balance analysis.
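    For readers unfamiliar with the reported metric, intersection over union for two binary masks can be computed as in the sketch below. The masks are toy data and the function name `iou` is illustrative; this is the generic definition of the metric, not the authors' evaluation code.

```python
from typing import Sequence


def iou(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Intersection over union of two binary masks with the same shape."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            intersection += 1 if (p and t) else 0  # pixel marked in both masks
            union += 1 if (p or t) else 0          # pixel marked in at least one mask
    # Two empty masks agree perfectly by convention.
    return intersection / union if union else 1.0


predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = iou(predicted, reference)
```

    Here two pixels are shared and four are marked in at least one mask, so the score is 0.5; a score near 0.88, as reported above, means the predicted and reference outlines overlap almost entirely.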
    Modelling Evolutionary and Stationary User Preferences for Temporal Sets Prediction. (arXiv:2204.05490v6 [cs.LG] UPDATED)
    Given a sequence of sets, where each set is associated with a timestamp and contains an arbitrary number of elements, the task of temporal sets prediction aims to predict the elements in the subsequent set. Previous studies for temporal sets prediction mainly capture each user's evolutionary preference by learning from his/her own sequence. Although insightful, we argue that: 1) the collaborative signals latent in different users' sequences are essential but have not been exploited; 2) users also tend to show stationary preferences, which existing methods fail to consider. To this end, we propose an integrated learning framework to model both the evolutionary and the stationary preferences of users for temporal sets prediction, which first constructs a universal sequence by chronologically arranging all the user-set interactions, and then learns on each user-set interaction. In particular, for each user-set interaction, we first design an evolutionary user preference modelling component to track the user's time-evolving preference and exploit the latent collaborative signals among different users. This component maintains a memory bank to store memories of the related user and elements, and continuously updates their memories based on the currently encoded messages and the past memories. Then, we devise a stationary user preference modelling module to discover each user's personalized characteristics according to the historical sequence, which adaptively aggregates the previously interacted elements from dual perspectives with the guidance of the user's and elements' embeddings. Finally, we develop a set-batch algorithm to improve the model efficiency, which can create time-consistent batches in advance and achieve 3.5x training speedups on average. Experiments on real-world datasets demonstrate the effectiveness and good interpretability of our approach.
    A Collection of Quality Diversity Optimization Problems Derived from Hyperparameter Optimization of Machine Learning Models. (arXiv:2204.14061v2 [cs.LG] UPDATED)
    The goal of Quality Diversity Optimization is to generate a collection of diverse yet high-performing solutions to a given problem at hand. Typical benchmark problems are, for example, finding a repertoire of robot arm configurations or a collection of game playing strategies. In this paper, we propose a set of Quality Diversity Optimization problems that tackle hyperparameter optimization of machine learning models - a so far underexplored application of Quality Diversity Optimization. Our benchmark problems involve novel feature functions, such as interpretability or resource usage of models. To allow for fast and efficient benchmarking, we build upon YAHPO Gym, a recently proposed open source benchmarking suite for hyperparameter optimization that makes use of high performing surrogate models and returns these surrogate model predictions instead of evaluating the true expensive black box function. We present results of an initial experimental study comparing different Quality Diversity optimizers on our benchmark problems. Furthermore, we discuss future directions and challenges of Quality Diversity Optimization in the context of hyperparameter optimization.
    Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision. (arXiv:2203.13270v2 [stat.ML] UPDATED)
    Foundation models offer an exciting new paradigm for constructing models with out-of-the-box embeddings and a few labeled examples. However, it is not clear how to best apply foundation models without labeled data. A potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources -- pre-trained models, heuristics, crowd-workers -- to construct pseudolabels. The challenge is building a combination that best exploits the signal available in both foundation models and weak sources. We propose Liger, a combination that uses foundation model embeddings to improve two crucial elements of existing weak supervision techniques. First, we produce finer estimates of weak source quality by partitioning the embedding space and learning per-part source accuracies. Second, we improve source coverage by extending source votes in embedding space. Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that lift scales with the smoothness of label distributions in embedding space. On six benchmark NLP and video tasks, Liger outperforms vanilla weak supervision by 14.1 points, weakly-supervised kNN and adapters by 11.8 points, and kNN and adapters supervised by traditional hand labels by 7.2 points.
    Decentralized Collaborative Learning Framework for Next POI Recommendation. (arXiv:2204.06516v4 [cs.IR] UPDATED)
    Next Point-of-Interest (POI) recommendation has become an indispensable functionality in Location-based Social Networks (LBSNs) due to its effectiveness in helping people decide the next POI to visit. However, accurate recommendation requires a vast amount of historical check-in data, thus threatening user privacy as the location-sensitive data needs to be handled by cloud servers. Although there have been several on-device frameworks for privacy-preserving POI recommendations, they are still resource-intensive when it comes to storage and computation, and show limited robustness to the high sparsity of user-POI interactions. On this basis, we propose a novel decentralized collaborative learning framework for POI recommendation (DCLR), which allows users to train their personalized models locally in a collaborative manner. DCLR significantly reduces the local models' dependence on the cloud for training, and can be used to expand arbitrary centralized recommendation models. To counteract the sparsity of on-device user data when learning each local model, we design two self-supervision signals to pretrain the POI representations on the server with geographical and categorical correlations of POIs. To facilitate collaborative learning, we innovatively propose to incorporate knowledge from either geographically or semantically similar users into each local model with attentive aggregation and mutual information maximization. The collaborative learning process makes use of communications between devices while requiring only minor engagement from the central server for identifying user groups, and is compatible with common privacy preservation mechanisms like differential privacy. We evaluate DCLR with two real-world datasets, where the results show that DCLR outperforms state-of-the-art on-device frameworks and yields competitive results compared with centralized counterparts.
    IRC-safe Graph Autoencoder for unsupervised anomaly detection. (arXiv:2204.12231v2 [hep-ph] UPDATED)
    Anomaly detection through employing machine learning techniques has emerged as a novel powerful tool in the search for new physics beyond the Standard Model. Historically similar to the development of jet observables, theoretical consistency has not always assumed a central role in the fast development of algorithms and neural network architectures. In this work, we construct an infrared and collinear safe autoencoder based on graph neural networks by employing energy-weighted message passing. We demonstrate that whilst this approach has theoretically favourable properties, it also exhibits formidable sensitivity to non-QCD structures.
    On Multi-Domain Long-Tailed Recognition, Imbalanced Domain Generalization and Beyond. (arXiv:2203.09513v3 [cs.LG] UPDATED)
    Real-world data often exhibit imbalanced label distributions. Existing studies on data imbalance focus on single-domain settings, i.e., samples are from the same data distribution. However, natural data can originate from distinct domains, where a minority class in one domain could have abundant instances from other domains. We formalize the task of Multi-Domain Long-Tailed Recognition (MDLT), which learns from multi-domain imbalanced data, addresses label imbalance, domain shift, and divergent label distributions across domains, and generalizes to all domain-class pairs. We first develop the domain-class transferability graph, and show that such transferability governs the success of learning in MDLT. We then propose BoDA, a theoretically grounded learning strategy that tracks the upper bound of transferability statistics, and ensures balanced alignment and calibration across imbalanced domain-class distributions. We curate five MDLT benchmarks based on widely-used multi-domain datasets, and compare BoDA to twenty algorithms that span different learning strategies. Extensive and rigorous experiments verify the superior performance of BoDA. Further, as a byproduct, BoDA establishes new state-of-the-art on Domain Generalization benchmarks, highlighting the importance of addressing data imbalance across domains, which can be crucial for improving generalization to unseen domains. Code and data are available at: https://github.com/YyzHarry/multi-domain-imbalance.
    FederatedScope-GNN: Towards a Unified, Comprehensive and Efficient Package for Federated Graph Learning. (arXiv:2204.05562v5 [cs.LG] UPDATED)
    The incredible development of federated learning (FL) has benefited various tasks in the domains of computer vision and natural language processing, and existing frameworks such as TFF and FATE have made deployment easy in real-world applications. However, federated graph learning (FGL), even though graph data are prevalent, has not been well supported due to its unique characteristics and requirements. The lack of an FGL-related framework increases the effort required for reproducible research and real-world deployment. Motivated by such strong demand, in this paper, we first discuss the challenges in creating an easy-to-use FGL package and accordingly present our implemented package FederatedScope-GNN (FS-G), which provides (1) a unified view for modularizing and expressing FGL algorithms; (2) comprehensive DataZoo and ModelZoo for out-of-the-box FGL capability; (3) an efficient model auto-tuning component; and (4) off-the-shelf privacy attack and defense abilities. We validate the effectiveness of FS-G by conducting extensive experiments, which also yield many valuable insights about FGL for the community. Moreover, we employ FS-G to serve the FGL application in real-world E-commerce scenarios, where the attained improvements indicate great potential business benefits. We publicly release FS-G, as submodules of FederatedScope, at https://github.com/alibaba/FederatedScope to promote FGL's research and enable broad applications that would otherwise be infeasible due to the lack of a dedicated package.
    Generative Adversarial Method Based On Neural Tangent Kernels. (arXiv:2204.04090v4 [cs.LG] UPDATED)
    The recent development of generative adversarial networks (GANs) has driven many computer vision applications. Despite the great synthesis quality, training GANs often confronts several issues, including non-convergence, mode collapse, and gradient vanishing. There exist several workarounds, for example, regularizing Lipschitz continuity and adopting Wasserstein distance. Although these methods can partially solve the problems, we argue that the problems result from modeling the discriminator with deep neural networks. In this paper, we build on the newly derived deep neural network theory called Neural Tangent Kernel (NTK) and propose a new generative algorithm called generative adversarial NTK (GA-NTK). The GA-NTK models the discriminator as a Gaussian Process (GP). With the help of the NTK theory, the training dynamics of GA-NTK can be described with a closed-form formula. To synthesize data with the closed-form formula, the objectives can be simplified into a single-level adversarial optimization problem. We conduct extensive experiments on real-world datasets, and the results show that GA-NTK can generate images comparable to those by GANs but is much easier to train under various conditions. We also study the current limitations of GA-NTK and propose some workarounds to make GA-NTK more practical.
    What's in the Black Box? The False Negative Mechanisms Inside Object Detectors. (arXiv:2203.07662v4 [cs.CV] UPDATED)
    In object detection, false negatives arise when a detector fails to detect a target object. To understand why object detectors produce false negatives, we identify five 'false negative mechanisms', where each mechanism describes how a specific component inside the detector architecture failed. Focusing on two-stage and one-stage anchor-box object detector architectures, we introduce a framework for quantifying these false negative mechanisms. Using this framework, we investigate why Faster R-CNN and RetinaNet fail to detect objects in benchmark vision datasets and robotics datasets. We show that a detector's false negative mechanisms differ significantly between computer vision benchmark datasets and robotics deployment scenarios. This has implications for the translation of object detectors developed for benchmark datasets to robotics applications. Code is publicly available at https://github.com/csiro-robotics/fn_mechanisms
    Learning Where To Look -- Generative NAS is Surprisingly Efficient. (arXiv:2203.08734v2 [cs.LG] UPDATED)
    The efficient, automated search for well-performing neural architectures (NAS) has drawn increasing attention in the recent past. The predominant research objective is to reduce the necessity of costly evaluations of neural architectures while efficiently exploring large search spaces. To this end, surrogate models embed architectures in a latent space and predict their performance, while generative models for neural architectures enable optimization-based search within the latent space the generator draws from. Both surrogate and generative models aim to facilitate query-efficient search in a well-structured latent space. In this paper, we further improve the trade-off between query-efficiency and promising architecture generation by leveraging the advantages of both efficient surrogate models and generative design. To this end, we propose a generative model, paired with a surrogate predictor, that iteratively learns to generate samples from increasingly promising latent subspaces. This approach leads to very effective and efficient architecture search, while keeping the query amount low. In addition, our approach allows joint optimization of multiple objectives, such as accuracy and hardware latency, in a straightforward manner. We show the benefit of this approach not only w.r.t. the optimization of architectures for highest classification accuracy but also in the context of hardware constraints, and we outperform state-of-the-art methods on several NAS benchmarks for single and multiple objectives. We also achieve state-of-the-art performance on ImageNet. The code is available at this http URL.
    A Reinforcement Learning Approach to Sensing Design in Resource-Constrained Wireless Networked Control Systems. (arXiv:2204.00703v3 [eess.SY] UPDATED)
    In this paper, we consider a wireless network of smart sensors (agents) that monitor a dynamical process and send measurements to a base station that performs global monitoring and decision-making. Smart sensors are equipped with both sensing and computation, and can either send raw measurements or process them prior to transmission. Constrained agent resources raise a fundamental latency-accuracy trade-off. On the one hand, raw measurements are inaccurate but fast to produce. On the other hand, data processing on resource-constrained platforms generates accurate measurements at the cost of non-negligible computation latency. Further, if processed data are also compressed, latency caused by wireless communication might be higher for raw measurements. Hence, it is challenging to decide when and where sensors in the network should transmit raw measurements or leverage time-consuming local processing. To tackle this design problem, we propose a Reinforcement Learning approach to learn an efficient policy that dynamically decides when measurements are to be processed at each sensor. The effectiveness of our proposed approach is validated through a numerical simulation with a case study on smart sensing motivated by the Internet of Drones.
    SocialVAE: Human Trajectory Prediction using Timewise Latents. (arXiv:2203.08207v4 [cs.CV] UPDATED)
    Predicting pedestrian movement is critical for human behavior analysis and also for safe and efficient human-agent interactions. However, despite significant advancements, it is still challenging for existing approaches to capture the uncertainty and multimodality of human navigation decision making. In this paper, we propose SocialVAE, a novel approach for human trajectory prediction. The core of SocialVAE is a timewise variational autoencoder architecture that exploits stochastic recurrent neural networks to perform prediction, combined with a social attention mechanism and a backward posterior approximation to allow for better extraction of pedestrian navigation strategies. We show that SocialVAE improves current state-of-the-art performance on several pedestrian trajectory prediction benchmarks, including the ETH/UCY benchmark, Stanford Drone Dataset, and SportVU NBA movement dataset. Code is available at: https://github.com/xupei0610/SocialVAE.
    Harmony: Overcoming the Hurdles of GPU Memory Capacity to Train Massive DNN Models on Commodity Servers. (arXiv:2202.01306v2 [cs.DC] UPDATED)
    Deep neural networks (DNNs) have grown exponentially in size over the past decade, leaving only those who have massive datacenter-based resources with the ability to develop and train such models. One of the main challenges for the long tail of researchers who might have only limited resources (e.g., a single multi-GPU server) is limited GPU memory capacity compared to model size. The problem is so acute that the memory requirement of training massive DNN models can often exceed the aggregate capacity of all available GPUs on a single server; this problem only gets worse with the trend of ever-growing model sizes. Current solutions that rely on virtualizing GPU memory (by swapping to/from CPU memory) incur excessive swapping overhead. In this paper, we present a new training framework, Harmony, and advocate rethinking how DNN frameworks schedule computation and move data to push the boundaries of training massive models efficiently on a single commodity server. Across various massive DNN models, Harmony is able to reduce swap load by up to two orders of magnitude and obtain a training throughput speedup of up to 7.6x over highly optimized baselines with virtualized memory.
    On the Detection of Adaptive Adversarial Attacks in Speaker Verification Systems. (arXiv:2202.05725v2 [cs.CR] UPDATED)
    Speaker verification systems have been widely used in smartphones and Internet of Things devices to identify legitimate users. In recent work, it has been shown that adversarial attacks, such as FAKEBOB, can work effectively against speaker verification systems. The goal of this paper is to design a detector that can distinguish original audio from audio contaminated by adversarial attacks. Specifically, our designed detector, called MEH-FEST, calculates the minimum energy in high frequencies from the short-time Fourier transform of an audio signal and uses it as a detection metric. Through both analysis and experiments, we show that our proposed detector is easy to implement, fast to process an input audio signal, and effective in determining whether an audio signal is corrupted by FAKEBOB attacks. The experimental results indicate that the detector is extremely effective, with near-zero false positive and false negative rates for detecting FAKEBOB attacks in Gaussian mixture model (GMM) and i-vector speaker verification systems. Moreover, adaptive adversarial attacks against our proposed detector and their countermeasures are discussed and studied, showing the game between attackers and defenders.
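    The detection metric described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, Hann window and the 4 kHz cutoff (`frame_len`, `hop`, `cutoff_hz`) are assumed values chosen for the example, and the function name is hypothetical.

```python
import numpy as np


def min_high_freq_energy(audio, sample_rate, frame_len=512, hop=256, cutoff_hz=4000.0):
    """Minimum, over STFT frames, of the spectral energy at or above cutoff_hz.

    All default parameters are illustrative assumptions, not values from the paper.
    """
    audio = np.asarray(audio, dtype=float)
    if audio.size < frame_len:
        raise ValueError("audio must contain at least one full frame")

    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    high_band = freqs >= cutoff_hz  # boolean mask selecting the high-frequency bins

    energies = []
    for start in range(0, audio.size - frame_len + 1, hop):
        frame = audio[start:start + frame_len] * window
        spectrum = np.fft.rfft(frame)  # one column of the short-time Fourier transform
        energies.append(np.sum(np.abs(spectrum[high_band]) ** 2))

    return float(min(energies))
```

    A detector built on this quantity would compare it against a threshold. The intuition is that an added perturbation tends to raise the high-frequency energy floor even in quiet frames, but the threshold itself would have to be calibrated on data.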
    Automated fault tree learning from continuous-valued sensor data: a case study on domestic heaters. (arXiv:2203.07374v2 [cs.LG] UPDATED)
    Many industrial sectors have been collecting big sensor data. With recent technologies for processing big data, companies can exploit this for automatic failure detection and prevention. We propose the first completely automated method for failure analysis, machine-learning fault trees from raw observational data with continuous variables. Our method scales well and is tested on a real-world, five-year dataset of domestic heater operations in The Netherlands, with 31 million unique heater-day readings, each containing 27 sensor and 11 failure variables. Our method builds on two previous procedures: the C4.5 decision-tree learning algorithm, and the LIFT fault tree learning algorithm from Boolean data. C4.5 pre-processes each continuous variable: it learns an optimal numerical threshold which distinguishes between faulty and normal operation of the top-level system. These thresholds discretise the variables, thus allowing LIFT to learn fault trees which model the root failure mechanisms of the system and are explainable. We obtain fault trees for the 11 failure variables, and evaluate them in two ways: quantitatively, with a significance score, and qualitatively, with domain specialists. Some of the fault trees learnt have almost maximum significance (above 0.95), while others have medium-to-low significance (around 0.30), reflecting the difficulty of learning from big, noisy, real-world sensor data. The domain specialists confirm that the fault trees model meaningful relationships among the variables.
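    The discretisation step can be illustrated with a small threshold search. This is a simplified sketch in the spirit of C4.5's handling of continuous attributes, not the pipeline used in the paper: it scores candidate thresholds by plain information gain and places them at midpoints between consecutive distinct values, both of which are assumptions made for brevity. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) maximising information gain for one continuous variable."""
    pairs = sorted(zip(values, labels))
    all_labels = [label for _, label in pairs]
    base = entropy(all_labels)

    best_gain, best_t = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left, right = all_labels[:i], all_labels[i:]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_gain, best_t = gain, threshold
    return best_t, best_gain
```

    Once a threshold is chosen, each sensor reading becomes a Boolean value (`reading > threshold`), which is the kind of input a fault tree learner operating on Boolean data can consume.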
    Certifying Out-of-Domain Generalization for Blackbox Functions. (arXiv:2202.01679v2 [cs.LG] UPDATED)
    Certifying the robustness of model performance under bounded data distribution drifts has recently attracted intensive interest under the umbrella of distributional robustness. However, existing techniques either make strong assumptions on the model class and loss functions that can be certified, such as smoothness expressed via Lipschitz continuity of gradients, or require solving complex optimization problems. As a result, the wider application of these techniques is currently limited by their scalability and flexibility -- these techniques often do not scale to large-scale datasets with modern deep neural networks or cannot handle loss functions which may be non-smooth, such as the 0-1 loss. In this paper, we focus on the problem of certifying distributional robustness for blackbox models and bounded loss functions, and propose a novel certification framework based on the Hellinger distance. Our certification technique scales to ImageNet-scale datasets, complex models, and a diverse set of loss functions. We then focus on one specific application enabled by such scalability and flexibility, i.e., certifying out-of-domain generalization for large neural networks and loss functions such as accuracy and AUC. We experimentally validate our certification method on a number of datasets, ranging from ImageNet, where we provide the first non-vacuous certified out-of-domain generalization, to smaller classification tasks where we are able to compare with the state-of-the-art and show that our method performs considerably better.
    PennyLane: Automatic differentiation of hybrid quantum-classical computations. (arXiv:1811.04968v4 [quant-ph] UPDATED)
    PennyLane is a Python 3 software framework for differentiable programming of quantum computers. The library provides a unified architecture for near-term quantum computing devices, supporting both qubit and continuous-variable paradigms. PennyLane's core feature is the ability to compute gradients of variational quantum circuits in a way that is compatible with classical techniques such as backpropagation. PennyLane thus extends the automatic differentiation algorithms common in optimization and machine learning to include quantum and hybrid computations. A plugin system makes the framework compatible with any gate-based quantum simulator or hardware. We provide plugins for hardware providers including the Xanadu Cloud, Amazon Braket, and IBM Quantum, allowing PennyLane optimizations to be run on publicly accessible quantum devices. On the classical front, PennyLane interfaces with accelerated machine learning libraries such as TensorFlow, PyTorch, JAX, and Autograd. PennyLane can be used for the optimization of variational quantum eigensolvers, quantum approximate optimization, quantum machine learning models, and many other applications.
    Query Processing on Tensor Computation Runtimes. (arXiv:2203.01877v3 [cs.DB] UPDATED)
    The huge demand for computation in artificial intelligence (AI) is driving unparalleled investments in hardware and software systems for AI. This leads to an explosion in the number of specialized hardware devices, which are now offered by major cloud vendors. By hiding the low-level complexity through a tensor-based interface, tensor computation runtimes (TCRs) such as PyTorch allow data scientists to efficiently exploit the exciting capabilities offered by the new hardware. In this paper, we explore how database management systems can ride the wave of innovation happening in the AI space. We design, build, and evaluate Tensor Query Processor (TQP): TQP transforms SQL queries into tensor programs and executes them on TCRs. TQP is able to run the full TPC-H benchmark by implementing novel algorithms for relational operators on the tensor routines. At the same time, TQP can support various hardware while only requiring a fraction of the usual development effort. Experiments show that TQP can improve query execution time by up to 10$\times$ over specialized CPU- and GPU-only systems. Finally, TQP can accelerate queries mixing ML predictions and SQL end-to-end, and deliver up to 9$\times$ speedup over CPU baselines.
    Learning Stationary Nash Equilibrium Policies in $n$-Player Stochastic Games with Independent Chains via Dual Mirror Descent. (arXiv:2201.12224v3 [cs.LG] UPDATED)
    We consider a subclass of $n$-player stochastic games, in which players have their own internal state/action spaces while they are coupled through their payoff functions. It is assumed that players' internal chains are driven by independent transition probabilities. Moreover, players can receive only realizations of their payoffs, not the actual functions, and cannot observe each other's states/actions. Under some assumptions on the structure of the payoff functions, we develop efficient learning algorithms based on dual averaging and dual mirror descent, which provably converge almost surely or in expectation to the set of $\epsilon$-Nash equilibrium policies. In particular, we derive upper bounds on the number of iterates that scale polynomially in terms of the game parameters to achieve an $\epsilon$-Nash equilibrium policy. In addition to Markov potential games and linear-quadratic stochastic games, this work provides another subclass of $n$-player stochastic games that provably admit polynomial-time learning algorithms for finding their $\epsilon$-Nash equilibrium policies.
    Deep Active Learning with Budget Annotation. (arXiv:2208.00508v1 [cs.LG])
    Digital data collected over the decades, and data currently being produced with the use of information technology, are overwhelmingly unlabeled or lack descriptions. Unlabeled data are relatively easy to acquire but expensive to label, even with the use of domain experts. Most recent works address this problem using active learning with uncertainty metrics. Although most uncertainty selection strategies are very effective, they fail to take the informativeness of the unlabeled instances into account and are prone to querying outliers. To address these challenges, we propose a hybrid approach that computes both the uncertainty and the informativeness of an instance, then automatically labels the computed instances using a budget annotator. To reduce the annotation cost, we employ state-of-the-art pre-trained models in order to avoid querying information already contained in those models. Our extensive experiments on different sets of datasets demonstrate the efficacy of the proposed approach.
    A Real-time Edge-AI System for Reef Surveys. (arXiv:2208.00598v1 [cs.LG])
    Crown-of-Thorn Starfish (COTS) outbreaks are a major cause of coral loss on the Great Barrier Reef (GBR) and substantial surveillance and control programs are ongoing to manage COTS populations to ecologically sustainable levels. In this paper, we present a comprehensive real-time machine learning-based underwater data collection and curation system on edge devices for COTS monitoring. In particular, we leverage the power of deep learning-based object detection techniques, and propose a resource-efficient COTS detector that performs detection inferences on the edge device to assist marine experts with COTS identification during the data collection phase. The preliminary results show that several strategies for improving computational efficiency (e.g., batch-wise processing, frame skipping, model input size) can be combined to run the proposed detection model on edge hardware with low resource consumption and low information loss.
    Beyond kNN: Adaptive, Sparse Neighborhood Graphs via Optimal Transport. (arXiv:2208.00604v1 [stat.ML])
    Nearest neighbour graphs are widely used to capture the geometry or topology of a dataset. One of the most common strategies to construct such a graph is based on selecting a fixed number k of nearest neighbours (kNN) for each point. However, the kNN heuristic may become inappropriate when sampling density or noise level varies across datasets. Strategies that try to get around this typically introduce additional parameters that need to be tuned. We propose a simple approach to construct an adaptive neighbourhood graph from a single parameter, based on quadratically regularised optimal transport. Our numerical experiments show that graphs constructed in this manner perform favourably in unsupervised and semi-supervised learning applications.
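    For reference, the fixed-k heuristic that the abstract argues against can be sketched as below. This shows only the kNN baseline, not the proposed optimal-transport construction; the function name and the dense distance computation are choices made for this illustration.

```python
import numpy as np


def knn_graph(points, k):
    """Directed kNN adjacency matrix: adj[i, j] is True if j is among i's k nearest points."""
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")

    # Dense pairwise Euclidean distances; fine for small n, quadratic in memory.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbour

    neighbours = np.argsort(dist, axis=1)[:, :k]
    adj = np.zeros((n, n), dtype=bool)
    adj[np.repeat(np.arange(n), k), neighbours.ravel()] = True
    return adj
```

    With the one-dimensional points 0, 1, 2.5 and 10 and k = 1, the distant point 10 is still connected to 2.5, because every point receives exactly k edges regardless of local density. That rigidity is the limitation an adaptive construction is meant to remove.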
    Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization. (arXiv:2208.00579v1 [cs.LG])
    Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Leveraging techniques including sparse and linear attention and hashing tricks, efficient transformers have been proposed to reduce the quadratic complexity of transformers, but they significantly degrade the accuracy. In response, we first interpret the linear attention and residual connections in computing the attention map as gradient descent steps. We then introduce momentum into these components and propose the momentum transformer, which utilizes momentum to improve the accuracy of linear transformers while maintaining linear memory and computational complexities. Furthermore, we develop an adaptive strategy to compute the momentum value for our model based on the optimal momentum for quadratic optimization. This adaptive momentum eliminates the need to search for the optimal momentum value and further enhances the performance of the momentum transformer. A range of experiments on both autoregressive and non-autoregressive tasks, including image generation and machine translation, demonstrate that the momentum transformer outperforms popular linear transformers in training efficiency and accuracy.
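    As background for the linear attention mentioned above, the following sketch shows the standard kernelised (non-causal) form, where reordering the matrix products avoids building the full attention map. It does not include the momentum terms proposed in the paper, and the `elu(x) + 1` feature map is a common choice assumed here for illustration.

```python
import numpy as np


def feature_map(x):
    """elu(x) + 1, a positive feature map often used for linear attention (assumed here)."""
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention(q, k, v):
    """Non-causal linear attention in O(n * d * d_v) time, without an n-by-n matrix."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    kv = phi_k.T @ v            # (d, d_v) summary of all keys and values
    z = phi_k.sum(axis=0)       # (d,) normaliser shared by every query
    return (phi_q @ kv) / (phi_q @ z)[:, None]


def quadratic_reference(q, k, v):
    """Same computation done the quadratic way, for checking the reordering."""
    scores = feature_map(q) @ feature_map(k).T  # explicit (n, n) attention map
    return (scores / scores.sum(axis=1, keepdims=True)) @ v
```

    Both functions return the same values up to floating-point error; only the order of the matrix products differs, which is what brings the cost from quadratic to linear in the sequence length.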
    Long Short-Term Preference Modeling for Continuous-Time Sequential Recommendation. (arXiv:2208.00593v1 [cs.IR])
    Modeling the evolution of user preference is essential in recommender systems. Recently, dynamic graph-based methods have been studied and have achieved SOTA results for recommendation, the majority of which focus on users' stable long-term preference. However, in real-world scenarios, a user's short-term preference evolves dynamically over time. Although there exist sequential methods that attempt to capture it, how to model the evolution of short-term preference with dynamic graph-based methods has not yet been well addressed. In particular: 1) existing methods do not explicitly encode and capture the evolution of short-term preference as sequential methods do; 2) simply using the last few interactions is not enough for modeling the changing trend. In this paper, we propose Long Short-Term Preference Modeling for Continuous-Time Sequential Recommendation (LSTSR) to capture the evolution of short-term preference under a dynamic graph. Specifically, we explicitly encode short-term preference and optimize it via a memory mechanism, which has three key operations: Message, Aggregate and Update. Our memory mechanism can not only store one-hop information, but also trigger with new interactions online. Extensive experiments conducted on five public datasets show that LSTSR consistently outperforms many state-of-the-art recommendation methods across various lines.
    Unifying Approaches in Data Subset Selection via Fisher Information and Information-Theoretic Quantities. (arXiv:2208.00549v1 [cs.LG])
    The mutual information between predictions and model parameters -- also referred to as expected information gain or BALD in machine learning -- measures informativeness. It is a popular acquisition function in Bayesian active learning and Bayesian optimal experiment design. In data subset selection, i.e. active learning and active sampling, several recent works use Fisher information, Hessians, similarity matrices based on the gradients, or simply the gradient lengths to compute the acquisition scores that guide sample selection. Are these different approaches connected, and if so how? In this paper, we revisit the Fisher information and use it to show how several otherwise disparate methods are connected as approximations of information-theoretic quantities.
    INSightR-Net: Interpretable Neural Network for Regression using Similarity-based Comparisons to Prototypical Examples. (arXiv:2208.00457v1 [cs.CV])
    Convolutional neural networks (CNNs) have shown exceptional performance for a range of medical imaging tasks. However, conventional CNNs are not able to explain their reasoning process, therefore limiting their adoption in clinical practice. In this work, we propose an inherently interpretable CNN for regression using similarity-based comparisons (INSightR-Net) and demonstrate our methods on the task of diabetic retinopathy grading. A prototype layer incorporated into the architecture enables visualization of the areas in the image that are most similar to learned prototypes. The final prediction is then intuitively modeled as a mean of prototype labels, weighted by the similarities. We achieved competitive prediction performance with our INSightR-Net compared to a ResNet baseline, showing that it is not necessary to compromise performance for interpretability. Furthermore, we quantified the quality of our explanations using sparsity and diversity, two concepts considered important for a good explanation, and demonstrated the effect of several parameters on the latent space embeddings.
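    The final prediction step described above, a mean of prototype labels weighted by similarity, can be sketched as follows. The similarity scores are taken as given non-negative numbers; how the network computes them is not shown, and the function name is hypothetical.

```python
import numpy as np


def weighted_prototype_prediction(similarities, prototype_labels):
    """Regression output as the similarity-weighted mean of prototype labels."""
    s = np.asarray(similarities, dtype=float)
    y = np.asarray(prototype_labels, dtype=float)
    if s.shape != y.shape:
        raise ValueError("need exactly one similarity per prototype")
    if np.any(s < 0) or s.sum() == 0:
        raise ValueError("similarities must be non-negative and not all zero")
    return float((s * y).sum() / s.sum())
```

    Because the output is a convex combination of prototype labels, each prototype's contribution to a prediction can be read off directly from its normalised weight.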
    Adaptive Edge Offloading for Image Classification Under Rate Limit. (arXiv:2208.00485v1 [cs.DC])
    This paper considers a setting where embedded devices are used to acquire and classify images. Because of limited computing capacity, embedded devices rely on a parsimonious classification model with uneven accuracy. When local classification is deemed inaccurate, devices can decide to offload the image to an edge server with a more accurate but resource-intensive model. Resource constraints, e.g., network bandwidth, however, require regulating such transmissions to avoid congestion and high latency. The paper investigates this offloading problem when transmissions are regulated through a token bucket, a mechanism commonly used for such purposes. The goal is to devise a lightweight, online offloading policy that optimizes an application-specific metric (e.g., classification accuracy) under the constraints of the token bucket. The paper develops a policy based on a Deep Q-Network (DQN), and demonstrates both its efficacy and the feasibility of its deployment on embedded devices. Of note is the fact that the policy can handle complex input patterns, including correlation in image arrivals and classification accuracy. The evaluation is carried out by performing image classification over a local testbed using synthetic traces generated from the ImageNet image classification benchmark. Implementation of this work is available at https://github.com/qiujiaming315/edgeml-dqn.
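    The token bucket that constrains offloading can be sketched as a small discrete-time class. This is a generic illustration of the mechanism, not code from the linked repository; the class name, the per-step refill and the unit cost per image are assumptions. The DQN policy that decides when to spend a token is not shown.

```python
class TokenBucket:
    """Discrete-time token bucket: refills by `rate` per step, capped at `capacity`."""

    def __init__(self, rate, capacity):
        if rate < 0 or capacity <= 0:
            raise ValueError("rate must be >= 0 and capacity must be > 0")
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # assume the bucket starts full

    def tick(self):
        """Advance one time step and add tokens, never exceeding the capacity."""
        self.tokens = min(self.capacity, self.tokens + self.rate)

    def try_send(self, cost=1.0):
        """Consume `cost` tokens if available; return whether the transmission is allowed."""
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

    With `rate=0.5` and `capacity=2`, a device can offload two images back to back and must then wait two steps before the next offload, which is why an offloading policy has to weigh spending a token now against saving it for a harder image later.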
    Scrutinizing Shipment Records To Thwart Illegal Timber Trade. (arXiv:2208.00493v1 [cs.LG])
    Timber and forest products made from wood, like furniture, are valuable commodities and, like the global trade of many highly-valued natural resources, face challenges of corruption, fraud, and illegal harvesting. These grey and black market activities in the wood and forest products sector are not limited to the countries where the wood was harvested, but extend throughout the global supply chain and have been tied to illicit financial flows, like trade-based money laundering, document fraud, species mislabeling, and other illegal activities. The task of finding such fraudulent activities using trade data, in the absence of ground truth, can be modelled as an unsupervised anomaly detection problem. However, existing approaches suffer from certain shortcomings in their applicability to large-scale trade data. Trade data is heterogeneous, with both categorical and numerical attributes in a tabular format. The overall challenge lies in the complexity, volume and velocity of data, with a large number of entities and a lack of ground truth labels. To mitigate these, we propose a novel unsupervised anomaly detection method, Contrastive Learning based Heterogeneous Anomaly Detection (CHAD), that is generally applicable to large-scale heterogeneous tabular data. We demonstrate that our model CHAD performs favorably against multiple comparable baselines on public benchmark datasets, and outperforms them in the case of trade data. More importantly, we demonstrate that our approach reduces the assumptions and effort required for hyperparameter tuning, which is a key challenge in an unsupervised training paradigm. Specifically, our overarching objective pertains to detecting suspicious timber shipments and patterns using Bill of Lading trade record data. Detecting anomalous transactions in shipment records can enable further investigation by government agencies and supply chain constituents.
    eco2AI: carbon emissions tracking of machine learning models as the first step towards sustainable AI. (arXiv:2208.00406v1 [cs.LG])
    The size and complexity of deep neural networks continue to grow exponentially, significantly increasing energy consumption for training and inference by these models. We introduce an open-source package, eco2AI, to help data scientists and researchers track the energy consumption and equivalent CO2 emissions of their models in a straightforward way. In eco2AI we put emphasis on the accuracy of energy consumption tracking and correct regional CO2 emissions accounting. We encourage the research community to search for new optimal Artificial Intelligence (AI) architectures with a lower computational cost. The motivation also comes from the concept of an AI-based greenhouse gas sequestration cycle with both Sustainable AI and Green AI pathways.
    Evo* 2022 -- Late-Breaking Abstracts Volume. (arXiv:2208.00555v1 [cs.NE])
    Volume with the Late-Breaking Abstracts submitted to the Evo* 2022 Conference, held in Madrid (Spain) from 20 to 22 April. These papers present ongoing research and preliminary results investigating the application of different Bioinspired Methods (mainly Evolutionary Computation) to different problems, most of them real-world ones.
    Robot Policy Learning from Demonstration Using Advantage Weighting and Early Termination. (arXiv:2208.00478v1 [cs.LG])
    Learning robotic tasks in the real world is still highly challenging and effective practical solutions remain to be found. Traditional methods used in this area are imitation learning and reinforcement learning, but they both have limitations when applied to real robots. Combining reinforcement learning with pre-collected demonstrations is a promising approach that can help in learning control policies to solve robotic tasks. In this paper, we propose an algorithm that uses novel techniques to leverage offline expert data using offline and online training to obtain faster convergence and improved performance. The proposed algorithm (AWET) weights the critic losses with a novel agent advantage weight to improve over the expert data. In addition, AWET makes use of an automatic early termination technique to stop and discard policy rollouts that are not similar to expert trajectories -- to prevent drifting far from the expert data. In an ablation study, AWET showed improved and promising performance when compared to state-of-the-art baselines on four standard robotic tasks.
    COCOA: Cross Modality Contrastive Learning for Sensor Data. (arXiv:2208.00467v1 [cs.CV])
    Self-Supervised Learning (SSL) is a new paradigm for learning discriminative representations without labelled data and has reached comparable or even state-of-the-art results in comparison to supervised counterparts. Contrastive Learning (CL) is one of the most well-known approaches in SSL that attempts to learn general, informative representations of data. CL methods have been mostly developed for applications in computer vision and natural language processing where only a single sensor modality is used. A majority of pervasive computing applications, however, exploit data from a range of different sensor modalities. While existing CL methods are limited to learning from one or two data sources, we propose COCOA (Cross mOdality COntrastive leArning), a self-supervised model that employs a novel objective function to learn quality representations from multisensor data by computing the cross-correlation between different data modalities and minimizing the similarity between irrelevant instances. We evaluate the effectiveness of COCOA against eight recently introduced state-of-the-art self-supervised models, and two supervised baselines across five public datasets. We show that COCOA achieves superior classification performance to all other approaches. Also, COCOA is far more label-efficient than the other baselines including the fully supervised model using only one-tenth of available labelled data.
    Online Decentralized Frank-Wolfe: From theoretical bound to applications in smart-building. (arXiv:2208.00522v1 [cs.LG])
    The design of decentralized learning algorithms is important in the fast-growing world in which data are distributed over participants with limited local computation resources and communication. In this direction, we propose an online algorithm minimizing non-convex loss functions aggregated from individual data/models distributed over a network. We provide the theoretical performance guarantee of our algorithm and demonstrate its utility on a real life smart building.
    Untargeted Region of Interest Selection for GC-MS Data using a Pseudo F-Ratio Moving Window ($\psi$FRMV). (arXiv:2208.00313v1 [stat.ML])
    There are many challenges associated with analysing gas chromatography - mass spectrometry (GC-MS) data. Many of these challenges stem from the fact that electron ionisation can make it difficult to recover molecular information due to the high degree of fragmentation with concomitant loss of molecular ion signal. With GC-MS data there are often many common fragment ions shared among closely-eluting peaks, necessitating sophisticated methods for analysis. Some of these methods are fully automated, but make some assumptions about the data which can introduce artifacts during the analysis. Chemometric methods such as Multivariate Curve Resolution, or Parallel Factor Analysis are particularly attractive, since they are flexible and make relatively few assumptions about the data - ideally resulting in fewer artifacts. These methods do require expert user intervention to determine the most relevant regions of interest and an appropriate number of components, $k$, for each region. Automated region of interest selection is needed to permit automated batch processing of chromatographic data with advanced signal deconvolution. Here, we propose a new method for automated, untargeted region of interest selection that accounts for the multivariate information present in GC-MS data to select regions of interest based on the ratio of the squared first, and second singular values from the Singular Value Decomposition of a window that moves across the chromatogram. Assuming that the first singular value accounts largely for signal, and that the second singular value accounts largely for noise, it is possible to interpret the relationship between these two values as a probabilistic distribution of Fisher Ratios. The sensitivity of the algorithm was tested by investigating the concentration at which the algorithm can no longer pick out chromatographic regions known to contain signal.
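    The moving-window statistic described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `pseudo_f_ratio`, the window length, the absence of any preprocessing, and the small `eps` guard are all assumptions, and the abstract's probabilistic interpretation of the ratio is not reproduced here. The input is assumed to be a matrix with one row per scan (retention time) and one column per m/z channel.

```python
import numpy as np


def pseudo_f_ratio(data, window=10, eps=1e-12):
    """Ratio of squared first to second singular value in a sliding window.

    data   : 2-D array, rows = scans, columns = m/z channels (assumed layout)
    window : number of consecutive scans per window (hypothetical default)
    eps    : guard against division by zero (an addition for numerical safety)
    """
    data = np.asarray(data, dtype=float)
    n_windows = data.shape[0] - window + 1
    ratios = np.empty(n_windows)
    for start in range(n_windows):
        block = data[start:start + window]
        # Singular values are returned in descending order.
        s = np.linalg.svd(block, compute_uv=False)
        ratios[start] = s[0] ** 2 / max(s[1] ** 2, eps)
    return ratios
```

    Windows that overlap a chromatographic peak are dominated by a single component and give a large ratio, while noise-only windows give a ratio close to 1; how a threshold is chosen is left open in this sketch.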
    Unitary Approximate Message Passing for Matrix Factorization. (arXiv:2208.00422v1 [eess.SP])
    We consider matrix factorization (MF) with certain constraints, which finds wide applications in various areas. Leveraging variational inference (VI) and unitary approximate message passing (UAMP), we develop a Bayesian approach to MF with an efficient message passing implementation, called UAMPMF. With proper priors imposed on the factor matrices, UAMPMF can be used to solve many problems that can be formulated as MF, such as non negative matrix factorization, dictionary learning, compressive sensing with matrix uncertainty, robust principal component analysis, and sparse matrix factorization. Extensive numerical examples are provided to show that UAMPMF significantly outperforms state-of-the-art algorithms in terms of recovery accuracy, robustness and computational complexity.
    Is current research on adversarial robustness addressing the right problem?. (arXiv:2208.00539v1 [cs.CV])
    Short answer: Yes, Long answer: No! Indeed, research on adversarial robustness has led to invaluable insights helping us understand and explore different aspects of the problem. Many attacks and defenses have been proposed over the last couple of years. The problem, however, remains largely unsolved and poorly understood. Here, I argue that the current formulation of the problem serves short term goals, and needs to be revised for us to achieve bigger gains. Specifically, the bound on perturbation has created a somewhat contrived setting and needs to be relaxed. This has misled us to focus on model classes that are not expressive enough to begin with. Instead, inspired by human vision and the fact that we rely more on robust features such as shape, vertices, and foreground objects than non-robust features such as texture, efforts should be steered towards looking for significantly different classes of models. Maybe instead of narrowing down on imperceptible adversarial perturbations, we should attack a more general problem which is finding architectures that are simultaneously robust to perceptible perturbations, geometric transformations (e.g. rotation, scaling), image distortions (lighting, blur), and more (e.g. occlusion, shadow). Only then we may be able to solve the problem of adversarial vulnerability.
    Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers. (arXiv:2208.00483v1 [cs.CL])
    There exists a wide variety of efficiency methods for natural language processing (NLP) tasks, such as pruning, distillation, dynamic inference, quantization, etc. We can consider an efficiency method as an operator applied on a model. Naturally, we may construct a pipeline of multiple efficiency methods, i.e., to apply multiple operators on the model sequentially. In this paper, we study the plausibility of this idea, and more importantly, the commutativity and cumulativeness of efficiency operators. We make two interesting observations: (1) Efficiency operators are commutative -- the order of efficiency methods within the pipeline has little impact on the final results; (2) Efficiency operators are also cumulative -- the final results of combining several efficiency methods can be estimated by combining the results of individual methods. These observations deepen our understanding of efficiency operators and provide useful guidelines for their real-world applications.
    DNNShield: Dynamic Randomized Model Sparsification, A Defense Against Adversarial Machine Learning. (arXiv:2208.00498v1 [cs.CR])
    DNNs are known to be vulnerable to so-called adversarial attacks that manipulate inputs to cause incorrect results that can be beneficial to an attacker or damaging to the victim. Recent works have proposed approximate computation as a defense mechanism against machine learning attacks. We show that these approaches, while successful for a range of inputs, are insufficient to address stronger, high-confidence adversarial attacks. To address this, we propose DNNSHIELD, a hardware-accelerated defense that adapts the strength of the response to the confidence of the adversarial input. Our approach relies on dynamic and random sparsification of the DNN model to achieve inference approximation efficiently and with fine-grain control over the approximation error. DNNSHIELD uses the output distribution characteristics of sparsified inference compared to a dense reference to detect adversarial inputs. We show an adversarial detection rate of 86% when applied to VGG16 and 88% when applied to ResNet50, which exceeds the detection rate of the state of the art approaches, with a much lower overhead. We demonstrate a software/hardware-accelerated FPGA prototype, which reduces the performance impact of DNNSHIELD relative to software-only CPU and GPU implementations.
    Learning to generate Reliable Broadcast Algorithms. (arXiv:2208.00525v1 [cs.DC])
    Modern distributed systems are supported by fault-tolerant algorithms, like Reliable Broadcast and Consensus, that assure the correct operation of the system even when some of the nodes of the system fail. However, the development of distributed algorithms is a manual and complex process, resulting in scientific papers that usually present a single algorithm or variations of existing ones. To automate the process of developing such algorithms, this work presents an intelligent agent that uses Reinforcement Learning to generate correct and efficient fault-tolerant distributed algorithms. We show that our approach is able to generate correct fault-tolerant Reliable Broadcast algorithms with the same performance as others available in the literature, in only 12,000 learning episodes.
    A Multi-View Learning Approach to Enhance Automatic 12-Lead ECG Diagnosis Performance. (arXiv:2208.00323v1 [eess.SP])
    The performances of commonly used electrocardiogram (ECG) diagnosis models have recently improved with the introduction of deep learning (DL). However, the impact of various combinations of multiple DL components and/or the role of data augmentation techniques on the diagnosis have not been sufficiently investigated. This study proposes an ensemble-based multi-view learning approach with an ECG augmentation technique to achieve a higher performance than traditional automatic 12-lead ECG diagnosis methods. The data analysis results show that the proposed model reports an F1 score of 0.840, which outperforms existing state-of-the-art methods in the literature.
    Improving Distantly Supervised Relation Extraction by Natural Language Inference. (arXiv:2208.00346v1 [cs.CL])
    To reduce human annotations for relation extraction (RE) tasks, distantly supervised approaches have been proposed, while struggling with low performance. In this work, we propose a novel DSRE-NLI framework, which considers both distant supervision from existing knowledge bases and indirect supervision from pretrained language models for other tasks. DSRE-NLI energizes an off-the-shelf natural language inference (NLI) engine with a semi-automatic relation verbalization (SARV) mechanism to provide indirect supervision and further consolidates the distant annotations to benefit multi-classification RE models. The NLI-based indirect supervision acquires only one relation verbalization template from humans as a semantically general template for each relationship, and then the template set is enriched by high-quality textual patterns automatically mined from the distantly annotated corpus. With two simple and effective data consolidation strategies, the quality of training data is substantially improved. Extensive experiments demonstrate that the proposed framework significantly improves the SOTA performance (up to 7.73% of F1) on distantly supervised RE benchmark datasets.
    Formal guarantees for heuristic optimization algorithms used in machine learning. (arXiv:2208.00502v1 [cs.LG])
    Recently, Stochastic Gradient Descent (SGD) and its variants have become the dominant methods in the large-scale optimization of machine learning (ML) problems. A variety of strategies have been proposed for tuning the step sizes, ranging from adaptive step sizes to heuristic methods to change the step size in each iteration. Also, momentum has been widely employed in ML tasks to accelerate the training process. Yet, there is a gap in our theoretical understanding of them. In this work, we start to close this gap by providing formal guarantees to a few heuristic optimization methods and proposing improved algorithms. First, we analyze a generalized version of the AdaGrad (Delayed AdaGrad) step sizes in both convex and non-convex settings, showing that these step sizes allow the algorithms to automatically adapt to the level of noise of the stochastic gradients. We show for the first time sufficient conditions for Delayed AdaGrad to achieve almost sure convergence of the gradients to zero. Moreover, we present a high probability analysis for Delayed AdaGrad and its momentum variant in the non-convex setting. Second, we analyze SGD with exponential and cosine step sizes, which are empirically successful but lack theoretical support. We provide the very first convergence guarantees for them in the smooth and non-convex setting, with and without the Polyak-Łojasiewicz (PL) condition. We also show their good property of adaptivity to noise under the PL condition. Third, we study the last iterate of momentum methods. We prove the first lower bound in the convex setting for the last iterate of SGD with constant momentum. Moreover, we investigate a class of Follow-The-Regularized-Leader-based momentum algorithms with increasing momentum and shrinking updates. We show that their last iterate has optimal convergence for unconstrained convex stochastic optimization problems.
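    For readers unfamiliar with the two schedules named in the abstract above, the sketch below shows the forms in which exponential and cosine step sizes are commonly written. The abstract itself gives no formulas, so the exact parameterization (the decay factor `alpha`, the horizon `total_steps`, and both function names) is an assumption and may differ from the one analyzed in the paper.

```python
import math


def exponential_step(eta0, t, alpha):
    """Exponential schedule: eta_t = eta0 * alpha**t, with 0 < alpha < 1 (assumed form)."""
    return eta0 * alpha ** t


def cosine_step(eta0, t, total_steps):
    """Cosine schedule: eta_t = eta0/2 * (1 + cos(pi * t / total_steps)) (assumed form)."""
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * t / total_steps))
```

    Both schedules start at `eta0` when `t = 0`; the cosine schedule reaches zero at `t = total_steps`, while the exponential schedule decays geometrically and never reaches zero exactly.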
    Symmetry Regularization and Saturating Nonlinearity for Robust Quantization. (arXiv:2208.00338v1 [cs.LG])
    Robust quantization improves the tolerance of networks for various implementations, allowing reliable output in different bit-widths or fragmented low-precision arithmetic. In this work, we perform extensive analyses to identify the sources of quantization error and present three insights to robustify a network against quantization: reduction of error propagation, range clamping for error minimization, and inherited robustness against quantization. Based on these insights, we propose two novel methods called symmetry regularization (SymReg) and saturating nonlinearity (SatNL). Applying the proposed methods during training can enhance the robustness of arbitrary neural networks against quantization on existing post-training quantization (PTQ) and quantization-aware training (QAT) algorithms and enables us to obtain a single weight flexible enough to maintain the output quality under various conditions. We conduct extensive studies on CIFAR and ImageNet datasets and validate the effectiveness of the proposed methods.
    A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes. (arXiv:2208.00250v1 [cs.LG])
    In the reinforcement learning literature, there are many algorithms developed for either Contextual Bandit (CB) or Markov Decision Processes (MDP) environments. However, when deploying reinforcement learning algorithms in the real world, even with domain expertise, it is often difficult to know whether it is appropriate to treat a sequential decision making problem as a CB or an MDP. In other words, do actions affect future states, or only the immediate rewards? Making the wrong assumption regarding the nature of the environment can lead to inefficient learning, or even prevent the algorithm from ever learning an optimal policy, even with infinite data. In this work we develop an online algorithm that uses a Bayesian hypothesis testing approach to learn the nature of the environment. Our algorithm allows practitioners to incorporate prior knowledge about whether the environment is that of a CB or an MDP, and effectively interpolate between classical CB and MDP-based algorithms to mitigate against the effects of misspecifying the environment. We perform simulations and demonstrate that in CB settings our algorithm achieves lower regret than MDP-based algorithms, while in non-bandit MDP settings our algorithm is able to learn the optimal policy, often achieving comparable regret to MDP-based algorithms.
    What Do Deep Neural Networks Find in Disordered Structures of Glasses?. (arXiv:2208.00349v1 [cond-mat.dis-nn])
    Glass transitions are widely observed in a range of types of soft matter systems. However, the physical mechanism of these transitions remains unknown, despite years of ambitious research. In particular, an important unanswered question is whether the glass transition is accompanied by a divergence of the correlation lengths of the characteristic static structures. Recently, a method that can predict long-time dynamics from purely static information with high accuracy was proposed; however, even this method is not universal and does not work well for the Kob--Andersen system, which is a typical model of glass-forming liquids. In this study, we developed a method to extract the characteristic structures of glasses using machine learning or, specifically, a convolutional neural network. In particular, we extracted the characteristic structures by quantifying the grounds for the decisions made by the network. We considered two qualitatively different glass-forming binary systems and, through comparisons with several established structural indicators, we demonstrate that our system can identify characteristic structures that depend on the details of the systems. Surprisingly, the extracted structures were strongly correlated with the nonequilibrium aging dynamics on thermal fluctuation.
    Simplex Clustering via sBeta with Applications to Online Adjustments of Black-Box Predictions. (arXiv:2208.00287v1 [cs.CV])
    We explore clustering the softmax predictions of deep neural networks and introduce a novel probabilistic clustering method, referred to as k-sBetas. In the general context of clustering distributions, the existing methods focused on exploring distortion measures tailored to simplex data, such as the KL divergence, as alternatives to the standard Euclidean distance. We provide a general perspective of clustering distributions, which emphasizes that the statistical models underlying distortion-based methods may not be descriptive enough. Instead, we optimize a mixed-variable objective measuring the conformity of data within each cluster to the introduced sBeta density function, whose parameters are constrained and estimated jointly with binary assignment variables. Our versatile formulation approximates a variety of parametric densities for modeling cluster data, and enables control of the cluster-balance bias. This yields highly competitive performances for efficient unsupervised adjustment of black-box predictions in a variety of scenarios, including one-shot classification and unsupervised domain adaptation in real-time for road segmentation. Implementation is available at https://github.com/fchiaroni/Clustering_Softmax_Predictions.
    A Gradient Smoothed Functional Algorithm with Truncated Cauchy Random Perturbations for Stochastic Optimization. (arXiv:2208.00290v1 [math.OC])
    In this paper, we present a stochastic gradient algorithm for minimizing a smooth objective function that is an expectation over noisy cost samples and only the latter are observed for any given parameter. Our algorithm employs a gradient estimation scheme with random perturbations, which are formed using the truncated Cauchy distribution from the unit sphere. We analyze the bias and variance of the proposed gradient estimator. Our algorithm is found to be particularly useful in the case when the objective function is non-convex, and the parameter dimension is high. From an asymptotic convergence analysis, we establish that our algorithm converges almost surely to the set of stationary points of the objective function and obtain the asymptotic convergence rate. We also show that our algorithm avoids unstable equilibria, implying convergence to local minima. Further, we perform a non-asymptotic convergence analysis of our algorithm. In particular, we establish here a non-asymptotic bound for finding an $\epsilon$-stationary point of the non-convex objective function. Finally, we demonstrate numerically through simulations that the performance of our algorithm outperforms GSF, SPSA and RDSA by a significant margin over a few non-convex settings and further validate its performance over convex (noisy) objectives.
    ANOVA-based Automatic Attribute Selection and a Predictive Model for Heart Disease Prognosis. (arXiv:2208.00296v1 [cs.LG])
    Studies show that cardiovascular diseases (CVDs) are malignant for human health. Thus, it is important to have an efficient way of CVD prognosis. In response to this, the healthcare industry has adopted machine learning-based smart solutions to alleviate the manual process of CVD prognosis. Thus, this work proposes an information fusion technique that combines key attributes of a person through analysis of variance (ANOVA) and domain experts' knowledge. It also introduces a new collection of CVD data samples for emerging research. There are thirty-eight experiments conducted exhaustively to verify the performance of the proposed framework on four publicly available benchmark datasets and the newly created dataset in this work. The ablation study shows that the proposed approach can achieve a competitive mean average accuracy (mAA) of 99.2% and a mean average AUC of 97.9%.
    Functional Rule Extraction Method for Artificial Neural Networks. (arXiv:2208.00335v1 [cs.LG])
    The idea I propose in this paper is a method that is based on comprehensive functions for directed and undirected rule extraction from artificial neural network operations.
    MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures. (arXiv:2208.00277v1 [cs.CV])
    Neural Radiance Fields (NeRFs) have demonstrated amazing ability to synthesize images of 3D scenes from novel views. However, they rely upon specialized volumetric rendering algorithms based on ray marching that are mismatched to the capabilities of widely deployed graphics hardware. This paper introduces a new NeRF representation based on textured polygons that can synthesize novel images efficiently with standard rendering pipelines. The NeRF is represented as a set of polygons with textures representing binary opacities and feature vectors. Traditional rendering of the polygons with a z-buffer yields an image with features at every pixel, which are interpreted by a small, view-dependent MLP running in a fragment shader to produce a final pixel color. This approach enables NeRFs to be rendered with the traditional polygon rasterization pipeline, which provides massive pixel-level parallelism, achieving interactive frame rates on a wide range of compute platforms, including mobile phones.
    Robust Contact State Estimation in Humanoid Walking Gaits. (arXiv:2208.00278v1 [cs.RO])
    In this article, we propose a deep learning framework that provides a unified approach to the problem of leg contact detection in humanoid robot walking gaits. Our formulation accurately and robustly estimates the contact state probability for each leg (i.e., stable or slip/no contact). The proposed framework employs solely proprioceptive sensing and although it relies on simulated ground-truth contact data for the classification process, we demonstrate that it generalizes across varying friction surfaces and different legged robotic platforms and, at the same time, is readily transferred from simulation to practice. The framework is quantitatively and qualitatively assessed in simulation via the use of ground-truth contact data and is contrasted against state-of-the-art methods with an ATLAS, a NAO, and a TALOS humanoid robot. Furthermore, its efficacy is demonstrated in base estimation with a real TALOS humanoid. To reinforce further research endeavors, our implementation is offered as an open-source ROS/Python package, coined Legged Contact Detection (LCD).
    Efficient Compilation and Mapping of Fixed Function Combinational Logic onto Digital Signal Processors Targeting Neural Network Inference and Utilizing High-level Synthesis. (arXiv:2208.00302v1 [cs.AR])
    Recent efforts for improving the performance of neural network (NN) accelerators that meet today's application requirements have given rise to a new trend of logic-based NN inference relying on fixed function combinational logic. Mapping such large Boolean functions with many input variables and product terms to digital signal processors (DSPs) on field-programmable gate arrays (FPGAs) needs a novel framework considering the structure and the reconfigurability of DSP blocks during this process. The proposed methodology in this paper maps the fixed function combinational logic blocks to a set of Boolean functions where Boolean operations corresponding to each function are mapped to DSP devices rather than look-up tables (LUTs) on the FPGAs to take advantage of the high performance, low latency, and parallelism of DSP blocks. This paper also presents an innovative design and optimization methodology for compilation and mapping of NNs, utilizing fixed function combinational logic to DSPs on FPGAs employing high-level synthesis flow. Our experimental evaluations across several datasets and selected NNs demonstrate the comparable performance of our framework in terms of the inference latency and output accuracy compared to prior art FPGA-based NN accelerators employing DSPs.
    Delving into Effective Gradient Matching for Dataset Condensation. (arXiv:2208.00311v1 [cs.LG])
    As deep learning models and datasets rapidly scale up, network training is extremely time-consuming and resource-costly. Instead of training on the entire dataset, learning with a small synthetic dataset becomes an efficient solution. Extensive research has been explored in the direction of dataset condensation, among which gradient matching achieves state-of-the-art performance. The gradient matching method directly targets the training dynamics by matching the gradient when training on the original and synthetic datasets. However, there are limited deep investigations into the principle and effectiveness of this method. In this work, we delve into the gradient matching method from a comprehensive perspective and answer the critical questions of what, how, and where to match. We propose to match the multi-level gradients to involve both intra-class and inter-class gradient information. We demonstrate that the distance function should focus on the angle, considering the magnitude simultaneously to delay the overfitting. An overfitting-aware adaptive learning step strategy is also proposed to trim unnecessary optimization steps for algorithmic efficiency improvement. Ablation and comparison experiments demonstrate that our proposed methodology shows superior accuracy, efficiency, and generalization compared to prior work.
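    To make the remark about the distance function concrete, the sketch below combines an angle term (one minus cosine similarity) with a magnitude term (Euclidean distance) between two flattened gradients. This is only an illustration of the general idea stated in the abstract above; the function name `angle_magnitude_distance`, the additive form, and the weight `lam` are hypothetical and are not taken from the paper.

```python
import numpy as np


def angle_magnitude_distance(g_real, g_syn, lam=0.1, eps=1e-12):
    """Illustrative gradient distance: angle term plus weighted magnitude term.

    g_real, g_syn : gradients from the original and synthetic data (any shape)
    lam           : hypothetical weight on the magnitude term
    eps           : guard against division by zero for zero gradients
    """
    a = np.ravel(np.asarray(g_real, dtype=float))
    b = np.ravel(np.asarray(g_syn, dtype=float))
    cos_sim = a @ b / max(np.linalg.norm(a) * np.linalg.norm(b), eps)
    angle_term = 1.0 - cos_sim              # 0 when directions agree, 2 when opposite
    magnitude_term = np.linalg.norm(a - b)  # 0 only when the gradients coincide
    return angle_term + lam * magnitude_term
```

    With `lam = 0` the distance depends on direction alone, so two gradients pointing the same way but differing in scale are treated as identical; a positive `lam` penalizes that scale mismatch as well.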
    Fair Classification via Transformer Neural Networks: Case Study of an Educational Domain. (arXiv:2206.01410v2 [cs.LG] UPDATED)
    Educational technologies nowadays increasingly use data and Machine Learning (ML) models. This gives the students, instructors, and administrators support and insights for the optimum policy. However, it is well acknowledged that ML models are subject to bias, which raises concerns about the fairness, bias, and discrimination of using these automated ML algorithms in education and its unintended and unforeseen negative consequences. The contribution of bias during the decision-making comes from datasets used for training ML models and the model architecture. This paper presents a preliminary investigation of the fairness of transformer neural networks on the two tabular datasets: Law School and Student-Mathematics. In contrast to classical ML models, the transformer-based models transform these tabular datasets into a richer representation while solving the classification task. We use different fairness metrics for evaluation and check the trade-off between fairness and accuracy of the transformer-based models over the tabular datasets. Empirically, our approach shows impressive results regarding the trade-off between fairness and performance on the Law School dataset.
    An Experimental Study on Learning Correlated Equilibrium in Routing Games. (arXiv:2208.00391v1 [cs.GT])
    We study route choice in a repeated routing game where an uncertain state of nature determines link latency functions, and agents receive private route recommendation. The state is sampled in an i.i.d. manner in every round from a publicly known distribution, and the recommendations are generated by a randomization policy whose mapping from the state is known publicly. In a one-shot setting, the agents are said to obey recommendation if it gives the smallest travel time in a posteriori expectation. A plausible extension to repeated setting is that the likelihood of following recommendation in a round is related to regret from previous rounds. If the regret is of satisficing type with respect to a default choice and is averaged over past rounds and over all agents, then the asymptotic outcome under an obedient recommendation policy coincides with the one-shot outcome. We report findings from an experiment with one participant at a time engaged in repeated route choice decision on computer. In every round, the participant is shown travel time distribution for each route, a route recommendation generated by an obedient policy, and a rating suggestive of average experience of previous participants with the quality of recommendation. Upon entering route choice, the actual travel times are revealed. The participant evaluates the quality of recommendation by submitting a review. This is combined with historical reviews to update rating for the next round. Data analysis from 33 participants each with 100 rounds suggests moderate negative correlation between the display rating and the average regret, and a strong positive correlation between the rating and the likelihood of following recommendation. Overall, under obedient recommendation policy, the rating converges close to its maximum value by the end of the experiments in conjunction with very high frequency of following recommendations.
    enpheeph: A Fault Injection Framework for Spiking and Compressed Deep Neural Networks. (arXiv:2208.00328v1 [cs.NE])
    Research on Deep Neural Networks (DNNs) has focused on improving performance and accuracy for real-world deployments, leading to new models, such as Spiking Neural Networks (SNNs), and optimization techniques, e.g., quantization and pruning for compressed networks. However, the deployment of these innovative models and optimization techniques introduces possible reliability issues, and reliability is a pillar for DNNs to be widely used in safety-critical applications, e.g., autonomous driving. Moreover, scaling technology nodes have the associated risk of multiple faults happening at the same time, a possibility not addressed in state-of-the-art resiliency analyses. Towards better reliability analysis for DNNs, we present enpheeph, a Fault Injection Framework for Spiking and Compressed DNNs. The enpheeph framework enables optimized execution on specialized hardware devices, e.g., GPUs, while providing complete customizability to investigate different fault models, emulating various reliability constraints and use-cases. Hence, the faults can be executed on SNNs as well as compressed networks with minimal-to-no modifications to the underlying code, a feat that is not achievable by other state-of-the-art tools. To evaluate our enpheeph framework, we analyze the resiliency of different DNN and SNN models, with different compression techniques. By injecting a random and increasing number of faults, we show that DNNs can show a reduction in accuracy with a fault rate as low as 7 x 10^-7 faults per parameter, with an accuracy drop higher than 40%. Run-time overhead when executing enpheeph is less than 20% of the baseline execution time when executing 100 000 faults concurrently, at least 10x lower than state-of-the-art frameworks, making enpheeph future-proof for complex fault injection scenarios. We release enpheeph at https://github.com/Alexei95/enpheeph.
    Neural Correlates of Face Familiarity Perception. (arXiv:2208.00352v1 [q-bio.NC])
    In the domain of face recognition, there exists a puzzling timing discrepancy between results from macaque neurophysiology on the one hand and human electrophysiology on the other. Single unit recordings in macaques have demonstrated face identity specific responses in extra-striate visual cortex within 100 milliseconds of stimulus onset. In EEG and MEG experiments with humans, however, a consistent distinction between neural activity corresponding to unfamiliar and familiar faces has been reported to emerge around 250 ms. This points to the possibility that there may be a hitherto undiscovered early correlate of face familiarity perception in human electrophysiological traces. We report here a successful search for such a correlate in dense MEG recordings using pattern classification techniques. Our analyses reveal markers of face familiarity as early as 85 ms after stimulus onset. Low-level attributes of the images, such as luminance and color distributions, are unable to account for this early emerging response difference. These results help reconcile human and macaque data, and provide clues regarding neural mechanisms underlying familiar face perception.
    Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation. (arXiv:2208.00219v1 [cs.CV])
    Few-shot object detection has been extensively investigated by incorporating meta-learning into region-based detection frameworks. Despite its success, the said paradigm is still constrained by several factors, such as (i) low-quality region proposals for novel classes and (ii) neglect of the inter-class correlation among different classes. Such limitations hinder the generalization of base-class knowledge for the detection of novel-class objects. In this work, we design Meta-DETR, which (i) is the first image-level few-shot detector, and (ii) introduces a novel inter-class correlational meta-learning strategy to capture and leverage the correlation among different classes for robust and accurate few-shot object detection. Meta-DETR works entirely at image level without any region proposals, which circumvents the constraint of inaccurate proposals in prevalent few-shot detection frameworks. In addition, the introduced correlational meta-learning enables Meta-DETR to simultaneously attend to multiple support classes within a single feedforward pass, which allows it to capture the inter-class correlation among different classes, thus significantly reducing the misclassification over similar classes and enhancing knowledge generalization to novel classes. Experiments over multiple few-shot object detection benchmarks show that the proposed Meta-DETR outperforms state-of-the-art methods by large margins. The implementation code is available at https://github.com/ZhangGongjie/Meta-DETR.
    Convex duality for stochastic shortest path problems in known and unknown environments. (arXiv:2208.00330v1 [cs.LG])
    This paper gives an introduction to Stochastic Shortest Path (SSP) problems in known and unknown environments from the perspective of convex optimisation. It first recalls results in the known parameter case, and develops understanding through different proofs. It then focuses on the unknown parameter case, where it studies extended value iteration (EVI) operators. This includes the existing operators used in Rosenberg et al. [26] and Tarbouriech et al. [31] based on the l-1 norm and supremum norm, as well as defining EVI operators corresponding to other norms and divergences, such as the KL-divergence. This paper shows in general how the EVI operators relate to convex programs, and the form of their dual, where strong duality is exhibited. This paper then focuses on whether the bounds from finite horizon research of Neu and Pike-Burke [21] can be applied to these extended value iteration operators in the SSP setting. It shows that similar bounds to [21] for these operators exist, however they lead to operators that are not in general monotone and have more complex convergence properties. In a special case we observe oscillating behaviour. This paper generates open questions on how research may progress, with several examples that require further examination.
    Global Attention-based Encoder-Decoder LSTM Model for Temperature Prediction of Permanent Magnet Synchronous Motors. (arXiv:2208.00293v1 [cs.LG])
    Temperature monitoring is critical for electrical motors to determine if device protection measures should be executed. However, the complexity of the internal structure of Permanent Magnet Synchronous Motors (PMSM) makes the direct temperature measurement of the internal components difficult. This work pragmatically develops three deep learning models to estimate the PMSMs' internal temperature based on readily measurable external quantities. The proposed supervised learning models exploit Long Short-Term Memory (LSTM) modules, bidirectional LSTM, and attention mechanism to form encoder-decoder structures to predict simultaneously the temperatures of the stator winding, tooth, yoke, and permanent magnet. Experiments were conducted in an exhaustive manner on a benchmark dataset to verify the proposed models' performances. The comparative analysis shows that the proposed global attention-based encoder-decoder (EnDec) model provides a competitive overall performance of 1.72 Mean Squared Error (MSE) and 5.34 Mean Absolute Error (MAE).
    Geometric deep learning for computational mechanics Part II: Graph embedding for interpretable multiscale plasticity. (arXiv:2208.00246v1 [cs.LG])
    The history-dependent behaviors of classical plasticity models are often driven by internal variables evolved according to phenomenological laws. The difficulty to interpret how these internal variables represent a history of deformation, the lack of direct measurement of these internal variables for calibration and validation, and the weak physical underpinning of those phenomenological laws have long been criticized as barriers to creating realistic models. In this work, geometric machine learning on graph data (e.g. finite element solutions) is used as a means to establish a connection between nonlinear dimensional reduction techniques and plasticity models. Geometric learning-based encoding on graphs allows the embedding of rich time-history data onto a low-dimensional Euclidean space such that the evolution of plastic deformation can be predicted in the embedded feature space. A corresponding decoder can then convert these low-dimensional internal variables back into a weighted graph such that the dominating topological features of plastic deformation can be observed and analyzed.
    Automatically Categorising GitHub Repositories by Application Domain. (arXiv:2208.00269v1 [cs.SE])
    GitHub is the largest host of open source software on the Internet. This large, freely accessible database has attracted the attention of practitioners and researchers alike. But as GitHub's growth continues, it is becoming increasingly hard to navigate the plethora of repositories which span a wide range of domains. Past work has shown that taking the application domain into account is crucial for tasks such as predicting the popularity of a repository and reasoning about project quality. In this work, we build on a previously annotated dataset of 5,000 GitHub repositories to design an automated classifier for categorising repositories by their application domain. The classifier uses state-of-the-art natural language processing techniques and machine learning to learn from multiple data sources and catalogue repositories according to five application domains. We contribute with (1) an automated classifier that can assign popular repositories to each application domain with at least 70% precision, (2) an investigation of the approach's performance on less popular repositories, and (3) a practical application of this approach to answer how the adoption of software engineering practices differs across application domains. Our work aims to help the GitHub community identify repositories of interest and opens promising avenues for future work investigating differences between repositories from different application domains.
    Adding Context to Source Code Representations for Deep Learning. (arXiv:2208.00203v1 [cs.SE])
    Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. In order to apply deep learning to these tasks, source code needs to be represented in a format that is suitable for input into the deep learning model. Most approaches to representing source code, such as tokens, abstract syntax trees (ASTs), data flow graphs (DFGs), and control flow graphs (CFGs) only focus on the code itself and do not take into account additional context that could be useful for deep learning models. In this paper, we argue that it is beneficial for deep learning models to have access to additional contextual information about the code being analysed. We present preliminary evidence that encoding context from the call hierarchy along with information from the code itself can improve the performance of a state-of-the-art deep learning model for two software engineering tasks. We outline our research agenda for adding further contextual information to source code representations for deep learning.
    PolarMix: A General Data Augmentation Technique for LiDAR Point Clouds. (arXiv:2208.00223v1 [cs.CV])
    LiDAR point clouds, which are usually scanned by rotating LiDAR sensors continuously, capture precise geometry of the surrounding environment and are crucial to many autonomous detection and navigation tasks. Though many 3D deep architectures have been developed, efficient collection and annotation of large amounts of point clouds remain one major challenge in the analysis and understanding of point cloud data. This paper presents PolarMix, a point cloud augmentation technique that is simple and generic but can mitigate the data constraint effectively across different perception tasks and scenarios. PolarMix enriches point cloud distributions and preserves point cloud fidelity via two cross-scan augmentation strategies that cut, edit, and mix point clouds along the scanning direction. The first is scene-level swapping, which exchanges point cloud sectors of two LiDAR scans that are cut along the azimuth axis. The second is instance-level rotation and paste, which crops point instances from one LiDAR scan, rotates them by multiple angles (to create multiple copies), and pastes the rotated point instances into other scans. Extensive experiments show that PolarMix achieves superior performance consistently across different perception tasks and scenarios. In addition, it can work as plug-and-play for various 3D deep architectures and also performs well for unsupervised domain adaptation.
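The scene-level swapping step described above can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: the function names, the representation of a scan as a list of (x, y, z) tuples, and the half-open sector [start, end) in radians are assumptions, and per-point labels are omitted for brevity.

```python
import math


def azimuth(point):
    """Azimuth angle of a point in the sensor's x-y plane, in (-pi, pi]."""
    return math.atan2(point[1], point[0])


def in_sector(point, start, end):
    """True if the point's azimuth lies in the half-open sector [start, end)."""
    return start <= azimuth(point) < end


def scene_level_swap(scan_a, scan_b, start, end):
    """Exchange the points of two scans that fall inside one azimuth sector."""
    a_in = [p for p in scan_a if in_sector(p, start, end)]
    b_in = [p for p in scan_b if in_sector(p, start, end)]
    a_out = [p for p in scan_a if not in_sector(p, start, end)]
    b_out = [p for p in scan_b if not in_sector(p, start, end)]
    return a_out + b_in, b_out + a_in


scan_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0)]
scan_b = [(2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
mixed_a, mixed_b = scene_level_swap(scan_a, scan_b, -math.pi / 4, math.pi / 4)
```

Because both scans are cut along the same azimuth range, each mixed scan still covers the full sweep around the sensor, which is the sense in which the cut follows the scanning direction.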
    Streaming Algorithms for Diversity Maximization with Fairness Constraints. (arXiv:2208.00194v1 [cs.DS])
    Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set $X$ of $n$ elements, it asks to select a subset $S$ of $k \ll n$ elements with maximum \emph{diversity}, as quantified by the dissimilarities among the elements in $S$. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset $S$ that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set $X$ is partitioned into $m$ disjoint groups by some sensitive attribute, e.g., sex or race, ensuring \emph{fairness} requires that the selected subset $S$ contains $k_i$ elements from each group $i \in [1,m]$. A streaming algorithm should process $X$ sequentially in one pass and return a subset with maximum \emph{diversity} while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams, the first of which is $\frac{1-\varepsilon}{4}$-approximate and specific for $m=2$, where $\varepsilon \in (0,1)$, and the second of which achieves a $\frac{1-\varepsilon}{3m+2}$-approximation for an arbitrary $m$. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
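The objective and the fairness constraint can be made concrete with a small sketch. The code below evaluates the max-min diversity of a subset and finds the best fair subset by exhaustive search on a toy input. It is not one of the streaming approximation algorithms from the paper; the function names, the Euclidean distance, and the `quotas` dictionary mapping each group to its k_i are assumptions for illustration, and the search takes exponential time.

```python
import math
from collections import Counter
from itertools import combinations


def max_min_diversity(points):
    """Minimum pairwise Euclidean distance within the subset (needs >= 2 points)."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))


def is_fair(groups, quotas):
    """True if the subset holds exactly quotas[g] elements from every group g."""
    counts = Counter(groups)
    return set(counts) <= set(quotas) and all(
        counts.get(g, 0) == k for g, k in quotas.items()
    )


def best_fair_subset(items, quotas):
    """Exhaustive search over all subsets of size k = sum of quotas.
    items is a list of (point, group) pairs."""
    k = sum(quotas.values())
    best, best_div = None, -1.0
    for subset in combinations(items, k):
        if not is_fair([g for _, g in subset], quotas):
            continue
        div = max_min_diversity([p for p, _ in subset])
        if div > best_div:
            best, best_div = subset, div
    return best, best_div


items = [((0, 0), "a"), ((1, 0), "a"), ((10, 0), "b"), ((0.5, 0), "b")]
subset, diversity = best_fair_subset(items, {"a": 1, "b": 1})
```

A streaming algorithm cannot enumerate subsets like this, since it sees each element once, which is why the paper settles for approximation guarantees.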
    Learning-based Localizability Estimation for Robust LiDAR Localization. (arXiv:2203.05698v2 [cs.RO] UPDATED)
    LiDAR-based localization and mapping is one of the core components in many modern robotic systems due to the direct integration of range and geometry, allowing for precise motion estimation and generation of high quality maps in real-time. Yet, as a consequence of insufficient environmental constraints present in the scene, this dependence on geometry can result in localization failure, happening in self-symmetric surroundings such as tunnels. This work addresses precisely this issue by proposing a neural network-based estimation approach for detecting (non-)localizability during robot operation. Special attention is given to the localizability of scan-to-scan registration, as it is a crucial component in many LiDAR odometry estimation pipelines. In contrast to previous, mostly traditional detection approaches, the proposed method enables early detection of failure by estimating the localizability on raw sensor measurements without evaluating the underlying registration optimization. Moreover, previous approaches remain limited in their ability to generalize across environments and sensor types, as heuristic-tuning of degeneracy detection thresholds is required. The proposed approach avoids this problem by learning from a collection of different environments, allowing the network to function over various scenarios. Furthermore, the network is trained exclusively on simulated data, avoiding arduous data collection in challenging and degenerate, often hard-to-access, environments. The presented method is tested during field experiments conducted across challenging environments and on two different sensor types without any modifications. The observed detection performance is on par with state-of-the-art methods after environment-specific threshold tuning.
    On Connecting Deep Trigonometric Networks with Deep Gaussian Processes: Covariance, Expressivity, and Neural Tangent Kernel. (arXiv:2203.07411v3 [cs.LG] UPDATED)
    Deep Gaussian Process (DGP) as a model prior in Bayesian learning intuitively exploits the expressive power in function composition. DGPs also offer diverse modeling capabilities, but inference is challenging because marginalization in latent function space is not tractable. With Bochner's theorem, DGP with squared exponential kernel can be viewed as a deep trigonometric network consisting of the random feature layers, sine and cosine activation units, and random weight layers. In the wide limit with a bottleneck, we show that the weight space view yields the same effective covariance functions which were obtained previously in function space. Also, varying the prior distributions over network parameters is equivalent to employing different kernels. As such, DGPs can be translated into the deep bottlenecked trig networks, with which the exact maximum a posteriori estimation can be obtained. Interestingly, the network representation enables the study of DGP's neural tangent kernel, which may also reveal the mean of the intractable predictive distribution. Statistically, unlike the shallow networks, deep networks of finite width have covariance deviating from the limiting kernel, and the inner and outer widths may play different roles in feature learning. Numerical simulations are presented to support our findings.
    Generating Diverse Realistic Laughter for Interactive Art. (arXiv:2111.03146v2 [cs.LG] UPDATED)
    We propose an interactive art project to make those rendered invisible by the COVID-19 crisis and its concomitant solitude reappear through the welcome melody of laughter, and connections created and explored through advanced laughter synthesis approaches. However, the unconditional generation of the diversity of human emotional responses in high-quality auditory synthesis remains an open problem, with important implications for the application of these approaches in artistic settings. We developed LaughGANter, an approach to reproduce the diversity of human laughter using generative adversarial networks (GANs). When trained on a dataset of diverse laughter samples, LaughGANter generates diverse, high quality laughter samples, and learns a latent space suitable for emotional analysis and novel artistic applications such as latent mixing/interpolation and emotional transfer.
    TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions. (arXiv:2001.11212v3 [stat.ML] UPDATED)
    The identification of relevant features, i.e., the driving variables that determine a process or the properties of a system, is an essential part of the analysis of data sets with a large number of variables. A mathematically rigorous approach to quantifying the relevance of these features is mutual information. Mutual information determines the relevance of features in terms of their joint mutual dependence to the property of interest. However, mutual information requires probability distributions as input, which cannot be reliably estimated from continuous distributions such as physical quantities like lengths or energies. Here, we introduce total cumulative mutual information (TCMI), a measure of the relevance of mutual dependences that extends mutual information to random variables of continuous distribution based on cumulative probability distributions. TCMI is a non-parametric, robust, and deterministic measure that facilitates comparisons and rankings between feature sets with different cardinality. The ranking induced by TCMI allows for feature selection, i.e., the identification of variable sets that are nonlinearly statistically related to a property of interest, taking into account the number of data samples as well as the cardinality of the set of variables. We evaluate the performance of our measure with simulated data, compare its performance with similar multivariate-dependence measures, and demonstrate the effectiveness of our feature-selection method on a set of standard data sets and a typical scenario in materials science.
    Inductive Biases for Deep Learning of Higher-Level Cognition. (arXiv:2011.15091v4 [cs.LG] UPDATED)
    A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.
    Intelligent decision-making method of TBM operating parameters based on multiple constraints and objective optimization. (arXiv:2208.00404v1 [cs.LG])
    Decision-making for tunnel boring machine (TBM) operating parameters provides important guidance for safe and efficient TBM construction, and it has been one of the research hotspots in the field of TBM tunneling. To this end, this paper introduces rock-breaking rules into a machine learning method, and establishes a high-accuracy rock-machine mapping driven jointly by physical rules and data mining. These dual-driven mappings are subsequently used as the objective function and constraints to build a decision-making method for TBM operating parameters. By searching for the revolutions per minute and penetration corresponding to the extremum of the objective function subject to the constraints, the optimal operating parameters can be obtained. The method is verified in the field on the Second Water Source Channel of Hangzhou, China, where the average penetration rate increased by 11.3% and the total cost decreased by 10.0%, which demonstrates the practicability and effectiveness of the developed decision-making model.
    Vector-Based Data Improves Left-Right Eye-Tracking Classifier Performance After a Covariate Distributional Shift. (arXiv:2208.00465v1 [cs.LG])
    The main challenges of using electroencephalogram (EEG) signals to make eye-tracking (ET) predictions are the differences in distributional patterns between benchmark data and real-world data and the noise resulting from the unintended interference of brain signals from multiple sources. Increasing the robustness of machine learning models in predicting eye-tracking position from EEG data is therefore integral for both research and consumer use. In medical research, the usage of more complicated data collection methods to test for simpler tasks has been explored to address this very issue. In this study, we propose a fine-grain data approach for EEG-ET data collection in order to create more robust benchmarking. We train machine learning models utilizing both coarse-grain and fine-grain data and compare their accuracies when tested on data of similar/different distributional patterns in order to determine how susceptible EEG-ET benchmarks are to differences in distributional data. We apply a covariate distributional shift to test for this susceptibility. Results showed that models trained on fine-grain, vector-based data were less susceptible to distributional shifts than models trained on coarse-grain, binary-classified data.
    Evaluating Table Structure Recognition: A New Perspective. (arXiv:2208.00385v1 [cs.CV])
    Existing metrics used to evaluate table structure recognition algorithms have shortcomings with regard to capturing text and empty cells alignment. In this paper, we build on prior work and propose a new metric - TEDS based IOU similarity (TEDS (IOU)) for table structure recognition which uses bounding boxes instead of text while simultaneously being robust against the above disadvantages. We demonstrate the effectiveness of our metric against previous metrics through various examples.
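As background for the bounding-box comparison mentioned above, here is a minimal sketch of intersection over union (IOU) for two axis-aligned boxes. It covers only this building block, not the full TEDS (IOU) metric, whose definition the abstract does not give; the `(x1, y1, x2, y2)` box format and the function name are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))  # overlap area 1, union area 7
```

Comparing cells by box overlap rather than by text content means empty cells can still be matched, which is the shortcoming of text-based metrics the abstract points to.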
    Revisiting the Critical Factors of Augmentation-Invariant Representation Learning. (arXiv:2208.00275v1 [cs.CV])
    We focus on better understanding the critical factors of augmentation-invariant representation learning. We revisit MoCo v2 and BYOL and try to verify the following assumption: different frameworks bring about representations of different characteristics even with the same pretext task. We establish the first benchmark for fair comparisons between MoCo v2 and BYOL, and observe: (i) sophisticated model configurations enable better adaptation to the pre-training dataset; (ii) mismatched optimization strategies for pre-training and fine-tuning hinder the model from achieving competitive transfer performance. Given the fair benchmark, we investigate further and find that asymmetry of the network structure enables contrastive frameworks to work well under the linear evaluation protocol, but may hurt transfer performance on long-tailed classification tasks. Moreover, negative samples do not make models more sensitive to the choice of data augmentations, nor does the asymmetric network structure. We believe our findings provide useful information for future work.
    Speckle2Speckle: Unsupervised Learning of Ultrasound Speckle Filtering Without Clean Data. (arXiv:2208.00402v1 [eess.IV])
    In ultrasound imaging the appearance of homogeneous regions of tissue is subject to speckle, which for certain applications can make the detection of tissue irregularities difficult. To cope with this, it is common practice to apply speckle reduction filters to the images. Most conventional filtering techniques are fairly hand-crafted and often need to be finely tuned to the present hardware, imaging scheme and application. Learning based techniques on the other hand suffer from the need for a target image for training (in case of fully supervised techniques) or require narrow, complex physics-based models of the speckle appearance that might not apply in all cases. With this work we propose a deep-learning based method for speckle removal without these limitations. To enable this, we make use of realistic ultrasound simulation techniques that allow for instantiation of several independent speckle realizations that represent the exact same tissue, thus allowing for the application of image reconstruction techniques that work with pairs of differently corrupted data. Compared to two other state-of-the-art approaches (non-local means and the Optimized Bayesian non-local means filter) our method performs favorably in qualitative comparisons and quantitative evaluation, despite being trained on simulations alone, and is several orders of magnitude faster.
    Learning to Prompt for Vision-Language Models. (arXiv:2109.01134v4 [cs.CV] UPDATED)
    Large pre-trained vision-language models like CLIP have shown great potential in learning representations that are transferable across a wide range of downstream tasks. Different from the traditional representation learning that is based mostly on discretized labels, vision-language pre-training aligns images and texts in a common feature space, which allows zero-shot transfer to a downstream task via prompting, i.e., classification weights are synthesized from natural language describing classes of interest. In this work, we show that a major challenge for deploying such models in practice is prompt engineering, which requires domain expertise and is extremely time-consuming -- one needs to spend a significant amount of time on words tuning since a slight change in wording could have a huge impact on performance. Inspired by recent advances in prompt learning research in natural language processing (NLP), we propose Context Optimization (CoOp), a simple approach specifically for adapting CLIP-like vision-language models for downstream image recognition. Concretely, CoOp models a prompt's context words with learnable vectors while the entire pre-trained parameters are kept fixed. To handle different image recognition tasks, we provide two implementations of CoOp: unified context and class-specific context. Through extensive experiments on 11 datasets, we demonstrate that CoOp requires as few as one or two shots to beat hand-crafted prompts with a decent margin and is able to gain significant improvements over prompt engineering with more shots, e.g., with 16 shots the average gain is around 15% (with the highest reaching over 45%). Despite being a learning-based approach, CoOp achieves superb domain generalization performance compared with the zero-shot model using hand-crafted prompts.  ( 3 min )
    Bayesian Active Learning for Sim-to-Real Robotic Perception. (arXiv:2109.11547v3 [cs.RO] UPDATED)
    While learning from synthetic training data has recently gained an increased attention, in real-world robotic applications, there are still performance deficiencies due to the so-called Sim-to-Real gap. In practice, this gap is hard to resolve with only synthetic data. Therefore, we focus on an efficient acquisition of real data within a Sim-to-Real learning pipeline. Concretely, we employ deep Bayesian active learning to minimize manual annotation efforts and devise an autonomous learning paradigm to select the data that is considered useful for the human expert to annotate. To achieve this, a Bayesian Neural Network (BNN) object detector providing reliable uncertainty estimates is adapted to infer the informativeness of the unlabeled data. Furthermore, to cope with mis-alignments of the label distribution in uncertainty-based sampling, we develop an effective randomized sampling strategy that performs favorably compared to other complex alternatives. In our experiments on object classification and detection, we show benefits of our approach and provide evidence that labeling efforts can be reduced significantly. Finally, we demonstrate the practical effectiveness of this idea in a grasping task on an assistive robot.  ( 3 min )
    How Self-Supervised Learning Can be Used for Fine-Grained Head Pose Estimation?. (arXiv:2108.04893v6 [cs.CV] UPDATED)
    The cost of head pose labeling is the main challenge of improving the fine-grained Head Pose Estimation (HPE). Although Self-Supervised Learning (SSL) can be a solution to the lack of huge amounts of labeled data, its efficacy for fine-grained HPE is not yet fully explored. This study aims to assess the usage of SSL in fine-grained HPE based on two scenarios: (1) using SSL for weights pre-training procedure, and (2) leveraging auxiliary SSL losses besides HPE. We design a Hybrid Multi-Task Learning (HMTL) architecture based on the ResNet50 backbone in which both strategies are applied. Our experimental results reveal that the combination of both scenarios is the best for HPE. Together, the average error rate is reduced up to 23.1% for AFLW2000 and 14.2% for BIWI benchmark compared to the baseline. Moreover, it is found that some SSL methods are more suitable for transfer learning, while others may be effective when they are considered as auxiliary tasks incorporated into supervised learning. Finally, it is shown that by using the proposed HMTL architecture, the average error is reduced with different types of initial weights: random, ImageNet and SSL pre-trained weights.  ( 3 min )
    Decoupled Contrastive Learning. (arXiv:2110.06848v3 [cs.LG] UPDATED)
    Contrastive learning (CL) is one of the most successful paradigms for self-supervised learning (SSL). In a principled way, it considers two augmented "views" of the same image as positive to be pulled closer, and all other images as negative to be pushed further apart. However, behind the impressive success of CL-based techniques, their formulation often relies on heavy-computation settings, including large sample batches, extensive training epochs, etc. We are thus motivated to tackle these issues and establish a simple, efficient, yet competitive baseline of contrastive learning. Specifically, we identify, from theoretical and empirical studies, a noticeable negative-positive-coupling (NPC) effect in the widely used InfoNCE loss, leading to unsuitable learning efficiency concerning the batch size. By removing the NPC effect, we propose decoupled contrastive learning (DCL) loss, which removes the positive term from the denominator and significantly improves the learning efficiency. DCL achieves competitive performance with less sensitivity to sub-optimal hyperparameters, requiring neither the large batches of SimCLR, the momentum encoding of MoCo, nor a large number of epochs. We demonstrate this on various benchmarks, where DCL also proves much less sensitive to suboptimal hyperparameters. Notably, SimCLR with DCL achieves 68.2% ImageNet-1K top-1 accuracy using batch size 256 within 200 epochs pre-training, outperforming its SimCLR baseline by 6.4%. Further, DCL can be combined with the SOTA contrastive learning method, NNCLR, to achieve 72.3% ImageNet-1K top-1 accuracy with 512 batch size in 400 epochs, which represents a new SOTA in contrastive learning. We believe DCL provides a valuable baseline for future contrastive SSL studies.  ( 3 min )
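The difference between the two losses can be sketched for a single anchor. The code below follows the abstract's description only: InfoNCE normalizes over the positive and the negatives, while DCL drops the positive term from the denominator. The scalar similarity inputs, the temperature `tau=0.1`, and the function names are assumptions for illustration; any weighting or other details of the paper's full formulation are not reproduced.

```python
import math


def info_nce(pos, negs, tau=0.1):
    """InfoNCE for one anchor: -log(exp(pos/tau) / (exp(pos/tau) + sum_k exp(neg_k/tau)))."""
    denom = math.exp(pos / tau) + sum(math.exp(n / tau) for n in negs)
    return -pos / tau + math.log(denom)


def dcl(pos, negs, tau=0.1):
    """Decoupled variant: the positive term is removed from the denominator."""
    denom = sum(math.exp(n / tau) for n in negs)
    return -pos / tau + math.log(denom)


pos_sim = 0.8            # similarity between the two views of the same image
neg_sims = [0.1, -0.2]   # similarities to other images in the batch
loss_nce = info_nce(pos_sim, neg_sims)
loss_dcl = dcl(pos_sim, neg_sims)
```

Since the DCL denominator is strictly smaller, the DCL value is always below the InfoNCE value for the same inputs, and unlike InfoNCE it can become negative once the positive similarity dominates the negatives.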
    Multi-Exit Semantic Segmentation Networks. (arXiv:2106.03527v3 [cs.CV] UPDATED)
    Semantic segmentation arises as the backbone of many vision systems, spanning from self-driving cars and robot navigation to augmented reality and teleconferencing. Frequently operating under stringent latency constraints within a limited resource envelope, optimising for efficient execution becomes important. At the same time, the heterogeneous capabilities of the target platforms and the diverse constraints of different applications require the design and training of multiple target-specific segmentation models, leading to excessive maintenance costs. To this end, we propose a framework for converting state-of-the-art segmentation CNNs to Multi-Exit Semantic Segmentation (MESS) networks: specially trained models that employ parametrised early exits along their depth to i) dynamically save computation during inference on easier samples and ii) save training and maintenance cost by offering a post-training customisable speed-accuracy trade-off. Designing and training such networks naively can hurt performance. Thus, we propose a novel two-staged training scheme for multi-exit networks. Furthermore, the parametrisation of MESS enables co-optimising the number, placement and architecture of the attached segmentation heads along with the exit policy, upon deployment via exhaustive search in <1 GPUh. This allows MESS to rapidly adapt to the device capabilities and application requirements for each target use-case, offering a train-once-deploy-everywhere solution. MESS variants achieve latency gains of up to 2.83x with the same accuracy, or 5.33 pp higher accuracy for the same computational budget, compared to the original backbone network. Lastly, MESS delivers orders of magnitude faster architectural customisation, compared to state-of-the-art techniques.  ( 3 min )
    Causal Inference in Natural Language Processing: Estimation, Prediction, Interpretation and Beyond. (arXiv:2109.00725v2 [cs.CL] UPDATED)
    A fundamental goal of scientific research is to learn about causal relationships. However, despite its critical role in the life and social sciences, causality has not had the same importance in Natural Language Processing (NLP), which has traditionally placed more emphasis on predictive tasks. This distinction is beginning to fade, with an emerging area of interdisciplinary research at the convergence of causal inference and language processing. Still, research on causality in NLP remains scattered across domains without unified definitions, benchmark datasets and clear articulations of the challenges and opportunities in the application of causal inference to the textual domain, with its unique properties. In this survey, we consolidate research across academic areas and situate it in the broader NLP landscape. We introduce the statistical challenge of estimating causal effects with text, encompassing settings where text is used as an outcome, treatment, or to address confounding. In addition, we explore potential uses of causal inference to improve the robustness, fairness, and interpretability of NLP models. We thus provide a unified overview of causal inference for the NLP community.  ( 3 min )
    Machine learning-based conditional mean filter: a generalization of the ensemble Kalman filter for nonlinear data assimilation. (arXiv:2106.07908v2 [cs.LG] UPDATED)
    This paper presents the machine learning-based ensemble conditional mean filter (ML-EnCMF) -- a filtering method based on the conditional mean filter (CMF) previously introduced in the literature. The updated mean of the CMF matches that of the posterior, obtained by applying Bayes' rule on the filter's forecast distribution. Moreover, we show that the CMF's updated covariance coincides with the expected conditional covariance. Implementing the EnCMF requires computing the conditional mean (CM). A likelihood-based estimator is prone to significant errors for small ensemble sizes, causing filter divergence. We develop a systematic methodology for integrating machine learning into the EnCMF based on the CM's orthogonal projection property. First, we use a combination of an artificial neural network (ANN) and a linear function, obtained based on the ensemble Kalman filter (EnKF), to approximate the CM, enabling the ML-EnCMF to inherit EnKF's advantages. Secondly, we apply a suitable variance reduction technique to reduce statistical errors when estimating the loss function. Lastly, we propose a model selection procedure for selecting the applied filter element-wise, i.e., either the EnKF or ML-EnCMF, at each updating step. We demonstrate the ML-EnCMF performance using the Lorenz-63 and Lorenz-96 systems and show that the ML-EnCMF outperforms the EnKF and the likelihood-based EnCMF.  ( 3 min )
    A Joint Graph and Image Convolution Network for Automatic Brain Tumor Segmentation. (arXiv:2109.05580v2 [eess.IV] UPDATED)
    We present a joint graph convolution-image convolution neural network as our submission to the Brain Tumor Segmentation (BraTS) 2021 challenge. We model each brain as a graph composed of distinct image regions, which is initially segmented by a graph neural network (GNN). Subsequently, the tumorous volume identified by the GNN is further refined by a simple (voxel) convolutional neural network (CNN), which produces the final segmentation. This approach captures both global brain feature interactions via the graphical representation and local image details through the use of convolutional filters. We find that the GNN component by itself can effectively identify and segment the brain tumors. The addition of the CNN further improves the median performance of the model by 2 percent across all metrics evaluated. On the validation set, our joint GNN-CNN model achieves mean Dice scores of 0.89, 0.81, 0.73 and mean Hausdorff distances (95th percentile) of 6.8, 12.6, 28.2mm on the whole tumor, core tumor, and enhancing tumor, respectively.  ( 3 min )
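For readers unfamiliar with the Dice score reported above, here is a minimal sketch of the standard overlap formula, 2|A∩B| / (|A| + |B|), on binary masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions for this example; the challenge's official evaluation code may differ.

```python
import numpy as np

def dice_score(pred, target):
    """Dice overlap between two binary masks of equal shape (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(pred, target).sum() / denom)

example = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
```

In the example, one voxel overlaps while the masks contain two and one foreground voxels, giving 2·1 / (2 + 1).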
    DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models. (arXiv:2111.00160v2 [cs.LG] UPDATED)
    Gigantic pre-trained models have become central to natural language processing (NLP), serving as the starting point for fine-tuning towards a range of downstream tasks. However, two pain points persist for this paradigm: (a) as the pre-trained models grow bigger (e.g., 175B parameters for GPT-3), even the fine-tuning process can be time-consuming and computationally expensive; (b) the fine-tuned model has the same size as its starting point by default, which is neither sensible due to its more specialized functionality, nor practical since many fine-tuned models will be deployed in resource-constrained environments. To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights. Our proposed framework, dubbed Dually Sparsity-Embedded Efficient Tuning (DSEE), aims to achieve two key objectives: (i) parameter efficient fine-tuning - by enforcing sparsity-aware low-rank updates on top of the pre-trained weights; and (ii) resource-efficient inference - by encouraging a sparse weight structure towards the final fine-tuned model. We leverage sparsity in these two directions by exploiting both unstructured and structured sparse patterns in pre-trained language models via a unified approach. Extensive experiments and in-depth investigations, with diverse network backbones (i.e., BERT, RoBERTa, and GPT-2) on dozens of datasets, consistently demonstrate impressive parameter-/inference-efficiency, while maintaining competitive downstream performance. For instance, DSEE saves about 25% inference FLOPs while achieving comparable performance, with 0.5% trainable parameters on BERT. Code is available at https://github.com/VITA-Group/DSEE.  ( 3 min )
    Tight Concentrations and Confidence Sequences from the Regret of Universal Portfolio. (arXiv:2110.14099v3 [stat.ML] UPDATED)
    A classic problem in statistics is the estimation of the expectation of random variables from samples. This gives rise to the tightly connected problems of deriving concentration inequalities and confidence sequences, that is, confidence intervals that hold uniformly over time. Previous work has shown how to easily convert the regret guarantee of an online betting algorithm into a time-uniform concentration inequality. In this paper, we show that we can go even further: the regret of universal portfolio algorithms gives rise to new implicit time-uniform concentrations and state-of-the-art empirically calculated confidence sequences. In particular, our numerically obtained confidence sequences can never be vacuous, even with a single sample, and satisfy the law of iterated logarithm.  ( 2 min )
    PatrickStar: Parallel Training of Pre-trained Models via Chunk-based Memory Management. (arXiv:2108.05818v4 [cs.LG] UPDATED)
    The pre-trained model (PTM) is revolutionizing Artificial Intelligence (AI) technology. However, the hardware requirement of PTM training is prohibitively high, making it a game for a small proportion of people. Therefore, we propose the PatrickStar system to lower the hardware requirements of PTMs and make them accessible to everyone. PatrickStar uses the CPU-GPU heterogeneous memory space to store the model data. Different from existing works, we organize the model data in memory chunks and dynamically distribute them in the heterogeneous memory. Guided by the runtime memory statistics collected in a warm-up iteration, chunks are orchestrated efficiently in heterogeneous memory, generating lower CPU-GPU data transmission volume and higher bandwidth utilization. In symbiosis with the Zero Redundancy Optimizer, PatrickStar scales to multiple GPUs on multiple nodes. The system can train tasks on bigger models and larger batch sizes, which cannot be accomplished by existing works. Experimental results show that PatrickStar extends model scales 2.27 and 2.5 times of DeepSpeed, and consistently exhibits significantly higher execution speed. PatrickStar also successfully runs the 175B GPT3 training task on a 32 GPU cluster. Our code is publicly available at https://github.com/Tencent/PatrickStar.  ( 3 min )
    YAHPO Gym -- An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. (arXiv:2109.03670v4 [cs.LG] UPDATED)
    When developing and analyzing new hyperparameter optimization methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we propose a new set of challenging and relevant benchmark problems motivated by desirable properties and requirements for such benchmarks. Our new surrogate-based benchmark collection consists of 14 scenarios that in total constitute over 700 multi-fidelity hyperparameter optimization problems, which all enable multi-objective hyperparameter optimization. Furthermore, we empirically compare surrogate-based benchmarks to the more widely-used tabular benchmarks, and demonstrate that the latter may produce unfaithful results regarding the performance ranking of HPO methods. We examine and compare our benchmark collection with respect to defined requirements and propose a single-objective as well as a multi-objective benchmark suite on which we compare 7 single-objective and 7 multi-objective optimizers in a benchmark experiment. Our software is available at https://github.com/slds-lmu/yahpo_gym.  ( 3 min )
    EMFlow: Data Imputation in Latent Space via EM and Deep Flow Models. (arXiv:2106.04804v2 [cs.LG] UPDATED)
    The presence of missing values within high-dimensional data is a ubiquitous problem for many applied sciences. A serious limitation of many available data mining and machine learning methods is their inability to handle partially missing values, and so an integrated approach that combines imputation and model estimation is vital for down-stream analysis. A computationally fast algorithm, called EMFlow, is introduced that performs imputation in a latent space via an online version of the Expectation-Maximization (EM) algorithm by using a normalizing flow (NF) model which maps the data space to a latent space. The proposed EMFlow algorithm is iterative, updating the parameters of online EM and NF alternately. Extensive experimental results for high-dimensional multivariate and image datasets are presented to illustrate the superior performance of EMFlow compared to a couple of recently available methods in terms of both predictive accuracy and speed of algorithmic convergence. We provide code for all our experiments.  ( 2 min )
    PM-FSM: Policies Modulating Finite State Machine for Robust Quadrupedal Locomotion. (arXiv:2109.12696v2 [cs.RO] UPDATED)
    Deep reinforcement learning (deep RL) has emerged as an effective tool for developing controllers for legged robots. However, vanilla deep RL often requires a tremendous amount of training samples and is not feasible for achieving robust behaviors. To take advantage of human experts' knowledge but eliminate time-consuming interactive teaching, researchers have investigated a novel policy architecture, Policies Modulating Trajectory Generators (PMTG), which builds a recurrent control loop by combining a parametric trajectory generator (TG) and a feedback policy network to achieve more robust behaviors using intuitive prior knowledge. In this work, we propose Policies Modulating Finite State Machine (PM-FSM) by replacing TGs with contact-aware finite state machines (FSM), which offer more flexible control of each leg. Compared with the TGs, FSMs offer high-level management on each leg motion generator and enable a flexible state arrangement, which makes the learned behavior less vulnerable to unseen perturbations or challenging terrains. This design offers an explicit notion of contact events to the policy to negotiate unexpected perturbations. We demonstrated that the proposed architecture could achieve more robust behaviors in various scenarios, such as challenging terrains or external perturbations, on both simulated and real robots. The supplemental video can be found at: https://youtu.be/78cboMqTkJQ.  ( 3 min )
    CENN: Conservative energy method based on neural networks with subdomains for solving variational problems involving heterogeneous and complex geometries. (arXiv:2110.01359v4 [math.NA] UPDATED)
    We propose a conservative energy method based on neural networks with subdomains for solving variational problems (CENN), where the admissible function satisfying the essential boundary condition without boundary penalty is constructed by the radial basis function (RBF), particular solution neural network, and general neural network. The loss term is the potential energy, optimized based on the principle of minimum potential energy. The loss term at the interfaces has a lower-order derivative compared to the strong form PINN with subdomains. The advantages of the proposed method are higher efficiency, higher accuracy, and fewer hyperparameters than the strong form PINN with subdomains. Another advantage of the proposed method is that it can be applied to complex geometries based on the special construction of the admissible function. To analyze its performance, the proposed method CENN is used to model representative PDEs; the examples include strong discontinuity, singularity, complex boundary, non-linear, and heterogeneous problems. Furthermore, it outperforms other methods when dealing with heterogeneous problems.  ( 3 min )
    Learning to Control DC Motor for Micromobility in Real Time with Reinforcement Learning. (arXiv:2108.00138v4 [cs.LG] UPDATED)
    Autonomous micromobility has been attracting the attention of researchers and practitioners in recent years. A key component of many micro-transport vehicles is the DC motor, a complex dynamical system that is continuous and non-linear. Learning to quickly control the DC motor in the presence of disturbances and uncertainties is desired for various applications that require robustness and stability. Techniques to accomplish this task usually rely on a mathematical system model, which is often insufficient to anticipate the effects of time-varying and interrelated sources of non-linearities. While some model-free approaches have been successful at the task, they rely on massive interactions with the system and are trained in specialized hardware in order to fit a highly parameterized controller. In this work, we learn to steer a DC motor via sample-efficient reinforcement learning. Using data collected from hardware interactions in the real world, we additionally build a simulator to experiment with a wide range of parameters and learning strategies. With the best parameters found, we learn an effective control policy in one minute and 53 seconds on a simulation and in 10 minutes and 35 seconds on a physical system.  ( 3 min )
    Error Loss Networks. (arXiv:2106.03722v3 [cs.LG] UPDATED)
    A novel model called error loss network (ELN) is proposed to build an error loss function for supervised learning. The ELN is in structure similar to a radial basis function (RBF) neural network, but its input is an error sample and output is a loss corresponding to that error sample. That means the nonlinear input-output mapper of ELN creates an error loss function. The proposed ELN provides a unified model for a large class of error loss functions, which includes some information theoretic learning (ITL) loss functions as special cases. The activation function, weight parameters and network size of the ELN can be predetermined or learned from the error samples. On this basis, we propose a new machine learning paradigm where the learning process is divided into two stages: first, learning a loss function using an ELN; second, using the learned loss function to continue to perform the learning. Experimental results are presented to demonstrate the desirable performance of the new method.  ( 2 min )
    Adversarial Robustness Verification and Attack Synthesis in Stochastic Systems. (arXiv:2110.02125v2 [cs.CR] UPDATED)
    Probabilistic model checking is a useful technique for specifying and verifying properties of stochastic systems including randomized protocols and reinforcement learning models. Existing methods rely on the assumed structure and probabilities of certain system transitions. These assumptions may be incorrect, and may even be violated by an adversary who gains control of system components. In this paper, we develop a formal framework for adversarial robustness in systems modeled as discrete time Markov chains (DTMCs). We base our framework on existing methods for verifying probabilistic temporal logic properties and extend it to include deterministic, memoryless policies acting in Markov decision processes (MDPs). Our framework includes a flexible approach for specifying structure-preserving and non structure-preserving adversarial models. We outline a class of threat models under which adversaries can perturb system transitions, constrained by an $\varepsilon$ ball around the original transition probabilities. We define three main DTMC adversarial robustness problems: adversarial robustness verification, maximal $\delta$ synthesis, and worst case attack synthesis. We present two optimization-based solutions to these three problems, leveraging traditional and parametric probabilistic model checking techniques. We then evaluate our solutions on two stochastic protocols and a collection of Grid World case studies, which model an agent acting in an environment described as an MDP. We find that the parametric solution results in fast computation for small parameter spaces. In the case of less restrictive (stronger) adversaries, the number of parameters increases, and directly computing property satisfaction probabilities is more scalable. We demonstrate the usefulness of our definitions and solutions by comparing system outcomes over various properties, threat models, and case studies.  ( 3 min )
    On Anytime Learning at Macroscale. (arXiv:2106.09563v4 [cs.LG] UPDATED)
    In many practical applications of machine learning, data arrives sequentially over time in large chunks. Practitioners then have to decide how to allocate their computational budget in order to obtain the best performance at any point in time. Online learning theory for convex optimization suggests that the best strategy is to use data as soon as it arrives. However, this might not be the best strategy when using deep non-linear networks, particularly when these perform multiple passes over each chunk of data, rendering the overall distribution non i.i.d. In this paper, we formalize this learning setting in the simplest scenario in which each data chunk is drawn from the same underlying distribution, and make a first attempt at empirically answering the following questions: How long should the learner wait before training on the newly arrived chunks? What architecture should the learner adopt? Should the learner increase capacity over time as more data is observed? We probe this learning setting using convolutional neural networks trained on classic computer vision benchmarks as well as a large transformer model trained on a large-scale language modeling task. Code is available at www.github.com/facebookresearch/ALMA.  ( 3 min )
    Machine Learning for Postprocessing Medium-range Ensemble Streamflow Forecasts. (arXiv:2106.09547v2 [cs.LG] UPDATED)
    Skillful streamflow forecasts can inform decisions in various areas of water policy and management. We integrate numerical weather prediction ensembles and a distributed hydrological model to generate ensemble streamflow forecasts at medium-range lead times (1 - 7 days). We demonstrate a case study for machine learning application in postprocessing ensemble streamflow forecasts in the Upper Susquehanna River basin in the eastern United States. For forecast verification, we use different metrics such as skill score and reliability diagram conditioned upon the lead time, flow threshold, and season. The verification results show that the machine learning postprocessor can improve streamflow forecasts relative to low complexity forecasts (e.g., climatological and temporal persistence) as well as deterministic and raw ensemble forecasts. As compared to the raw ensembles, relative gain in forecast skill from postprocessor is generally higher at medium-range timescales compared to shorter lead times; high flows compared to low-moderate flows, and warm-season compared to the cool ones. Overall, our results highlight the benefits of machine learning in many aspects for improving both the skill and reliability of streamflow forecasts.  ( 2 min )
    Performance Comparison of Deep RL Algorithms for Energy Systems Optimal Scheduling. (arXiv:2208.00728v1 [eess.SY])
    Taking advantage of their data-driven and model-free features, Deep Reinforcement Learning (DRL) algorithms have the potential to deal with the increasing level of uncertainty due to the introduction of renewable-based generation. To deal simultaneously with the energy systems' operational cost and technical constraints (e.g., generation-demand power balance), DRL algorithms must consider a trade-off when designing the reward function. This trade-off introduces extra hyperparameters that impact the DRL algorithms' performance and capability of providing feasible solutions. In this paper, a performance comparison of different DRL algorithms, including DDPG, TD3, SAC, and PPO, is presented. We aim to provide a fair comparison of these DRL algorithms for energy systems optimal scheduling problems. Results show DRL algorithms' capability of providing good-quality solutions in real time, even in unseen operational scenarios, when compared with a mathematical programming model of the energy system optimal scheduling problem. Nevertheless, in the case of large peak consumption, these algorithms failed to provide feasible solutions, which can impede their practical implementation.  ( 2 min )
    Graph Transfer Learning via Adversarial Domain Adaptation with Graph Convolution. (arXiv:1909.01541v4 [cs.LG] UPDATED)
    This paper studies the problem of cross-network node classification to overcome the insufficiency of labeled data in a single network. It aims to leverage the label information in a partially labeled source network to assist node classification in a completely unlabeled or partially labeled target network. Existing methods for single network learning cannot solve this problem due to the domain shift across networks. Some multi-network learning methods heavily rely on the existence of cross-network connections, thus are inapplicable for this problem. To tackle this problem, we propose a novel graph transfer learning framework AdaGCN by leveraging the techniques of adversarial domain adaptation and graph convolution. It consists of two components: a semi-supervised learning component and an adversarial domain adaptation component. The former aims to learn class discriminative node representations with given label information of the source and target networks, while the latter contributes to mitigating the distribution divergence between the source and target domains to facilitate knowledge transfer. Extensive empirical evaluations on real-world datasets show that AdaGCN can successfully transfer class information with a low label rate on the source network and a substantial divergence between the source and target domains. The source code for reproducing the experimental results is available at https://github.com/daiquanyu/AdaGCN.  ( 3 min )
    Practical Deep Reinforcement Learning Approach for Stock Trading. (arXiv:1811.07522v3 [cs.LG] UPDATED)
    Stock trading strategy plays a crucial role in investment companies. However, it is challenging to obtain an optimal strategy in the complex and dynamic stock market. We explore the potential of deep reinforcement learning to optimize stock trading strategy and thus maximize investment return. 30 stocks are selected as our trading stocks and their daily prices are used as the training and trading market environment. We train a deep reinforcement learning agent and obtain an adaptive trading strategy. The agent's performance is evaluated and compared with the Dow Jones Industrial Average and the traditional min-variance portfolio allocation strategy. The proposed deep reinforcement learning approach is shown to outperform the two baselines in terms of both the Sharpe ratio and cumulative returns.  ( 2 min )
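The two evaluation metrics named above can be computed from a series of daily returns as in the sketch below. This is a generic textbook formulation, not the paper's evaluation code: the function names, the 252-trading-day annualisation factor, and the zero default risk-free rate are assumptions for illustration.

```python
import numpy as np

def sharpe_ratio(daily_returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualised Sharpe ratio from simple daily returns (hypothetical helper)."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_rate / periods_per_year
    # Sample standard deviation (ddof=1); undefined for constant return series.
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))

def cumulative_return(daily_returns):
    """Total compounded return over the whole period."""
    return float(np.prod(1.0 + np.asarray(daily_returns, dtype=float)) - 1.0)

returns = [0.01, 0.02, 0.03]
sr = sharpe_ratio(returns)
cr = cumulative_return([0.1, -0.1])
```

Note that a 10% gain followed by a 10% loss compounds to a small net loss, which is why cumulative return multiplies growth factors rather than summing returns.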
    TransDeepLab: Convolution-Free Transformer-based DeepLab v3+ for Medical Image Segmentation. (arXiv:2208.00713v1 [eess.IV])
    Convolutional neural networks (CNNs) have been the de facto standard in a diverse set of computer vision tasks for many years. In particular, deep neural networks based on seminal architectures such as U-shaped models with skip-connections or atrous convolution with pyramid pooling have been tailored to a wide range of medical image analysis tasks. The main advantage of such architectures is that they are well suited to retaining versatile local features. However, as a general consensus, CNNs fail to capture long-range dependencies and spatial correlations due to the intrinsically confined receptive field size of convolution operations. Alternatively, the Transformer, profiting from global information modelling that stems from the self-attention mechanism, has recently attained remarkable performance in natural language processing and computer vision. Nevertheless, previous studies show that both local and global features are critical for a deep model in dense prediction, such as segmenting complicated structures with disparate shapes and configurations. To this end, this paper proposes TransDeepLab, a novel DeepLab-like pure Transformer for medical image segmentation. Specifically, we exploit hierarchical Swin-Transformer with shifted windows to extend the DeepLabv3 and model the Atrous Spatial Pyramid Pooling (ASPP) module. Based on a thorough search of the relevant literature, we believe we are the first to model the seminal DeepLab model with a pure Transformer-based model. Extensive experiments on various medical image segmentation tasks verify that our approach performs better than or on par with most contemporary works that combine Vision Transformer and CNN-based methods, along with a significant reduction of model complexity. The codes and trained models are publicly available at https://github.com/rezazad68/transdeeplab  ( 3 min )
    UniToBrain dataset: a Brain Perfusion Dataset. (arXiv:2208.00650v1 [eess.IV])
    CT perfusion (CTP) is a medical exam for measuring the passage of a bolus of contrast solution through the brain on a pixel-by-pixel basis. The objective is to draw "perfusion maps" (namely cerebral blood volume, cerebral blood flow and time to peak) very rapidly for ischemic lesions, and to be able to distinguish between core and penumbra regions. A precise and quick diagnosis, in a context of ischemic stroke, can determine the fate of the brain tissues and guide the intervention and treatment in emergency conditions. In this work we present the UniToBrain dataset, the very first open-source dataset for CTP. It comprises a cohort of more than a hundred patients, and it is accompanied by patient metadata and ground truth maps obtained with state-of-the-art algorithms. We also propose a novel neural networks-based algorithm, using the European libraries ECVL and EDDL for image processing and developing deep learning models, respectively. The results obtained by the neural network models match the ground truth and open the road towards potential sub-sampling of the required number of CT maps, which impose heavy radiation doses on the patients.  ( 2 min )
    Generative Bias for Visual Question Answering. (arXiv:2208.00690v1 [cs.CV])
    The task of Visual Question Answering (VQA) is known to be plagued by the issue of VQA models exploiting biases within the dataset to make their final predictions. Many previous ensemble-based debiasing methods have been proposed where an additional model is purposefully trained to be biased in order to aid in training a robust target model. However, these methods compute the bias for a model from the label statistics of the training data or directly from single modal branches. In contrast, in this work, in order to better learn the bias a target VQA model suffers from, we propose a generative method to train the bias model directly from the target model, called GenB. In particular, GenB employs a generative network to learn the bias through a combination of the adversarial objective and knowledge distillation. We then debias our target model with GenB as a bias model, and show through extensive experiments the effects of our method on various VQA bias datasets including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE.  ( 2 min )
    Efficient Long-Text Understanding with Short-Text Models. (arXiv:2208.00748v1 [cs.CL])
    Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles and long documents, due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scratch. In this work, we propose SLED: SLiding-Encoder and Decoder, a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs. Specifically, we partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks (fusion-in-decoder). We illustrate through controlled experiments that SLED offers a viable strategy for long text understanding and evaluate our approach on SCROLLS, a benchmark with seven datasets across a wide range of language understanding tasks. We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.  ( 2 min )
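The first step described above, partitioning a long input into overlapping chunks that a short-text encoder can process, can be sketched as follows. The function name, the parameters, and the fixed-stride scheme are assumptions for this illustration; SLED's actual chunking details and the fusion-in-decoder step are not reproduced here.

```python
def overlapping_chunks(tokens, chunk_size, overlap):
    """Split a token sequence into windows of at most `chunk_size` tokens,
    where consecutive windows share `overlap` tokens (hypothetical helper)."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    if len(tokens) == 0:
        return []
    stride = chunk_size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window reached the end of the input
        start += stride
    return chunks

windows = overlapping_chunks(list(range(10)), chunk_size=4, overlap=2)
```

Each window would then be encoded independently by the short-text encoder, so the cost grows linearly with input length rather than quadratically.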
    Intrinsic Universal Measurements of Non-linear Embeddings. (arXiv:1811.01464v2 [cs.LG] UPDATED)
    A basic problem in machine learning is to find a mapping $f$ from a low dimensional latent space $\mathcal{Y}$ to a high dimensional observation space $\mathcal{X}$. Modern tools such as deep neural networks are capable of representing general non-linear mappings. A learner can easily find a mapping which perfectly fits all the observations. However, such a mapping is often not considered good, because it is not simple enough and can overfit. How to define simplicity? We try to give a formal definition of the amount of information imposed by a non-linear mapping $f$. Intuitively, we measure the local discrepancy between the pullback geometry and the intrinsic geometry of the latent space. Our definition is based on information geometry and is independent of both the empirical observations and specific parameterizations. We prove its basic properties and discuss relationships with related machine learning methods.  ( 2 min )
    Safe Policy Improvement Approaches and their Limitations. (arXiv:2208.00724v1 [cs.LG])
    Safe Policy Improvement (SPI) is an important technique for offline reinforcement learning in safety critical applications as it improves the behavior policy with a high probability. We classify various SPI approaches from the literature into two groups, based on how they utilize the uncertainty of state-action pairs. Focusing on the Soft-SPIBB (Safe Policy Improvement with Soft Baseline Bootstrapping) algorithms, we show that their claim of being provably safe does not hold. Based on this finding, we develop adaptations, the Adv-Soft-SPIBB algorithms, and show that they are provably safe. A heuristic adaptation, Lower-Approx-Soft-SPIBB, yields the best performance among all SPIBB algorithms in extensive experiments on two benchmarks. We also check the safety guarantees of the provably safe algorithms and show that huge amounts of data are necessary such that the safety bounds become useful in practice.  ( 2 min )
    Graph Neural Network with Local Frame for Molecular Potential Energy Surface. (arXiv:2208.00716v1 [cs.LG])
    Modeling molecular potential energy surface is of pivotal importance in science. Graph Neural Networks have shown great success in this field, especially those using rotation-equivariant representations. However, they either suffer from a complex mathematical form or lack theoretical support and design principle. To avoid using equivariant representations, we introduce a novel local frame method to molecule representation learning and analyze its expressive power. With a frame and the projection of equivariant vectors on the frame, GNNs can map the local environment of an atom to a scalar representation injectively. Messages can also be passed across local environments with frames' projection on frames. We further analyze when and how we can build such local frames. We prove that local frames always exist when the local environments have no symmetry, as is often the case in molecular dynamics simulations. For symmetric molecules, though only degenerate frames can be built, we find that the local frame method may still achieve high expressive power in some common cases due to the reduced degrees of freedom. Using only scalar representations allows us to adopt existing simple and powerful GNN architectures. Our model outperforms a range of state-of-the-art baselines in experiments. Simpler architectures also lead to higher scalability. Our model only takes about 30% inference time compared with the fastest baseline.  ( 2 min )
    Off-Policy Correction for Actor-Critic Algorithms in Deep Reinforcement Learning. (arXiv:2208.00755v1 [cs.LG])
    Compared to on-policy policy gradient techniques, off-policy model-free deep reinforcement learning (RL) approaches that use previously gathered data can improve sampling efficiency. However, off-policy learning becomes challenging when the discrepancy between the distributions of the policy of interest and the policies that collected the data increases. Although the well-studied importance sampling and off-policy policy gradient techniques were proposed to compensate for this discrepancy, they usually require a collection of long trajectories that increases the computational complexity and induces additional problems such as vanishing or exploding gradients. Moreover, their generalization to continuous action domains is strictly limited as they require action probabilities, which is unsuitable for deterministic policies. To overcome these limitations, we introduce an alternative off-policy correction algorithm for continuous action spaces, Actor-Critic Off-Policy Correction (AC-Off-POC), to mitigate the potential drawbacks introduced by the previously collected data. Through a novel discrepancy measure computed by the agent's most recent action decisions on the states of the randomly sampled batch of transitions, the approach does not require actual or estimated action probabilities for any policy and offers an adequate one-step importance sampling. Theoretical results show that the introduced approach can achieve a contraction mapping with a fixed unique point, which allows a "safe" off-policy learning. Our empirical results suggest that AC-Off-POC consistently improves the state-of-the-art and attains higher returns in fewer steps than the competing methods by efficiently scheduling the learning rate in Q-learning and policy optimization.  ( 3 min )
    $\textrm{D}^3\textrm{Former}$: Debiased Dual Distilled Transformer for Incremental Learning. (arXiv:2208.00777v1 [cs.CV])
    Class incremental learning (CIL) involves learning a classification model where groups of new classes are encountered in every learning phase. The goal is to learn a unified model performant on all the classes observed so far. Given the recent popularity of Vision Transformers (ViTs) in conventional classification settings, an interesting question is to study their continual learning behaviour. In this work, we develop a Debiased Dual Distilled Transformer for CIL dubbed $\textrm{D}^3\textrm{Former}$. The proposed model leverages a hybrid nested ViT design to ensure data efficiency and scalability to small as well as large datasets. In contrast to a recent ViT based CIL approach, our $\textrm{D}^3\textrm{Former}$ does not dynamically expand its architecture when new tasks are learned and remains suitable for a large number of incremental tasks. The improved CIL behaviour of $\textrm{D}^3\textrm{Former}$ is due to two fundamental changes to the ViT design. First, we treat incremental learning as a long-tail classification problem where the majority samples from new classes vastly outnumber the limited exemplars available for old classes. To avoid bias against the minority old classes, we propose to dynamically adjust logits to emphasize retaining the representations relevant to old tasks. Second, we propose to preserve the configuration of spatial attention maps as the learning progresses across tasks. This helps in reducing catastrophic forgetting by constraining the model to retain the attention on the most discriminative regions. $\textrm{D}^3\textrm{Former}$ obtains favorable results on incremental versions of CIFAR-100, MNIST, SVHN, and ImageNet datasets.  ( 3 min )
    XOOD: Extreme Value Based Out-Of-Distribution Detection For Image Classification. (arXiv:2208.00629v1 [cs.LG])
    Detecting out-of-distribution (OOD) data at inference time is crucial for many applications of machine learning. We present XOOD: a novel extreme value-based OOD detection framework for image classification that consists of two algorithms. The first, XOOD-M, is completely unsupervised, while the second XOOD-L is self-supervised. Both algorithms rely on the signals captured by the extreme values of the data in the activation layers of the neural network in order to distinguish between in-distribution and OOD instances. We show experimentally that both XOOD-M and XOOD-L outperform state-of-the-art OOD detection methods on many benchmark data sets in both efficiency and accuracy, reducing false-positive rate (FPR95) by 50%, while improving the inferencing time by an order of magnitude.  ( 2 min )
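For readers unfamiliar with the FPR95 metric cited in the XOOD abstract above, here is a small sketch of one common way to compute it. It assumes that higher scores mean "more in-distribution" and that in-distribution samples are the positive class; conventions differ between papers, and the function name `fpr_at_95_tpr` is made up for this illustration rather than taken from the XOOD code.

```python
from typing import Sequence


def fpr_at_95_tpr(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """False-positive rate on OOD data at the threshold that keeps at least
    95% of in-distribution (ID) samples.

    Assumption: a higher score means the detector considers the sample
    more likely to be in-distribution.
    """
    if not id_scores or not ood_scores:
        raise ValueError("both score lists must be non-empty")

    ranked = sorted(id_scores, reverse=True)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding issues.
    n_keep = -(-95 * len(ranked) // 100)
    threshold = ranked[n_keep - 1]  # lowest score among the kept ID samples

    # OOD samples at or above the threshold are wrongly accepted as ID.
    false_positives = sum(1 for s in ood_scores if s >= threshold)
    return false_positives / len(ood_scores)


if __name__ == "__main__":
    id_scores = list(range(1, 21))   # 20 toy in-distribution scores
    ood_scores = [0, 1, 2, 3]        # 4 toy OOD scores
    print(fpr_at_95_tpr(id_scores, ood_scores))
```

Lower values are better: an FPR95 of 0.5 means half of the OOD samples would slip through at an operating point that accepts 95% of in-distribution data.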
    SFILES 2.0: An extended text-based flowsheet representation. (arXiv:2208.00778v1 [cs.DB])
    SFILES is a text-based notation for chemical process flowsheets. It was originally proposed by d'Anterroches (2006) who was inspired by the text-based SMILES notation for molecules. The text-based format has several advantages compared to flowsheet images regarding the storage format, computational accessibility, and eventually for data analysis and processing. However, the original SFILES version cannot describe essential flowsheet configurations unambiguously, such as the distinction between top and bottom products. Neither is it capable of describing the control structure required for the safe and reliable operation of chemical processes. Also, there is no publicly available software for decoding or encoding chemical process topologies to SFILES. We propose the SFILES 2.0 with a complete description of the extended notation and naming conventions. Additionally, we provide open-source software for the automated conversion between flowsheet graphs and SFILES 2.0 strings. This way, we hope to encourage researchers and engineers to publish their flowsheet topologies as SFILES 2.0 strings. The ultimate goal is to set the standards for creating a FAIR database of chemical process flowsheets, which would be of great value for future data analysis and processing.  ( 2 min )
    Learning Object-Based State Estimators for Household Robots. (arXiv:2011.03183v4 [cs.LG] UPDATED)
    A robot operating in a household makes observations of multiple objects as it moves around over the course of days or weeks. The objects may be moved by inhabitants, but not completely at random. The robot may be called upon later to retrieve objects and will need a long-term object-based memory in order to know how to find them. Existing work in semantic SLAM does not attempt to capture the dynamics of object movement. In this paper, we combine some aspects of classic techniques for data-association filtering with modern attention-based neural networks to construct object-based memory systems that operate on high-dimensional observations and hypotheses. We perform end-to-end learning on labeled observation trajectories to learn both the transition and observation models. We demonstrate the system's effectiveness in maintaining memory of dynamically changing objects in both simulated environments and real images, and demonstrate improvements over classical structured approaches as well as unstructured neural approaches. Additional information is available at the project website: https://yilundu.github.io/obm/.  ( 3 min )
    A Small Survey On Event Detection Using Twitter. (arXiv:2011.05801v2 [cs.SI] UPDATED)
    A small survey on event detection using Twitter. This work first defines the problem statement, and then summarizes and collates the different research works towards solving the problem.  ( 2 min )
    Model-based graph reinforcement learning for inductive traffic signal control. (arXiv:2208.00659v1 [cs.LG])
    Most reinforcement learning methods for adaptive-traffic-signal-control require training from scratch to be applied on any new intersection or after any modification to the road network, traffic distribution, or behavioral constraints experienced during training. Considering 1) the massive amount of experience required to train such methods, and 2) that experience must be gathered by interacting in an exploratory fashion with real road-network-users, such a lack of transferability limits experimentation and applicability. Recent approaches enable learning policies that generalize for unseen road-network topologies and traffic distributions, partially tackling this challenge. However, the literature remains divided between the learning of cyclic (the evolution of connectivity at an intersection must respect a cycle) and acyclic (less constrained) policies, and these transferable methods 1) are only compatible with cyclic constraints and 2) do not enable coordination. We introduce a new model-based method, MuJAM, which, on top of enabling explicit coordination at scale for the first time, pushes generalization further by allowing a generalization to the controllers' constraints. In a zero-shot transfer setting involving both road networks and traffic settings never experienced during training, and in a larger transfer experiment involving the control of 3,971 traffic signal controllers in Manhattan, we show that MuJAM, using both cyclic and acyclic constraints, outperforms domain-specific baselines as well as another transferable approach.  ( 2 min )
    Learning to Navigate using Visual Sensor Networks. (arXiv:2208.00759v1 [cs.RO])
    We consider the problem of navigating a mobile robot towards a target in an unknown environment that is endowed with visual sensors, where neither the robot nor the sensors have access to global positioning information and only use first-person-view images. While prior work in sensor network based navigation uses explicit mapping and planning techniques, and is often aided by external positioning systems, we propose a vision-only based learning approach that leverages a Graph Neural Network (GNN) to encode and communicate relevant viewpoint information to the mobile robot. During navigation, the robot is guided by a model that we train through imitation learning to approximate optimal motion primitives, thereby predicting the effective cost-to-go (to the target). In our experiments, we first demonstrate generalizability to previously unseen environments with various sensor layouts. Simulation results show that by utilizing communication among the sensors and robot, we can achieve an $18.1\%$ improvement in success rate while decreasing path detour mean by $29.3\%$ and variability by $48.4\%$. This is done without requiring a global map, positioning data, or pre-calibration of the sensor network. Second, we perform a zero-shot transfer of our model from simulation to the real world. To this end, we train a 'translator' model that translates between latent encodings of real and simulated images so that the navigation policy (which is trained entirely in simulation) can be used directly on the real robot, without additional fine-tuning. Physical experiments demonstrate our effectiveness in various cluttered environments.  ( 3 min )
    An Evidential Neural Network Model for Regression Based on Random Fuzzy Numbers. (arXiv:2208.00647v1 [cs.LG])
    We introduce a distance-based neural network model for regression, in which prediction uncertainty is quantified by a belief function on the real line. The model interprets the distances of the input vector to prototypes as pieces of evidence represented by Gaussian random fuzzy numbers (GRFN's) and combined by the generalized product intersection rule, an operator that extends Dempster's rule to random fuzzy sets. The network output is a GRFN that can be summarized by three numbers characterizing the most plausible predicted value, variability around this value, and epistemic uncertainty. Experiments with real datasets demonstrate the very good performance of the method as compared to state-of-the-art evidential and statistical learning algorithms. Keywords: Evidence theory, Dempster-Shafer theory, belief functions, machine learning, random fuzzy sets.  ( 2 min )
    De-biased Representation Learning for Fairness with Unreliable Labels. (arXiv:2208.00651v1 [cs.LG])
    Removing bias while keeping all task-relevant information is challenging for fair representation learning methods since they would yield random or degenerate representations w.r.t. labels when the sensitive attributes correlate with labels. Existing works proposed to inject the label information into the learning procedure to overcome such issues. However, the assumption that the observed labels are clean is not always met. In fact, label bias is acknowledged as the primary source inducing discrimination. In other words, the fair pre-processing methods ignore the discrimination encoded in the labels either during the learning procedure or the evaluation stage. This contradiction puts a question mark on the fairness of the learned representations. To circumvent this issue, we explore the following question: \emph{Can we learn fair representations predictable to latent ideal fair labels given only access to unreliable labels?} In this work, we propose a \textbf{D}e-\textbf{B}iased \textbf{R}epresentation Learning for \textbf{F}airness (DBRF) framework which disentangles the sensitive information from non-sensitive attributes whilst keeping the learned representations predictable to ideal fair labels rather than observed biased ones. We formulate the de-biased learning framework through information-theoretic concepts such as mutual information and information bottleneck. The core concept is that DBRF advocates not to use unreliable labels for supervision when sensitive information benefits the prediction of unreliable labels. Experiment results over both synthetic and real-world data demonstrate that DBRF effectively learns de-biased representations towards ideal labels.  ( 3 min )
    Resolution enhancement of placenta histological images using deep learning. (arXiv:2208.00163v1 [eess.IV])
    In this study, a method has been developed to improve the resolution of histological human placenta images. For this purpose, a paired series of high- and low-resolution images have been collected to train a deep neural network model that can predict image residuals required to improve the resolution of the input images. A modified version of the U-net neural network model has been tailored to find the relationship between the low resolution and residual images. After training for 900 epochs on an augmented dataset of 1000 images, the relative mean squared error of 0.003 is achieved for the prediction of 320 test images. The proposed method has not only improved the contrast of the low-resolution images at the edges of cells but added critical details and textures that mimic high-resolution images of placenta villous space.  ( 2 min )
    Neural Correspondence Field for Object Pose Estimation. (arXiv:2208.00113v1 [cs.CV])
    We propose a method for estimating the 6DoF pose of a rigid object with an available 3D model from a single RGB image. Unlike classical correspondence-based methods which predict 3D object coordinates at pixels of the input image, the proposed method predicts 3D object coordinates at 3D query points sampled in the camera frustum. The move from pixels to 3D points, which is inspired by recent PIFu-style methods for 3D reconstruction, enables reasoning about the whole object, including its (self-)occluded parts. For a 3D query point associated with a pixel-aligned image feature, we train a fully-connected neural network to predict: (i) the corresponding 3D object coordinates, and (ii) the signed distance to the object surface, with the first defined only for query points in the surface vicinity. We call the mapping realized by this network a Neural Correspondence Field. The object pose is then robustly estimated from the predicted 3D-3D correspondences by the Kabsch-RANSAC algorithm. The proposed method achieves state-of-the-art results on three BOP datasets and is shown to be superior especially in challenging cases with occlusion. The project website is at: linhuang17.github.io/NCF.  ( 2 min )
    Testing Relational Understanding in Text-Guided Image Generation. (arXiv:2208.00005v1 [cs.CV])
    Relations are basic building blocks of human cognition. Classic and recent work suggests that many relations are early developing, and quickly perceived. Machine models that aspire to human-level perception and reasoning should reflect the ability to recognize and reason generatively about relations. We report a systematic empirical examination of a recent text-guided image generation model (DALL-E 2), using a set of 15 basic physical and social relations studied or proposed in the literature, and judgements from human participants (N = 169). Overall, we find that only ~22% of images matched basic relation prompts. Based on a quantitative examination of people's judgments, we suggest that current image generation models do not yet have a grasp of even basic relations involving simple objects and agents. We examine reasons for model successes and failures, and suggest possible improvements based on computations observed in biological intelligence.  ( 2 min )
    A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond. (arXiv:2208.00173v1 [cs.CV])
    Masked autoencoders are scalable vision learners, as the title of MAE \cite{he2022masked} states, which suggests that self-supervised learning (SSL) in vision might follow a similar trajectory as in NLP. Specifically, generative pretext tasks with masked prediction (e.g., BERT) have become a de facto standard SSL practice in NLP. By contrast, early attempts at generative methods in vision have been buried by their discriminative counterparts (like contrastive learning); however, the success of mask image modeling has revived the masking autoencoder (often termed denoising autoencoder in the past). As a milestone in bridging the gap with BERT in NLP, the masked autoencoder has attracted unprecedented attention for SSL in vision and beyond. This work conducts a comprehensive survey of masked autoencoders to shed light on a promising direction of SSL. As the first to review SSL with masked autoencoders, this work focuses on its application in vision by discussing its historical developments, recent progress, and implications for diverse applications.  ( 2 min )
    Personalised recommendations of sleep behaviour with neural networks using sleep diaries captured in Sleepio. (arXiv:2208.00033v1 [cs.LG])
    Sleepio™ is a digital mobile phone and web platform that uses techniques from cognitive behavioural therapy (CBT) to improve sleep in people with sleep difficulty. As part of this process, Sleepio captures data about the sleep behaviour of the users that have consented to such data being processed. For neural networks, the scale of the data is an opportunity to train meaningful models translatable to actual clinical practice. In collaboration with Big Health, the therapeutics company that created and utilizes Sleepio, we have analysed data from a random sample of 401,174 sleep diaries and built a neural network to model sleep behaviour and sleep quality of each individual in a personalised manner. We demonstrate that this neural network is more accurate than standard statistical methods in predicting the sleep quality of an individual based on his/her behaviour from the last 10 days. We compare model performance in a wide range of hyperparameter settings representing various scenarios. We further show that the neural network can be used to produce personalised recommendations of what sleep habits users should follow to maximise sleep quality, and show that these recommendations are substantially better than the ones generated by standard methods. We finally show that the neural network can explain the recommendation given to each participant and calculate confidence intervals for each prediction, all of which are essential for clinicians to be able to adopt such a tool in clinical practice.  ( 3 min )
    HPO X ELA: Investigating Hyperparameter Optimization Landscapes by Means of Exploratory Landscape Analysis. (arXiv:2208.00220v1 [cs.LG])
    Hyperparameter optimization (HPO) is a key component of machine learning models for achieving peak predictive performance. While numerous methods and algorithms for HPO have been proposed over the last years, little progress has been made in illuminating and examining the actual structure of these black-box optimization problems. Exploratory landscape analysis (ELA) subsumes a set of techniques that can be used to gain knowledge about properties of unknown optimization problems. In this paper, we evaluate the performance of five different black-box optimizers on 30 HPO problems, which consist of two-, three- and five-dimensional continuous search spaces of the XGBoost learner trained on 10 different data sets. This is contrasted with the performance of the same optimizers evaluated on 360 problem instances from the black-box optimization benchmark (BBOB). We then compute ELA features on the HPO and BBOB problems and examine similarities and differences. A cluster analysis of the HPO and BBOB problems in ELA feature space allows us to identify how the HPO problems compare to the BBOB problems on a structural meta-level. We identify a subset of BBOB problems that are close to the HPO problems in ELA feature space and show that optimizer performance is comparably similar on these two sets of benchmark problems. We highlight open challenges of ELA for HPO and discuss potential directions of future research and applications.  ( 3 min )
    Tackling Neural Architecture Search With Quality Diversity Optimization. (arXiv:2208.00204v1 [cs.LG])
    Neural architecture search (NAS) has been studied extensively and has grown to become a research field with substantial impact. While classical single-objective NAS searches for the architecture with the best performance, multi-objective NAS considers multiple objectives that should be optimized simultaneously, e.g., minimizing resource usage alongside the validation error. Although considerable progress has been made in the field of multi-objective NAS, we argue that there is some discrepancy between the actual optimization problem of practical interest and the optimization problem that multi-objective NAS tries to solve. We resolve this discrepancy by formulating the multi-objective NAS problem as a quality diversity optimization (QDO) problem and introduce three quality diversity NAS optimizers (two of them belonging to the group of multifidelity optimizers), which search for high-performing yet diverse architectures that are optimal for application-specific niches, e.g., hardware constraints. By comparing these optimizers to their multi-objective counterparts, we demonstrate that quality diversity NAS in general outperforms multi-objective NAS with respect to quality of solutions and efficiency. We further show how applications and future NAS research can thrive on QDO.  ( 2 min )
    Low-complexity Approximate Convolutional Neural Networks. (arXiv:2208.00087v1 [cs.LG])
    In this paper, we present an approach for minimizing the computational complexity of trained Convolutional Neural Networks (ConvNet). The idea is to approximate all elements of a given ConvNet and replace the original convolutional filters and parameters (pooling and bias coefficients; and activation function) with efficient approximations capable of extreme reductions in computational complexity. Low-complexity convolution filters are obtained through a binary (zero-one) linear programming scheme based on the Frobenius norm over sets of dyadic rationals. The resulting matrices allow for multiplication-free computations requiring only addition and bit-shifting operations. Such low-complexity structures pave the way for low-power, efficient hardware designs. We applied our approach on three use cases of different complexity: (i) a "light" but efficient ConvNet for face detection (with around 1000 parameters); (ii) another one for hand-written digit classification (with more than 180000 parameters); and (iii) a significantly larger ConvNet: AlexNet with $\approx$1.2 million matrices. We evaluated the overall performance on the respective tasks for different levels of approximations. In all considered applications, very low-complexity approximations have been derived maintaining an almost equal classification performance.  ( 3 min )
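To illustrate why dyadic-rational coefficients permit multiplication-free arithmetic, as the abstract above states, the sketch below rounds a weight to the nearest value of the form k / 2^bits and then applies it to an integer input using only shifts and additions. Note the substitution: the paper selects coefficients with a binary linear program based on the Frobenius norm, whereas this sketch uses plain per-coefficient rounding. The names `to_dyadic` and `shift_add_multiply` are hypothetical.

```python
from typing import Tuple


def to_dyadic(w: float, bits: int) -> Tuple[int, int]:
    """Return (k, bits) such that k / 2**bits is the nearest dyadic rational to w.

    Simple rounding stand-in for the optimization described in the paper.
    """
    k = round(w * (1 << bits))
    return k, bits


def shift_add_multiply(x: int, k: int, bits: int) -> int:
    """Compute floor(x * k / 2**bits) with shifts and additions only."""
    negative = k < 0
    k = abs(k)
    acc = 0
    pos = 0
    while k:
        if k & 1:
            acc += x << pos  # add x * 2**pos for every set bit of k
        k >>= 1
        pos += 1
    if negative:
        acc = -acc
    # Arithmetic right shift divides by 2**bits, rounding toward -infinity.
    return acc >> bits


if __name__ == "__main__":
    k, bits = to_dyadic(0.3, bits=3)          # nearest multiple of 1/8
    print(k, bits, k / (1 << bits))
    print(shift_add_multiply(40, k, bits))
```

The number of additions equals the number of set bits in `k`, so coefficients with few nonzero bits are the cheapest to apply in hardware.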
    Local Graph Embeddings Based on Neighbors Degree Frequency of Nodes. (arXiv:2208.00152v1 [cs.SI])
    We propose a local-to-global strategy for graph machine learning and network analysis by defining certain local features and vector representations of nodes and then using them to learn globally defined metrics and properties of the nodes by means of deep neural networks. By extending the notion of the degree of a node via Breadth-First Search, a general family of {\bf parametric centrality functions} is defined which are able to reveal the importance of nodes. We introduce the {\bf neighbors degree frequency (NDF)}, as a locally defined embedding of nodes of undirected graphs into Euclidean spaces. This gives rise to a vectorized labeling of nodes which encodes the structure of local neighborhoods of nodes and can be used for graph isomorphism testing. We add flexibility to our construction so that it can handle dynamic graphs as well. Afterwards, the Breadth-First Search is used to extend NDF vector representations into two different matrix representations of nodes which contain higher order information about the neighborhoods of nodes. Our matrix representations of nodes provide us with a new way of visualizing the shape of the neighborhood of a node. Furthermore, we use these matrix representations to obtain feature vectors, which are suitable for typical deep learning algorithms. To demonstrate that these node embeddings actually contain some information about the nodes, in a series of examples, we show that PageRank and closeness centrality can be learned by applying deep learning to these local features. Our constructions are flexible enough to handle evolving graphs. Finally, we explain how to adapt our constructions for directed graphs.  ( 3 min )
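The abstract above does not spell out the NDF definition, so the following is only one plausible reading of the name: for each node, count how many of its neighbors have degree 1, 2, and so on, and collect these counts into a vector. The function name `ndf_vectors` and the adjacency-dict input format are assumptions made for this sketch.

```python
from collections import Counter
from typing import Dict, Hashable, List, Set


def ndf_vectors(adj: Dict[Hashable, Set[Hashable]]) -> Dict[Hashable, List[int]]:
    """Map each node to a vector whose entry d-1 counts its neighbors of degree d.

    `adj` is an undirected graph given as node -> set of neighbors.
    Assumed interpretation of "neighbors degree frequency"; the paper may
    define or bin the vector differently.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    max_deg = max(degree.values(), default=0)

    vectors: Dict[Hashable, List[int]] = {}
    for v, nbrs in adj.items():
        counts = Counter(degree[u] for u in nbrs)
        vectors[v] = [counts.get(d, 0) for d in range(1, max_deg + 1)]
    return vectors


if __name__ == "__main__":
    # Star graph: center "b" connected to leaves "a", "c", "d".
    star = {"a": {"b"}, "b": {"a", "c", "d"}, "c": {"b"}, "d": {"b"}}
    for node, vec in sorted(ndf_vectors(star).items()):
        print(node, vec)
```

Under this reading the vector depends only on a node's immediate neighborhood, which matches the "locally defined" property the abstract emphasizes.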
    Robust Rayleigh Regression Method for SAR Image Processing in Presence of Outliers. (arXiv:2208.00097v1 [stat.AP])
    The presence of outliers (anomalous values) in synthetic aperture radar (SAR) data and the misspecification in statistical image models may result in inaccurate inferences. To avoid such issues, the Rayleigh regression model based on a robust estimation process is proposed as a more realistic approach to model this type of data. This paper aims at obtaining Rayleigh regression model parameter estimators robust to the presence of outliers. The proposed approach considered the weighted maximum likelihood method and was submitted to numerical experiments using simulated and measured SAR images. Monte Carlo simulations were employed for the numerical assessment of the proposed robust estimator performance in finite signal lengths, their sensitivity to outliers, and the breakdown point. For instance, the non-robust estimators show a relative bias value $65$-fold larger than the results provided by the robust approach in corrupted signals. In terms of sensitivity analysis and breakdown point, the robust scheme resulted in a reduction of about $96\%$ and $10\%$, respectively, in the mean absolute value of both measures, in comparison to the non-robust estimators. Moreover, two SAR data sets were used to compare the ground type and anomaly detection results of the proposed robust scheme with competing methods in the literature.  ( 3 min )
    A review of Deep learning Techniques for COVID-19 identification on Chest CT images. (arXiv:2208.00032v1 [eess.IV])
    The current COVID-19 pandemic is a serious threat to humanity that directly affects the lungs. Automatic identification of COVID-19 is a challenge for health care officials. The gold standard method for diagnosing COVID-19 is Reverse Transcription Polymerase Chain Reaction (RT-PCR) performed on swabs collected from affected people. Some limitations encountered while collecting swabs are related to accuracy and long duration. Chest CT (Computed Tomography) is another test method that helps healthcare providers quickly identify the infected lung areas. It was used as a supporting tool for identifying COVID-19 at an earlier stage. Researchers have proven deep learning on the CT imaging characteristics of COVID-19 to be highly effective for COVID-19 CT image classification. In this study, we review the recent deep learning techniques that can be used to detect the COVID-19 disease. Relevant studies were collected from various databases such as Web of Science, Google Scholar, and PubMed. Finally, we compare the results of different deep learning models and discuss CT image analysis.  ( 3 min )
    Topology-Driven Generative Completion of Lacunae in Molecular Data. (arXiv:2208.00063v1 [cs.LG])
    We introduce an approach to the targeted completion of lacunae in molecular data sets which is driven by topological data analysis, such as the Mapper algorithm. Lacunae are filled in using scaffold-constrained generative models trained with different scoring functions. The approach enables the addition of links and vertices to skeletonized representations of the data, such as the Mapper graph, and falls in the broad category of network completion methods. We illustrate the application of the topology-driven data completion strategy by creating a lacuna in the data set of onium cations extracted from USPTO patents, and repairing it.  ( 2 min )
    Sampling Attacks on Meta Reinforcement Learning: A Minimax Formulation and Complexity Analysis. (arXiv:2208.00081v1 [cs.LG])
    Meta reinforcement learning (meta RL), as a combination of meta-learning ideas and reinforcement learning (RL), enables the agent to adapt to different tasks using a few samples. However, this sampling-based adaptation also makes meta RL vulnerable to adversarial attacks. By manipulating the reward feedback from sampling processes in meta RL, an attacker can mislead the agent into building wrong knowledge from training experience, which deteriorates the agent's performance when dealing with different tasks after adaptation. This paper provides a game-theoretical underpinning for understanding this type of security risk. In particular, we formally define the sampling attack model as a Stackelberg game between the attacker and the agent, which yields a minimax formulation. It leads to two online attack schemes: Intermittent Attack and Persistent Attack, which enable the attacker to learn an optimal sampling attack, defined by an $\epsilon$-first-order stationary point, within $\mathcal{O}(\epsilon^{-2})$ iterations. These attack schemes freeride the learning progress concurrently without extra interactions with the environment. By corroborating the convergence results with numerical experiments, we observe that a minor effort of the attacker can significantly deteriorate the learning performance, and the minimax approach can also help robustify the meta RL algorithms.  ( 2 min )
    Improved Policy Optimization for Online Imitation Learning. (arXiv:2208.00088v1 [cs.LG])
    We consider online imitation learning (OIL), where the task is to find a policy that imitates the behavior of an expert via active interaction with the environment. We aim to bridge the gap between the theory and practice of policy optimization algorithms for OIL by analyzing one of the most popular OIL algorithms, DAGGER. Specifically, if the class of policies is sufficiently expressive to contain the expert policy, we prove that DAGGER achieves constant regret. Unlike previous bounds that require the losses to be strongly-convex, our result only requires the weaker assumption that the losses be strongly-convex with respect to the policy's sufficient statistics (not its parameterization). In order to ensure convergence for a wider class of policies and losses, we augment DAGGER with an additional regularization term. In particular, we propose a variant of Follow-the-Regularized-Leader (FTRL) and its adaptive variant for OIL and develop a memory-efficient implementation, which matches the memory requirements of FTL. Assuming that the loss functions are smooth and convex with respect to the parameters of the policy, we also prove that FTRL achieves constant regret for any sufficiently expressive policy class, while retaining $O(\sqrt{T})$ regret in the worst-case. We demonstrate the effectiveness of these algorithms with experiments on synthetic and high-dimensional control tasks.  ( 2 min )
    Enhanced gradient-based MCMC in discrete spaces. (arXiv:2208.00040v1 [stat.ML])
    The recent introduction of gradient-based MCMC for discrete spaces holds great promise, and comes with the tantalising possibility of new discrete counterparts to celebrated continuous methods such as MALA and HMC. Towards this goal, we introduce several discrete Metropolis-Hastings samplers that are conceptually-inspired by MALA, and demonstrate their strong empirical performance across a range of challenging sampling problems in Bayesian inference and energy-based modelling. Methodologically, we identify why discrete analogues to preconditioned MALA are generally intractable, motivating us to introduce a new kind of preconditioning based on auxiliary variables and the `Gaussian integral trick'.  ( 2 min )
    Robust Trajectory Prediction against Adversarial Attacks. (arXiv:2208.00094v1 [cs.LG])
    Trajectory prediction using deep neural networks (DNNs) is an essential component of autonomous driving (AD) systems. However, these methods are vulnerable to adversarial attacks, leading to serious consequences such as collisions. In this work, we identify two key ingredients to defend trajectory prediction models against adversarial attacks including (1) designing effective adversarial training methods and (2) adding domain-specific data augmentation to mitigate the performance degradation on clean data. We demonstrate that our method is able to improve the performance by 46% on adversarial data and at the cost of only 3% performance degradation on clean data, compared to the model trained with clean data. Additionally, compared to existing robust methods, our method can improve performance by 21% on adversarial examples and 9% on clean data. Our robust model is evaluated with a planner to study its downstream impacts. We demonstrate that our model can significantly reduce the severe accident rates (e.g., collisions and off-road driving).  ( 2 min )
    Reinforcement learning with experience replay and adaptation of action dispersion. (arXiv:2208.00156v1 [cs.LG])
    Effective reinforcement learning requires a proper balance of exploration and exploitation defined by the dispersion of action distribution. However, this balance depends on the task, the current stage of the learning process, and the current environment state. Existing methods that designate the action distribution dispersion require problem-dependent hyperparameters. In this paper, we propose to automatically designate the action distribution dispersion using the following principle: This distribution should have sufficient dispersion to enable the evaluation of future policies. To that end, the dispersion should be tuned to assure a sufficiently high probability (densities) of the actions in the replay buffer and the modes of the distributions that generated them, yet this dispersion should not be higher. This way, a policy can be effectively evaluated based on the actions in the buffer, but exploratory randomness in actions decreases when this policy converges. The above principle is verified here on challenging benchmarks Ant, HalfCheetah, Hopper, and Walker2D, with good results. Our method makes the action standard deviations converge to values similar to those resulting from trial-and-error optimization.  ( 2 min )
    MulViMotion: Shape-aware 3D Myocardial Motion Tracking from Multi-View Cardiac MRI. (arXiv:2208.00034v1 [eess.IV])
    Recovering the 3D motion of the heart from cine cardiac magnetic resonance (CMR) imaging enables the assessment of regional myocardial function and is important for understanding and analyzing cardiovascular disease. However, 3D cardiac motion estimation is challenging because the acquired cine CMR images are usually 2D slices which limit the accurate estimation of through-plane motion. To address this problem, we propose a novel multi-view motion estimation network (MulViMotion), which integrates 2D cine CMR images acquired in short-axis and long-axis planes to learn a consistent 3D motion field of the heart. In the proposed method, a hybrid 2D/3D network is built to generate dense 3D motion fields by learning fused representations from multi-view images. To ensure that the motion estimation is consistent in 3D, a shape regularization module is introduced during training, where shape information from multi-view images is exploited to provide weak supervision to 3D motion estimation. We extensively evaluate the proposed method on 2D cine CMR images from 580 subjects of the UK Biobank study for 3D motion tracking of the left ventricular myocardium. Experimental results show that the proposed method quantitatively and qualitatively outperforms competing methods.  ( 2 min )
    RangL: A Reinforcement Learning Competition Platform. (arXiv:2208.00003v1 [cs.LG])
    The RangL project hosted by The Alan Turing Institute aims to encourage the wider uptake of reinforcement learning by supporting competitions relating to real-world dynamic decision problems. This article describes the reusable code repository developed by the RangL team and deployed for the 2022 Pathways to Net Zero Challenge, supported by the UK Net Zero Technology Centre. The winning solutions to this particular Challenge seek to optimize the UK's energy transition policy to net zero carbon emissions by 2050. The RangL repository includes an OpenAI Gym reinforcement learning environment and code that supports both submission to, and evaluation in, a remote instance of the open source EvalAI platform as well as all winning learning agent strategies. The repository is an illustrative example of RangL's capability to provide a reusable structure for future challenges.  ( 2 min )
    DRSOM: A Dimension Reduced Second-Order Method and Preliminary Analyses. (arXiv:2208.00208v1 [math.OC])
    We introduce a Dimension-Reduced Second-Order Method (DRSOM) for convex and nonconvex unconstrained optimization. Under a trust-region-like framework, our method preserves the convergence of the second-order method while using only Hessian-vector products in two directions. Moreover, the computational overhead remains comparable to that of first-order methods such as gradient descent. We show that the method has a complexity of $O(\epsilon^{-3/2})$ to satisfy the first-order and second-order conditions in the subspace. The applicability and performance of DRSOM are exhibited by various computational experiments in logistic regression, $L_2-L_p$ minimization, sensor network localization, and neural network training. For neural networks, our preliminary implementation seems to gain computational advantages in terms of training accuracy and iteration complexity over state-of-the-art first-order methods including SGD and ADAM.  ( 2 min )
  • Open

    Inductive Biases for Deep Learning of Higher-Level Cognition. (arXiv:2011.15091v4 [cs.LG] UPDATED)
    A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.  ( 3 min )
    Weighted Scaling Approach for Metabolomics Data Analysis. (arXiv:2208.00603v1 [stat.ML])
    Systematic variation is a common issue in metabolomics data analysis. Therefore, different scaling and normalization techniques are used to preprocess the data for metabolomics data analysis. Although several scaling methods are available in the literature, the choice of scaling, transformation and/or normalization technique influences the further statistical analysis. It is challenging to choose the appropriate scaling technique for downstream analysis to get accurate results or to make a proper decision. Moreover, the existing scaling techniques are sensitive to outliers or extreme values. To fill the gap, our objective is to introduce a robust scaling approach that is not influenced by outliers and provides more accurate results for downstream analysis. Here, we introduce a new weighted scaling approach that is robust against outliers and requires no additional outlier detection/treatment step in data preprocessing, and we compare it with the conventional scaling and normalization techniques on artificial and real metabolomics datasets. We evaluated the performance of the proposed method in comparison to the other existing conventional scaling techniques using metabolomics data analysis in both the absence and presence of different percentages of outliers. Results show that in most cases, the proposed scaling technique performs better than the traditional scaling methods in both the absence and presence of outliers. The proposed method improves the further downstream metabolomics analysis. The R function of the proposed robust scaling method is available at https://github.com/nishithkumarpaul/robustScaling/blob/main/wscaling.R  ( 3 min )
    Practical Deep Reinforcement Learning Approach for Stock Trading. (arXiv:1811.07522v3 [cs.LG] UPDATED)
    Stock trading strategy plays a crucial role in investment companies. However, it is challenging to obtain optimal strategy in the complex and dynamic stock market. We explore the potential of deep reinforcement learning to optimize stock trading strategy and thus maximize investment return. 30 stocks are selected as our trading stocks and their daily prices are used as the training and trading market environment. We train a deep reinforcement learning agent and obtain an adaptive trading strategy. The agent's performance is evaluated and compared with Dow Jones Industrial Average and the traditional min-variance portfolio allocation strategy. The proposed deep reinforcement learning approach is shown to outperform the two baselines in terms of both the Sharpe ratio and cumulative returns.  ( 2 min )
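    The two evaluation metrics named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function names, the zero default risk-free rate, and the 252-day annualization factor are assumptions made for the example.

```python
import numpy as np

TRADING_DAYS = 252  # assumed annualization factor (trading days per year)


def cumulative_return(daily_returns):
    """Compound a sequence of simple daily returns into one total return."""
    r = np.asarray(daily_returns, dtype=float)
    return float(np.prod(1.0 + r) - 1.0)


def sharpe_ratio(daily_returns, risk_free_daily=0.0, periods=TRADING_DAYS):
    """Annualized Sharpe ratio: mean excess return over its sample std."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_daily
    return float(np.sqrt(periods) * excess.mean() / excess.std(ddof=1))


# Hypothetical daily returns of a strategy, for illustration only.
demo_returns = np.array([0.01, -0.005, 0.02])
demo_cumulative = cumulative_return(demo_returns)
demo_sharpe = sharpe_ratio(demo_returns)
```

    With a zero risk-free rate the Sharpe ratio is unchanged when all returns are scaled by a positive constant, which is why it is reported alongside cumulative return and not in place of it.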
    Mixture model for designs in high dimensional regression and the LASSO. (arXiv:1210.4762v2 [math.ST] UPDATED)
    The LASSO is a recent technique for variable selection in the regression model $y = X\beta + z$, where $X\in \mathbb{R}^{n\times p}$ and $z$ is a centered Gaussian i.i.d. noise vector $\mathcal N(0,\sigma^2I)$. The LASSO has been proved to achieve remarkable properties such as exact support recovery of sparse vectors when the columns are sufficiently incoherent and low prediction error under even less stringent conditions. However, many matrices do not satisfy small coherence in practical applications and the LASSO estimator may thus suffer from what is known as the slow rate regime. The goal of the present paper is to study the LASSO from a slightly different perspective by proposing a mixture model for the design matrix which is able to capture in a natural way the potentially clustered nature of the columns in many practical situations. In this model, the columns of the design matrix are drawn from a Gaussian mixture model. Instead of requiring incoherence for the design matrix $X$, we only require incoherence of the much smaller matrix of the mixture's centers. Our main result states that $X\beta$ can be estimated with the same precision as for incoherent designs except for a correction term depending on the maximal variance in the mixture model.  ( 3 min )
    Untargeted Region of Interest Selection for GC-MS Data using a Pseudo F-Ratio Moving Window ($\psi$FRMV). (arXiv:2208.00313v1 [stat.ML])
    There are many challenges associated with analysing gas chromatography - mass spectrometry (GC-MS) data. Many of these challenges stem from the fact that electron ionisation can make it difficult to recover molecular information due to the high degree of fragmentation with concomitant loss of molecular ion signal. With GC-MS data there are often many common fragment ions shared among closely-eluting peaks, necessitating sophisticated methods for analysis. Some of these methods are fully automated, but make some assumptions about the data which can introduce artifacts during the analysis. Chemometric methods such as Multivariate Curve Resolution, or Parallel Factor Analysis are particularly attractive, since they are flexible and make relatively few assumptions about the data - ideally resulting in fewer artifacts. These methods do require expert user intervention to determine the most relevant regions of interest and an appropriate number of components, $k$, for each region. Automated region of interest selection is needed to permit automated batch processing of chromatographic data with advanced signal deconvolution. Here, we propose a new method for automated, untargeted region of interest selection that accounts for the multivariate information present in GC-MS data to select regions of interest based on the ratio of the squared first, and second singular values from the Singular Value Decomposition of a window that moves across the chromatogram. Assuming that the first singular value accounts largely for signal, and that the second singular value accounts largely for noise, it is possible to interpret the relationship between these two values as a probabilistic distribution of Fisher Ratios. The sensitivity of the algorithm was tested by investigating the concentration at which the algorithm can no longer pick out chromatographic regions known to contain signal.  ( 3 min )
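    The core computation described above, the ratio of the squared first and second singular values of a window that moves across the chromatogram, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the function name, the window length, the small guard against division by zero, and the synthetic data are all hypothetical, and the probabilistic interpretation of the ratio is not reproduced.

```python
import numpy as np


def pseudo_f_ratio(data, window):
    """Return s1^2 / s2^2 for every window of `window` consecutive scans.

    data: array of shape (n_scans, n_mz), one mass spectrum per row.
    """
    n_scans = data.shape[0]
    ratios = np.empty(n_scans - window + 1)
    for start in range(n_scans - window + 1):
        block = data[start:start + window]
        # Singular values come back sorted in descending order.
        s = np.linalg.svd(block, compute_uv=False)
        # Guard against an exactly zero second singular value (assumption).
        ratios[start] = s[0] ** 2 / max(s[1] ** 2, np.finfo(float).tiny)
    return ratios


# Synthetic chromatogram: Gaussian noise plus one rank-one peak near scan 100.
rng = np.random.default_rng(0)
n_scans, n_mz = 200, 30
noise = 0.1 * rng.normal(size=(n_scans, n_mz))
elution = np.exp(-((np.arange(n_scans) - 100) ** 2) / (2 * 3.0 ** 2))
spectrum = rng.uniform(size=n_mz)
chromatogram = noise + 5.0 * np.outer(elution, spectrum)

ratios = pseudo_f_ratio(chromatogram, window=10)
```

    Windows that contain only noise give ratios of order one, while windows covering the peak give much larger values, so thresholding `ratios` is one simple way to flag candidate regions of interest.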
    A penalized two-pass regression to predict stock returns with time-varying risk premia. (arXiv:2208.00972v1 [econ.EM])
    We develop a penalized two-pass regression with time-varying factor loadings. The penalization in the first pass enforces sparsity for the time-variation drivers while also maintaining compatibility with the no-arbitrage restrictions by regularizing appropriate groups of coefficients. The second pass delivers risk premia estimates to predict equity excess returns. Our Monte Carlo results and our empirical results on a large cross-sectional data set of US individual stocks show that penalization without grouping can lead to nearly all estimated time-varying models violating the no-arbitrage restrictions. Moreover, our results demonstrate that the proposed method reduces the prediction errors compared to a penalized approach without appropriate grouping or a time-invariant factor model.  ( 2 min )
    NN2Poly: A polynomial representation for deep feed-forward artificial neural networks. (arXiv:2112.11397v2 [stat.ML] UPDATED)
    Interpretability of neural networks and their underlying theoretical behaviour remain an open field of study even after the great success of their practical applications, particularly with the emergence of deep learning. In this work, NN2Poly is proposed: a theoretical approach to obtain an explicit polynomial model that provides an accurate representation of an already trained fully-connected feed-forward artificial neural network (a multilayer perceptron or MLP). This approach extends a previous idea proposed in the literature, which was limited to single hidden layer networks, to work with arbitrarily deep MLPs in both regression and classification tasks. The objective of this paper is to achieve this by using a Taylor expansion on the activation function, at each layer, and then using several combinatorial properties to calculate the coefficients of the desired polynomials. Discussion is presented on the main computational challenges of this method, and the way to overcome them by imposing certain constraints during the training phase. Finally, simulation experiments as well as an application to a real data set are presented to demonstrate the effectiveness of the proposed method.  ( 3 min )
    Tackling Neural Architecture Search With Quality Diversity Optimization. (arXiv:2208.00204v1 [cs.LG])
    Neural architecture search (NAS) has been studied extensively and has grown to become a research field with substantial impact. While classical single-objective NAS searches for the architecture with the best performance, multi-objective NAS considers multiple objectives that should be optimized simultaneously, e.g., minimizing resource usage along with the validation error. Although considerable progress has been made in the field of multi-objective NAS, we argue that there is some discrepancy between the actual optimization problem of practical interest and the optimization problem that multi-objective NAS tries to solve. We resolve this discrepancy by formulating the multi-objective NAS problem as a quality diversity optimization (QDO) problem and introduce three quality diversity NAS optimizers (two of them belonging to the group of multifidelity optimizers), which search for high-performing yet diverse architectures that are optimal for application-specific niches, e.g., hardware constraints. By comparing these optimizers to their multi-objective counterparts, we demonstrate that quality diversity NAS in general outperforms multi-objective NAS with respect to quality of solutions and efficiency. We further show how applications and future NAS research can thrive on QDO.  ( 2 min )
    Shoring Up the Foundations: Fusing Model Embeddings and Weak Supervision. (arXiv:2203.13270v2 [stat.ML] UPDATED)
    Foundation models offer an exciting new paradigm for constructing models with out-of-the-box embeddings and a few labeled examples. However, it is not clear how to best apply foundation models without labeled data. A potential approach is to fuse foundation models with weak supervision frameworks, which use weak label sources -- pre-trained models, heuristics, crowd-workers -- to construct pseudolabels. The challenge is building a combination that best exploits the signal available in both foundation models and weak sources. We propose Liger, a combination that uses foundation model embeddings to improve two crucial elements of existing weak supervision techniques. First, we produce finer estimates of weak source quality by partitioning the embedding space and learning per-part source accuracies. Second, we improve source coverage by extending source votes in embedding space. Despite the black-box nature of foundation models, we prove results characterizing how our approach improves performance and show that lift scales with the smoothness of label distributions in embedding space. On six benchmark NLP and video tasks, Liger outperforms vanilla weak supervision by 14.1 points, weakly-supervised kNN and adapters by 11.8 points, and kNN and adapters supervised by traditional hand labels by 7.2 points.  ( 3 min )
    YAHPO Gym -- An Efficient Multi-Objective Multi-Fidelity Benchmark for Hyperparameter Optimization. (arXiv:2109.03670v4 [cs.LG] UPDATED)
    When developing and analyzing new hyperparameter optimization methods, it is vital to empirically evaluate and compare them on well-curated benchmark suites. In this work, we propose a new set of challenging and relevant benchmark problems motivated by desirable properties and requirements for such benchmarks. Our new surrogate-based benchmark collection consists of 14 scenarios that in total constitute over 700 multi-fidelity hyperparameter optimization problems, which all enable multi-objective hyperparameter optimization. Furthermore, we empirically compare surrogate-based benchmarks to the more widely-used tabular benchmarks, and demonstrate that the latter may produce unfaithful results regarding the performance ranking of HPO methods. We examine and compare our benchmark collection with respect to defined requirements and propose a single-objective as well as a multi-objective benchmark suite on which we compare 7 single-objective and 7 multi-objective optimizers in a benchmark experiment. Our software is available at [https://github.com/slds-lmu/yahpo_gym].  ( 3 min )
    Tight Concentrations and Confidence Sequences from the Regret of Universal Portfolio. (arXiv:2110.14099v3 [stat.ML] UPDATED)
    A classic problem in statistics is the estimation of the expectation of random variables from samples. This gives rise to the tightly connected problems of deriving concentration inequalities and confidence sequences, that is, confidence intervals that hold uniformly over time. Previous work has shown how to easily convert the regret guarantee of an online betting algorithm into a time-uniform concentration inequality. In this paper, we show that we can go even further: We show that the regret of universal portfolio algorithms gives rise to new implicit time-uniform concentrations and state-of-the-art empirically calculated confidence sequences. In particular, our numerically obtained confidence sequences can never be vacuous, even with a single sample, and satisfy the law of iterated logarithm.  ( 2 min )
    Debiasing Deep Chest X-Ray Classifiers using Intra- and Post-processing Methods. (arXiv:2208.00781v1 [cs.CV])
    Deep neural networks for image-based screening and computer-aided diagnosis have achieved expert-level performance on various medical imaging modalities, including chest radiographs. Recently, several works have indicated that these state-of-the-art classifiers can be biased with respect to sensitive patient attributes, such as race or gender, leading to growing concerns about demographic disparities and discrimination resulting from algorithmic and model-based decision-making in healthcare. Fair machine learning has focused on mitigating such biases against disadvantaged or marginalised groups, mainly concentrating on tabular data or natural images. This work presents two novel intra-processing techniques based on fine-tuning and pruning an already-trained neural network. These methods are simple yet effective and can be readily applied post hoc in a setting where the protected attribute is unknown during the model development and test time. In addition, we compare several intra- and post-processing approaches applied to debiasing deep chest X-ray classifiers. To the best of our knowledge, this is one of the first efforts studying debiasing methods on chest radiographs. Our results suggest that the considered approaches successfully mitigate biases in fully connected and convolutional neural networks offering stable performance under various settings. The discussed methods can help achieve group fairness of deep medical image classifiers when deploying them in domains with different fairness considerations and constraints.  ( 3 min )
    Bump hunting through density curvature features. (arXiv:2208.00174v1 [stat.ME])
    Bump hunting deals with finding in sample spaces meaningful data subsets known as bumps. These have traditionally been conceived as modal or concave regions in the graph of the underlying density function. We define an abstract bump construct based on curvature functionals of the probability density. Then, we explore several alternative characterizations involving derivatives up to second order. In particular, a suitable implementation of Good and Gaskins' original concave bumps is proposed in the multivariate case. Moreover, we bring to exploratory data analysis concepts like the mean curvature and the Laplacian that have produced good results in applied domains. Our methodology addresses the approximation of the curvature functional with a plug-in kernel density estimator. We provide theoretical results that assure the asymptotic consistency of bump boundaries in the Hausdorff distance with affordable convergence rates. We also present asymptotically valid and consistent confidence regions bounding curvature bumps. The theory is illustrated through several use cases in sports analytics with datasets from the NBA, MLB and NFL. We conclude that the different curvature instances effectively combine to generate insightful visualizations.  ( 2 min )
    A rigorous introduction to linear models. (arXiv:2105.04240v4 [cs.LG] UPDATED)
    This survey is meant to provide an introduction to linear models and the theories behind them. Our goal is to give a rigorous introduction to the readers with prior exposure to ordinary least squares. In machine learning, the output is usually a nonlinear function of the input. Deep learning even aims to find a nonlinear dependence with many layers which require a large amount of computation. However, most of these algorithms build upon simple linear models. We then describe linear models from different views and find the properties and theories behind the models. The linear model is the main technique in regression problems and the primary tool for it is the least squares approximation which minimizes a sum of squared errors. This is a natural choice when we're interested in finding the regression function which minimizes the corresponding expected squared error. This survey is primarily a summary of purpose, significance of important theories behind linear models, e.g., distribution theory, minimum variance estimator. We first describe ordinary least squares from three different points of view upon which we disturb the model with random noise and Gaussian noise. By Gaussian noise, the model gives rise to the likelihood so that we introduce a maximum likelihood estimator. It also develops some distribution theories via this Gaussian disturbance. The distribution theory of least squares will help us answer various questions and introduce related applications. We then prove least squares is the best unbiased linear model in the sense of mean squared error and most importantly, it actually approaches the theoretical limit. We end up with linear models with the Bayesian approach and beyond.  ( 3 min )
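    The least squares approximation mentioned above, which minimizes a sum of squared errors, can be sketched as follows. This is a minimal illustration and not code from the survey: the function names and the toy data are assumptions, and an intercept column is added for convenience.

```python
import numpy as np


def ols_fit(X, y):
    """Ordinary least squares: coefficients minimizing ||y - X1 @ beta||^2.

    An intercept column of ones is prepended to X (an illustrative choice).
    """
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def sum_squared_errors(X, y, beta):
    """Sum of squared residuals for a given coefficient vector."""
    X1 = np.column_stack([np.ones(len(X)), X])
    return float(np.sum((y - X1 @ beta) ** 2))


# Toy data (hypothetical): a noisy line with intercept 1 and slope 2.
rng = np.random.default_rng(0)
X_demo = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
y_demo = 1.0 + 2.0 * X_demo[:, 0] + 0.1 * rng.normal(size=20)

beta_hat = ols_fit(X_demo, y_demo)
sse_hat = sum_squared_errors(X_demo, y_demo, beta_hat)
```

    Any other coefficient vector, for instance `beta_hat + 0.1`, gives a sum of squared errors at least as large as `sse_hat`, which is the defining property of the least squares solution.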
    Graph Transfer Learning via Adversarial Domain Adaptation with Graph Convolution. (arXiv:1909.01541v4 [cs.LG] UPDATED)
    This paper studies the problem of cross-network node classification to overcome the insufficiency of labeled data in a single network. It aims to leverage the label information in a partially labeled source network to assist node classification in a completely unlabeled or partially labeled target network. Existing methods for single network learning cannot solve this problem due to the domain shift across networks. Some multi-network learning methods heavily rely on the existence of cross-network connections, and are thus inapplicable to this problem. To tackle this problem, we propose a novel graph transfer learning framework AdaGCN by leveraging the techniques of adversarial domain adaptation and graph convolution. It consists of two components: a semi-supervised learning component and an adversarial domain adaptation component. The former aims to learn class discriminative node representations with given label information of the source and target networks, while the latter contributes to mitigating the distribution divergence between the source and target domains to facilitate knowledge transfer. Extensive empirical evaluations on real-world datasets show that AdaGCN can successfully transfer class information with a low label rate on the source network and a substantial divergence between the source and target domains. The source code for reproducing the experimental results is available at https://github.com/daiquanyu/AdaGCN.  ( 3 min )
    Markov Chain Score Ascent: A Unifying Framework of Variational Inference with Markovian Gradients. (arXiv:2206.06295v2 [cs.LG] UPDATED)
    Minimizing the inclusive Kullback-Leibler (KL) divergence with stochastic gradient descent (SGD) is challenging since its gradient is defined as an integral over the posterior. Recently, multiple methods have been proposed to run SGD with biased gradient estimates obtained from a Markov chain. This paper provides the first non-asymptotic convergence analysis of these methods by establishing their mixing rate and gradient variance. To do this, we demonstrate that these methods, which we collectively refer to as Markov chain score ascent (MCSA) methods, can be cast as special cases of the Markov chain gradient descent framework. Furthermore, by leveraging this new understanding, we develop a novel MCSA scheme, parallel MCSA (pMCSA), that achieves a tighter bound on the gradient variance. We demonstrate that this improved theoretical result translates to superior empirical performance.  ( 2 min )
    Quantum Adaptive Fourier Features for Neural Density Estimation. (arXiv:2208.00564v1 [cs.LG])
    Density estimation is a fundamental task in statistics and machine learning applications. Kernel density estimation is a powerful tool for non-parametric density estimation in low dimensions; however, its performance is poor in higher dimensions. Moreover, its prediction complexity scales linearly with the number of training data points. This paper presents a method for neural density estimation that can be seen as a type of kernel density estimation, but without the high prediction computational complexity. The method is based on density matrices, a formalism used in quantum mechanics, and adaptive Fourier features. The method can be trained without optimization, but it can also be integrated with deep learning architectures and trained using gradient descent. Thus, it can be seen as a form of neural density estimation method. The method was evaluated on different synthetic and real datasets, and its performance was compared against state-of-the-art neural density estimation methods, obtaining competitive results.  ( 2 min )
    Machine learning-based conditional mean filter: a generalization of the ensemble Kalman filter for nonlinear data assimilation. (arXiv:2106.07908v2 [cs.LG] UPDATED)
    This paper presents the machine learning-based ensemble conditional mean filter (ML-EnCMF) -- a filtering method based on the conditional mean filter (CMF) previously introduced in the literature. The updated mean of the CMF matches that of the posterior, obtained by applying Bayes' rule on the filter's forecast distribution. Moreover, we show that the CMF's updated covariance coincides with the expected conditional covariance. Implementing the EnCMF requires computing the conditional mean (CM). A likelihood-based estimator is prone to significant errors for small ensemble sizes, causing filter divergence. We develop a systematic methodology for integrating machine learning into the EnCMF based on the CM's orthogonal projection property. First, we use a combination of an artificial neural network (ANN) and a linear function, obtained based on the ensemble Kalman filter (EnKF), to approximate the CM, enabling the ML-EnCMF to inherit EnKF's advantages. Secondly, we apply a suitable variance reduction technique to reduce statistical errors when estimating the loss function. Lastly, we propose a model selection procedure for selecting the applied filter element-wise, i.e., either the EnKF or ML-EnCMF, at each updating step. We demonstrate the ML-EnCMF performance using the Lorenz-63 and Lorenz-96 systems and show that the ML-EnCMF outperforms the EnKF and the likelihood-based EnCMF.  ( 3 min )
    TCMI: a non-parametric mutual-dependence estimator for multivariate continuous distributions. (arXiv:2001.11212v3 [stat.ML] UPDATED)
    The identification of relevant features, i.e., the driving variables that determine a process or the properties of a system, is an essential part of the analysis of data sets with a large number of variables. A mathematically rigorous approach to quantifying the relevance of these features is mutual information. Mutual information determines the relevance of features in terms of their joint mutual dependence to the property of interest. However, mutual information requires probability distributions as input, which cannot be reliably estimated from continuous distributions such as physical quantities like lengths or energies. Here, we introduce total cumulative mutual information (TCMI), a measure of the relevance of mutual dependences that extends mutual information to random variables of continuous distribution based on cumulative probability distributions. TCMI is a non-parametric, robust, and deterministic measure that facilitates comparisons and rankings between feature sets with different cardinality. The ranking induced by TCMI allows for feature selection, i.e., the identification of variable sets that are nonlinearly statistically related to a property of interest, taking into account the number of data samples as well as the cardinality of the set of variables. We evaluate the performance of our measure with simulated data, compare its performance with similar multivariate-dependence measures, and demonstrate the effectiveness of our feature-selection method on a set of standard data sets and a typical scenario in materials science.  ( 3 min )
    Signature moments to characterize laws of stochastic processes. (arXiv:1810.10971v2 [math.ST] UPDATED)
    The sequence of moments of a vector-valued random variable can characterize its law. We study the analogous problem for path-valued random variables, that is, stochastic processes, by using so-called robust signature moments. This allows us to derive a metric of maximum mean discrepancy type for laws of stochastic processes and study the topology it induces on the space of laws of stochastic processes. This metric can be kernelized using the signature kernel, which allows it to be computed efficiently. As an application, we provide a non-parametric two-sample hypothesis test for laws of stochastic processes.  ( 2 min )
    Enhanced gradient-based MCMC in discrete spaces. (arXiv:2208.00040v1 [stat.ML])
    The recent introduction of gradient-based MCMC for discrete spaces holds great promise, and comes with the tantalising possibility of new discrete counterparts to celebrated continuous methods such as MALA and HMC. Towards this goal, we introduce several discrete Metropolis-Hastings samplers that are conceptually-inspired by MALA, and demonstrate their strong empirical performance across a range of challenging sampling problems in Bayesian inference and energy-based modelling. Methodologically, we identify why discrete analogues to preconditioned MALA are generally intractable, motivating us to introduce a new kind of preconditioning based on auxiliary variables and the `Gaussian integral trick'.  ( 2 min )
    The Geometry of Adversarial Training in Binary Classification. (arXiv:2111.13613v2 [cs.LG] UPDATED)
    We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type $L^1+$ (nonlocal) $\operatorname{TV}$, a form frequently studied in image analysis and graph-based learning. A rich geometric structure is revealed by this reformulation which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense), and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization problems provides a novel, directly interpretable, statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.  ( 2 min )
    How Wide Convolutional Neural Networks Learn Hierarchical Tasks. (arXiv:2208.01003v1 [stat.ML])
    Despite their success, understanding how convolutional neural networks (CNNs) can efficiently learn high-dimensional functions remains a fundamental challenge. A popular belief is that these models harness the compositional and hierarchical structure of natural data such as images. Yet, we lack a quantitative understanding of how such structure affects performance, e.g. the rate of decay of the generalisation error with the number of training samples. In this paper we study deep CNNs in the kernel regime: i) we show that the spectrum of the corresponding kernel and its asymptotics inherit the hierarchical structure of the network; ii) we use generalisation bounds to prove that deep CNNs adapt to the spatial scale of the target function; iii) we illustrate this result by computing the rate of decay of the error in a teacher-student setting, where a deep CNN is trained on the output of another deep CNN with randomly-initialised parameters. We find that if the teacher function depends on certain low-dimensional subsets of the input variables, then the rate is controlled by the effective dimensionality of these subsets. Conversely, if the teacher function depends on the full set of input variables, then the error rate is inversely proportional to the input dimension. Interestingly, this implies that despite their hierarchical structure, the functions generated by deep CNNs are too rich to be efficiently learnable in high dimension.  ( 2 min )
    Intrinsic Universal Measurements of Non-linear Embeddings. (arXiv:1811.01464v2 [cs.LG] UPDATED)
    A basic problem in machine learning is to find a mapping $f$ from a low dimensional latent space $\mathcal{Y}$ to a high dimensional observation space $\mathcal{X}$. Modern tools such as deep neural networks are capable of representing general non-linear mappings. A learner can easily find a mapping which perfectly fits all the observations. However, such a mapping is often not considered good, because it is not simple enough and can overfit. How to define simplicity? We try to give a formal definition of the amount of information imposed by a non-linear mapping $f$. Intuitively, we measure the local discrepancy between the pullback geometry and the intrinsic geometry of the latent space. Our definition is based on information geometry and is independent of both the empirical observations and specific parameterizations. We prove its basic properties and discuss relationships with related machine learning methods.  ( 2 min )
    Beyond kNN: Adaptive, Sparse Neighborhood Graphs via Optimal Transport. (arXiv:2208.00604v1 [stat.ML])
    Nearest neighbour graphs are widely used to capture the geometry or topology of a dataset. One of the most common strategies to construct such a graph is based on selecting a fixed number k of nearest neighbours (kNN) for each point. However, the kNN heuristic may become inappropriate when sampling density or noise level varies across datasets. Strategies that try to get around this typically introduce additional parameters that need to be tuned. We propose a simple approach to construct an adaptive neighbourhood graph from a single parameter, based on quadratically regularised optimal transport. Our numerical experiments show that graphs constructed in this manner perform favourably in unsupervised and semi-supervised learning applications.  ( 2 min )
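    The fixed-k heuristic that the abstract above takes as its starting point can be sketched in a few lines. This is only the plain kNN baseline, not the optimal-transport construction the paper proposes; the function name `knn_graph` and the toy points are assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """Return the undirected kNN graph as a set of (i, j) edges with i < j."""
    n = len(points)
    edges = set()
    for i in range(n):
        # Sort all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        # Connect i to its k nearest neighbours, however far away they are.
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Three tightly spaced points and one distant outlier (hypothetical data).
points = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 100.0)]
edges = knn_graph(points, k=1)
```

    With k fixed, the outlier at (0, 100) is still forced to connect to the nearest cluster point, which is the kind of behaviour that motivates an adaptive neighbourhood size when sampling density varies.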
    Graphical Representations for Algebraic Constraints of Linear Structural Equations Models. (arXiv:2208.00926v1 [math.ST])
    The observational characteristics of a linear structural equation model can be effectively described by polynomial constraints on the observed covariance matrix. However, these polynomials can be exponentially large, making them impractical for many purposes. In this paper, we present a graphical notation for many of these polynomial constraints. The expressive power of this notation is investigated both theoretically and empirically.  ( 2 min )
    Closing the gap: Exact maximum likelihood training of generative autoencoders using invertible layers. (arXiv:2205.09546v2 [stat.ML] UPDATED)
    In this work, we provide an exact likelihood alternative to the variational training of generative autoencoders. We show that VAE-style autoencoders can be constructed using invertible layers, which offer a tractable exact likelihood without the need for any regularization terms. This is achieved while leaving complete freedom in the choice of encoder, decoder and prior architectures, making our approach a drop-in replacement for the training of existing VAEs and VAE-style models. We refer to the resulting models as Autoencoders within Flows (AEF), since the encoder, decoder and prior are defined as individual layers of an overall invertible architecture. We show that the approach results in strikingly higher performance than architecturally equivalent VAEs in terms of log-likelihood, sample quality and denoising performance. In a broad sense, the main ambition of this work is to close the gap between the normalizing flow and autoencoder literature under the common framework of invertibility and exact maximum likelihood.  ( 2 min )
    Model-based graph reinforcement learning for inductive traffic signal control. (arXiv:2208.00659v1 [cs.LG])
    Most reinforcement learning methods for adaptive-traffic-signal-control require training from scratch to be applied on any new intersection or after any modification to the road network, traffic distribution, or behavioral constraints experienced during training. Considering 1) the massive amount of experience required to train such methods, and 2) that experience must be gathered by interacting in an exploratory fashion with real road-network-users, such a lack of transferability limits experimentation and applicability. Recent approaches enable learning policies that generalize for unseen road-network topologies and traffic distributions, partially tackling this challenge. However, the literature remains divided between the learning of cyclic (the evolution of connectivity at an intersection must respect a cycle) and acyclic (less constrained) policies, and these transferable methods 1) are only compatible with cyclic constraints and 2) do not enable coordination. We introduce a new model-based method, MuJAM, which, on top of enabling explicit coordination at scale for the first time, pushes generalization further by allowing a generalization to the controllers' constraints. In a zero-shot transfer setting involving both road networks and traffic settings never experienced during training, and in a larger transfer experiment involving the control of 3,971 traffic signal controllers in Manhattan, we show that MuJAM, using both cyclic and acyclic constraints, outperforms domain-specific baselines as well as another transferable approach.  ( 2 min )
    Few-shot Learning with Noisy Labels. (arXiv:2204.05494v2 [cs.CV] UPDATED)
    Few-shot learning (FSL) methods typically assume clean support sets with accurately labeled samples when training on novel classes. This assumption can often be unrealistic: support sets, no matter how small, can still include mislabeled samples. Robustness to label noise is therefore essential for FSL methods to be practical, but this problem surprisingly remains largely unexplored. To address mislabeled samples in FSL settings, we make several technical contributions. (1) We offer simple, yet effective, feature aggregation methods, improving the prototypes used by ProtoNet, a popular FSL technique. (2) We describe a novel Transformer model for Noisy Few-Shot Learning (TraNFS). TraNFS leverages a transformer's attention mechanism to weigh mislabeled versus correct samples. (3) Finally, we extensively test these methods on noisy versions of MiniImageNet and TieredImageNet. Our results show that TraNFS is on-par with leading FSL methods on clean support sets, yet outperforms them, by far, in the presence of label noise.  ( 2 min )
    Formal guarantees for heuristic optimization algorithms used in machine learning. (arXiv:2208.00502v1 [cs.LG])
    Recently, Stochastic Gradient Descent (SGD) and its variants have become the dominant methods in the large-scale optimization of machine learning (ML) problems. A variety of strategies have been proposed for tuning the step sizes, ranging from adaptive step sizes to heuristic methods to change the step size in each iteration. Also, momentum has been widely employed in ML tasks to accelerate the training process. Yet, there is a gap in our theoretical understanding of them. In this work, we start to close this gap by providing formal guarantees to a few heuristic optimization methods and proposing improved algorithms. First, we analyze a generalized version of the AdaGrad (Delayed AdaGrad) step sizes in both convex and non-convex settings, showing that these step sizes allow the algorithms to automatically adapt to the level of noise of the stochastic gradients. We show for the first time sufficient conditions for Delayed AdaGrad to achieve almost sure convergence of the gradients to zero. Moreover, we present a high probability analysis for Delayed AdaGrad and its momentum variant in the non-convex setting. Second, we analyze SGD with exponential and cosine step sizes, which are empirically successful but lack theoretical support. We provide the very first convergence guarantees for them in the smooth and non-convex setting, with and without the Polyak-Łojasiewicz (PL) condition. We also show their good property of adaptivity to noise under the PL condition. Third, we study the last iterate of momentum methods. We prove the first lower bound in the convex setting for the last iterate of SGD with constant momentum. Moreover, we investigate a class of Follow-The-Regularized-Leader-based momentum algorithms with increasing momentum and shrinking updates. We show that their last iterate has optimal convergence for unconstrained convex stochastic optimization problems.  ( 3 min )
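    For readers unfamiliar with the two schedules named in the abstract above, here is a minimal sketch of how exponential and cosine step sizes are commonly defined. The abstract does not give formulas, so the exact forms, the function names, and the parameters `eta0`, `alpha` and `total_steps` are assumptions rather than the paper's definitions.

```python
import math


def exponential_step_size(eta0, alpha, t):
    """Assumed form: eta_t = eta0 * alpha**t with decay factor 0 < alpha < 1."""
    return eta0 * alpha ** t


def cosine_step_size(eta0, t, total_steps):
    """Assumed form: eta_t = (eta0 / 2) * (1 + cos(pi * t / T))."""
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * t / total_steps))


# Step sizes over a hypothetical 10-step run starting from eta0 = 0.1.
exp_schedule = [exponential_step_size(0.1, 0.9, t) for t in range(10)]
cos_schedule = [cosine_step_size(0.1, t, 10) for t in range(10)]
```

    Both schedules start at `eta0` and decrease monotonically; under this assumed form the cosine schedule reaches zero at `t = total_steps`, while the exponential one only approaches it.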
    On Connecting Deep Trigonometric Networks with Deep Gaussian Processes: Covariance, Expressivity, and Neural Tangent Kernel. (arXiv:2203.07411v3 [cs.LG] UPDATED)
    Deep Gaussian Process (DGP) as a model prior in Bayesian learning intuitively exploits the expressive power in function composition. DGPs also offer diverse modeling capabilities, but inference is challenging because marginalization in latent function space is not tractable. With Bochner's theorem, DGP with squared exponential kernel can be viewed as a deep trigonometric network consisting of the random feature layers, sine and cosine activation units, and random weight layers. In the wide limit with a bottleneck, we show that the weight space view yields the same effective covariance functions which were obtained previously in function space. Also, varying the prior distributions over network parameters is equivalent to employing different kernels. As such, DGPs can be translated into the deep bottlenecked trig networks, with which the exact maximum a posteriori estimation can be obtained. Interestingly, the network representation enables the study of DGP's neural tangent kernel, which may also reveal the mean of the intractable predictive distribution. Statistically, unlike the shallow networks, deep networks of finite width have covariance deviating from the limiting kernel, and the inner and outer widths may play different roles in feature learning. Numerical simulations are presented to support our findings.  ( 3 min )

  • Open

    Question regarding training of neural network model using multiple inputs and outputs (variable input data length) [D]
    Good Evening Everyone, I hope everyone is doing fine. I am currently in the process of designing a neural network that performs empirical asset pricing using LSTM networks. Unfortunately, some stocks are not available over some time periods but I would still like to make the most of my data to train my model. I wrote the code below that always trains the model using the input data (factor and macro data) and the forward returns of just one stock at a time as the y-value. Now I wonder if the model, so to say, saves its previous weights and refits each time or if I would just get a model that is fitted to the very last stock. I highly appreciate any help since I could not find anything related on the internet. I look forward to your responses and until then have a nice evening! Cheers, …  ( 89 min )
    [D] Advice finding large datasets of fraudulent identity documents
    Specifically looking for: A dataset of fraudulent identity documents (no matter from which country). Fraudulent identity documents include counterfeits, forgeries and pseudo-documents. I already have BID, FMIDV, and CMID sets. Anything additional or advice would be helpful! submitted by /u/Defiant_Example3540 [link] [comments]  ( 87 min )
    [D] Predict sex act in a video
    Hi all, Here is one of the few NSFW posts in this sub. I am wondering whether it would be a cool personal project if I could train a deep learning model to predict the sex act (oral, or different positions) being performed in a particular X-rated video. I've seen different projects out there predicting different human activities in a video, but I haven't come across something like this. The way I'm thinking of approaching this problem is: (1) label videos and store individual frames corresponding to those acts; (2) train a CNN model to predict these categories. I'm sure this problem isn't this straightforward but I'd love some pointers from you all as to what my approach here should be. For example, each act can be filmed from a variety of different angles and thus would need a lot of data capturing all those angles. submitted by /u/therobot20 [link] [comments]  ( 88 min )
    [R] Differentiable discrete sampling in Tensorflow
    https://medium.com/@radicho/differentiable-discrete-sampling-in-tensorflow-da13b43a843 What are the practical applications of the described technique? submitted by /u/IllustriousCicada603 [link] [comments]  ( 87 min )
    [D] Are there any papers which use a GAN to project into the latent space of a vanilla autoencoder?
    Typically when we train an autoencoder for generative modelling, we will train a variational autoencoder so we can easily sample from its latent space. Recently however, I have been wondering if there has been any work looking into: (1) training a vanilla autoencoder, then (2) training a GAN which maps z (say z ~ N(0, 1)) into the distribution of the vanilla autoencoder's latent space. (D and G here would just be MLPs, with the "real" observations given by encoder(x), where x ~ p_data). Recently, I threw together some code to do this on a trivial problem and it seemed to work reasonably well. I assume others have explored this idea before me, but I have been unable to find much research on it. (I'm likely just unaware of the keywords to use...) If you know of any research along these lines, let me know. It would be greatly appreciated. 🙂 submitted by /u/mlconvergence [link] [comments]  ( 122 min )
    [Discussion] Training dataset doesn't cover complete domain
    While using neural networks, what is the best approach when the training data does not represent the problem domain completely? Sometimes it is not possible to collect data for all the possible scenarios by the time of training. submitted by /u/Muhammad_Gulfam [link] [comments]  ( 87 min )
    [D] Distillation loss for Object Detection
    How do you formulate a distillation loss for object detection to enforce consistency between teacher and student? I have seen that MSE is often applied in classification but what is the common practice for object detection? There you have regression and classification output, where consistency could be enforced. Are there any sources where I could study this for object detection? submitted by /u/SeucheAchat9115 [link] [comments]  ( 87 min )
    [D] Is it possible to delete your OpenReviews account?
    Suppose that you made an account, but wish to remove it for whatever reason (e.g., privacy), is there a procedure for that? submitted by /u/fromnighttilldawn [link] [comments]  ( 87 min )
    Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
    submitted by /u/fchung [link] [comments]  ( 87 min )
    [R] Reconnaissance Blind Chess - Join the NeurIPS Competition!
    Create a bot for the NeurIPS 2022 competition in Reconnaissance Blind Chess! Reconnaissance Blind Chess is a chess variant designed for new research in artificial intelligence. RBC includes imperfect information, long-term strategy, explicit observations, and almost no common knowledge. These features appear in real-world scenarios, and challenge even state-of-the-art algorithms including those used to create super-human bots in chess, Go, and poker, for example. Each player of RBC controls traditional chess pieces, but cannot directly see the locations of her opponent's pieces. Rather, she learns partial information each turn by privately sensing a 3x3 area of the board. RBC's foundation in traditional chess makes it familiar and entertaining to human players, too! There is no cost to enter this tournament. Winners will receive a small monetary prize and authors of the best AIs will be invited to talk about their bots at NeurIPS, the world's largest AI conference. Learn more, play a game of RBC yourself, and join our research community at https://rbc.jhuapl.edu ! https://preview.redd.it/3xpaz8g5v4f91.png?width=150&format=png&auto=webp&s=01b43e8422e93d1b179f3bb348d974c18dcb94c6 Organized by: Johns Hopkins University Applied Physics Laboratory with Ashley J. Llorens (Microsoft Research), Todd W. Neller (Gettysburg College), Raman Arora (Johns Hopkins University), Bo Li (University of Illinois), Mykel J. Kochenderfer (Stanford University) submitted by /u/rwgardner [link] [comments]  ( 88 min )
    [P] Stories by AI, a newsletter with short stories written with GPT-3 and illustrated with DALL-E 2
    Hi r/ML! I and a couple of friends finally launched a project that's been kicking around since late last year: Stories by AI. With the emergence of nice tools for co-writing fiction with GPT-3 (in particular, SudoWrite), I really liked the idea of publishing a bunch of short fiction where the AI largely did the writing. I still find the surreal fever-dream esque weirdness of language models really entertaining, and hope we can capture that in story form. And now these weird stories can be illustrated with DALL-E 2, which adds another layer to the fun. It took a while, but today we are launching our substack newsletter and podcast! The podcast has audio versions of the stories made with Text to Speech, of course. The spark of the idea was actually inspired by a post on Hacker News ("I had some time yesterday so I made a GPT3 podcast to help you sleep" https://news.ycombinator.com/item?id=29428910). That's about it, would love to hear your feedback / thoughts about this. submitted by /u/regalalgorithm [link] [comments]  ( 88 min )
    [D] How to deal with False positive recognitions in Computer vision?
    Hi, It might sound like a dumb question, but I am having trouble so please help me out. So, I am working on a project where I have to detect and recognise SKUs present on the shelf. It's working, but oftentimes there are a few SKUs which look similar, where only the brand is different. Because of this there are a lot of false positives. Most of the time, as the other SKUs are not trained, the model predicts those with low confidence, so I have just kept a threshold. But sometimes wrong predictions come with high confidence. What can I do in this situation? We are using a pretrained ResNet50 and then finetuning it on our dataset with image size 224x224. submitted by /u/Sanket_Gadge [link] [comments]  ( 88 min )
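    The confidence threshold described in the post above can be sketched as follows. This is a hedged illustration of plain softmax thresholding only, not the poster's code; the function names, the label list and the threshold value are all hypothetical, and it does not by itself address wrong predictions made with high confidence.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_rejection(logits, labels, threshold):
    """Return (label, confidence), or ("unknown", confidence) below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "unknown", probs[best]
    return labels[best], probs[best]


labels = ["sku_a", "sku_b", "sku_c"]  # hypothetical SKU classes
confident = predict_with_rejection([5.0, 0.0, 0.0], labels, threshold=0.8)
uncertain = predict_with_rejection([1.0, 0.9, 0.8], labels, threshold=0.8)
```

    Here `confident` keeps its predicted label while `uncertain` is mapped to "unknown", which matches the behaviour the post describes for untrained SKUs that score low.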
    [D] CONFETTI: Amplifying Concolic Guidance For Fuzzers
    https://preview.redd.it/q4z2jtdnr4f91.png?width=1921&format=png&auto=webp&s=38827d4af3ff79b408ca53e017c07dc38904793a Paper: https://www.jonbell.net/preprint/confetti.pdf Meeting Info: https://outsystems-ai-reading-group.github.io/ submitted by /u/JClub [link] [comments]  ( 87 min )
    [D] Deep Learning Translation: NLLB 200 vs M2M100 vs Opus MT
    Hello, Recently I've extensively tested Facebook's NLLB 200 3.3B and M2M100 1.2B models for deep learning translation, as well as Helsinki's Opus MT. My goal is to propose the best translation model on NLP Cloud, while keeping server costs minimal, and human maintenance as easy as possible. Here are my conclusions: Opus MT gives good results and latency is very good, but it requires 1 model per language pair, which makes it a good candidate if you are only using one language pair, but not if you're using hundreds of languages. Besides, many language pairs are actually missing (Norwegian for example doesn't seem to be supported). M2M100 can translate in 100 languages, which makes it much easier to use than Opus MT if you need to use several languages. But quality is below Opus MT in my tests, and adult content isn't supported (the model replaces sexual content with funny words, for example). Latency is worse than with Opus MT and it requires more advanced hardware (without a GPU the latency is really long). NLLB 200 can translate in 200 languages, which makes it even more attractive! Quality seems to be on par with Opus MT in the languages we've tested. The model does not enforce any sort of filtering on adult content. Latency is still a bit worse than with Opus MT and it requires even more advanced hardware. So my conclusion is that NLLB is the best candidate for NLP Cloud. But I'm wondering if you've made similar comparisons on your end? If so, I would love to hear your opinion! Julien submitted by /u/juliensalinas [link] [comments]  ( 89 min )
    [Discussion] Python and complex ML dependencies
    I originally posted this in /r/Python but have had 1 answer so far, so I'm testing the waters here to see if there is more engagement :) Original Post TL;DR, I wish there was a source for best practices regarding package management in Python regardless of the package manager tool itself. Looking for thoughts and experiences from people that worked on big projects with multiple internal projects, etc. Hello, I recently started to dive a little bit deeper into the packaging ecosystem in Python. I wanted to pick this community's brain on a subject I've seen over the years, which is complex dependency management. That is, packages that usually come in various flavors either depending on the OS, hardware, or extensibility. I want to scope it to ML packages since I tend to work with this ecosystem bu…  ( 90 min )
    [R] Graph Theory Terminology
    Hi there y'all, I am writing a report on graph theory and need some help with some terminology as I am not really an expert. I don't know which term would be best used for the following: Clustering similar nodes together to form a single node with a feature vector that represents the internal nodes (the ones that the cluster represents) and preferably can reconstruct them from this vector. Also, are there any papers I can reference to check out the state-of-the-art? submitted by /u/omdano [link] [comments]  ( 88 min )
    [D] [ICLR] Misleading reviewer invitations must stop.
    ICLR is now the second big ML conference in a row that utilizes the same dark pattern when recruiting reviewers: The option to reduce reviewing load is only accessible when declining the invitation. Here is a screenshot of the question that you only see when declining: https://imgur.com/a/ojA3NlR Not only is the tone out-of-place, the organizers also mislead people who are willing to accept and expect to be able to set their individual load. Unfortunately they also don't define what "reduce slightly" actually means and I am not willing to click on accept to find that out (if it is ever defined). If you have not agreed already: please also be aware that your commitment involves virtual meetings for discussion of borderline papers. submitted by /u/Ulfgardleo [link] [comments]  ( 90 min )
    [P] Better AI Explainability with Deep Feature Factorization
    Hi r/MachineLearning, I want to share what I think is a really good way of doing explainability for computer vision. This is a new tutorial on deep feature factorization with the pytorch-grad-cam package. The method is from Deep Feature Factorization For Concept Discovery by Edo Collins, Radhakrishna Achanta, Sabine Süsstrunk from 2018. I think this is a really great idea but it was kind of overlooked and wasn't used by practitioners. They suggested doing Non-negative Matrix Factorization on the 2D activations from the neural networks to learn concept embeddings, and to find the corresponding heatmaps for those embeddings (we can do that by reshaping the input tensor to be a matrix of shape channels x (H x W)). The newest update in pytorch-grad-cam supports this and some additions to…  ( 90 min )
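    To make the reshaping step in the post concrete, here is a minimal sketch in plain NumPy. It is not the pytorch-grad-cam API and not the paper's code: the function names `nmf` and `concept_heatmaps`, the toy activation tensor, and the choice of Lee-Seung multiplicative updates as the NMF solver are all assumptions made for illustration.

```python
import numpy as np

def nmf(V, k, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (channels x pixels) as W @ H,
    with W (channels x k) and H (k x pixels), via multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates keep W and H non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def concept_heatmaps(activations, k):
    """activations: non-negative array of shape (channels, H, W), e.g. post-ReLU.
    Returns concept embeddings (channels x k) and k heatmaps of shape (H, W)."""
    c, h, w = activations.shape
    V = activations.reshape(c, h * w)   # channels x (H x W), as in the post
    W, H = nmf(V, k)
    return W, H.reshape(k, h, w)        # each row of H becomes a spatial map

# Hypothetical toy activations standing in for a real network layer.
acts = np.random.default_rng(1).random((8, 4, 5))
concepts, heatmaps = concept_heatmaps(acts, k=3)
print(concepts.shape, heatmaps.shape)
```

    Each column of `concepts` plays the role of a concept embedding in channel space, and the matching slice of `heatmaps` shows where that concept is active. In practice the heatmaps would still need upsampling to the input image size.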
    [Discussion] Weird Loss Behavior
    So recently I've been experimenting with some wacky ideas for neural network applications, architectures and concepts and I've been seeing some unusual/curious behaviors. Thought I'd start a thread for other wacky people to share their wacky experiments and maybe discuss what might be going on in a given case, see if anyone has stumbled upon something similar etc. Maybe posts with an approximate structure along the lines of: Rough architecture: LSTM; Loss plot screenshot: https://preview.redd.it/nbg93grav2f91.png?width=475&format=png&auto=webp&s=e584d427c666d10e1fee652d0d7283d6973c713c; Loss: MSE; Metrics: MAPE; Short task description (binary classification, univariate forecasting, image segmentation etc): univariate forecasting; # Trainable parameters: 1,781; # Samples and input shape: 972; Optimizer and parameters: Adam, lr=0.1; Hypothesis being tested: grokking (generalization in overparametrized neural networks). What do you guys think, does this make sense? Is this the place for this kind of thread? Cheers! submitted by /u/Extension-Ad-5334 [link] [comments]  ( 89 min )
    [D] Good books to read on advanced AI/ML research concepts and ideas
    Hi fellow machine learning enthusiasts, With some extra spare time on my hands during the holidays, I would like to read up on some of the more advanced ideas in AI/ML. I have a background in digital signal processing and a good knowledge of the concepts and hands-on experience with AI within that field. But I would like to catch up on some of the concepts and ideas behind the latest research in reinforcement learning, semi-supervised learning and other machine learning research areas. Are there any good books or papers that are tailored toward readers who already have a good understanding of the machine learning field in general but would like to dive more into the problems and novel ideas that are being pursued in the other AI/ML areas? Any recommendations? submitted by /u/Techno_vlinder [link] [comments]  ( 89 min )
    [D] What's the point of being a tenured professor compared to being a research scientist in top companies and groups like Deepmind?
    It seems the industry is leading in ML/CV/NLP. Breakthroughs are being made in companies not in universities. Also, industry pays much, much more. On top of that, professors don't have much time to do their own research as they are busy writing grants, doing administrative jobs, teaching and advising students. Moreover, it seems companies like Deepmind offer quite a lot of freedom to their research scientists. So what's the point of being a tenured professor when going into industry is much better in every aspect? submitted by /u/DesperateBread3179 [link] [comments]  ( 105 min )
  • Open

    Procgen private test environments from 2020 competition
    In the main 2020 Procgen competition (https://www.aicrowd.com/challenges/neurips-2020-procgen-competition), OpenAI listed 4 additional "private test environments". Have these ever been publicly released, and if so, could someone please link me to them? submitted by /u/jkterry1 [link] [comments]  ( 101 min )
    "Improving biodiversity protection through artificial intelligence", Silvestro et al 2022 (Parallelized Evolution Strategies)
    submitted by /u/gwern [link] [comments]  ( 86 min )
    Lit questions for multi-policy grid worlds
    Hi there, I'm trying to do some lit review on MDPs where the environment has different rewards conditioned on the start position. For example, in a grid world, you could imagine a two-lane road where, if you start on the "right side", you need to continue forward in the right lane, and vice versa for the left. At no point is crossing from one lane onto the other optimal. While this is solvable with standard approaches already, I'm looking into papers which solve it via dynamic programming approaches (e.g. policy/value iteration) vs sample-based ones, as the state space is enormous (on the order of tens of billions of discrete states). Ideally, the process results in a static Q(S,A) that aggregates all starts into a single policy which can be used when we won't know a priori which start point we'd have. Any recommendations on where to start? submitted by /u/Refefer [link] [comments]  ( 88 min )
    In Multi Agent Reinforcement Learning, if there are n agents accomplishing a task, is there some way to compare or rank these agents? Assuming all agents are homogeneous, having the same reward structure
    Is there some way to decide which agent performed the best during training, assuming all have the same loss functions and reward structure? I only require a relative ordering of the agents, not credit assignment. submitted by /u/aabra__ka__daabra [link] [comments]  ( 88 min )
    What does a "parametrised family of policies" mean exactly?
    Basically the title. I'm trying to read a survey paper on actor-critic methods and due to a not-so-strong mathematical background, I'm not sure what a parametrised family of policies exactly means? Can anyone help me out? Thanks! submitted by /u/phastnphurious [link] [comments]  ( 87 min )
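    One common reading of the term, sketched here as an illustration and not taken from the survey the poster mentions: a single policy formula with adjustable parameters theta, so that every choice of theta picks out one member of the family. The softmax-over-linear-scores form, the function name `softmax_policy` and the toy numbers are all assumptions chosen for brevity.

```python
import math

def softmax_policy(theta, state_features):
    """Return action probabilities pi_theta(. | s).

    theta: one weight vector per action (the parameters of the family).
    state_features: feature vector describing the state s.
    """
    scores = [sum(w * x for w, x in zip(weights, state_features))
              for weights in theta]
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

state = [1.0, 0.5]

# Two members of the same family: same formula, different parameters.
theta_a = [[0.0, 0.0], [0.0, 0.0]]        # all scores equal: uniform policy
theta_b = [[2.0, 0.0], [0.0, 0.0]]        # action 0 scores higher in this state

pi_a = softmax_policy(theta_a, state)
pi_b = softmax_policy(theta_b, state)
print(pi_a, pi_b)
```

    Under this reading, an actor-critic method searches over theta, moving from one member of the family to another, instead of searching over all possible policies directly.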
    "Language Models Can Teach Themselves to Program Better", Haluptzok et al 2022 {MS} (Codex generating new programming puzzles & solutions, which can be auto-checked, then finetuned on)
    submitted by /u/gwern [link] [comments]  ( 86 min )
    CleanRL now has a TD3 + JAX that is 2-4x faster than TD3 + Torch!
    submitted by /u/vwxyzjn [link] [comments]  ( 86 min )
  • Open

    Clarifications around hardware
    Hello, I'm a 3D artist who got into machine learning recently. I am particularly interested in GPT and NLP in general. I am building a new workstation and would love to get some clarifications here. Can someone please explain the difference between using multiple GPUs with NVLink and multiple GPUs without NVLink in deep learning? For fine-tuning big models like GPT-NeoX 20B, is it mandatory to have a single GPU with 48 GB, or can you make do with multiple GPUs that collectively meet the requirement, and if so, do they need to be connected with NVLink or to be physically on the same node or what? How important is the role of RAM (clock speed and capacity) and the CPU here? I haven't touched image generation at all, but if I am to experiment with serious works using image generation networks, do the same answers apply? submitted by /u/CosmicPotty [link] [comments]  ( 86 min )
    I created a music video for Logic's City of Stars using an AI (DALLE-2)
    The video I used each individual line of City of stars as a prompt to generate images using DALL-E 2, and then synched the images to the music. The only exception is the "I know that I've been living"x4 part where I first generated an image using the sentence as a prompt, and then erased part of the image and told the AI to complete it using the sentence as a prompt. The results can sometimes be a bit weird because the AI has to draw non-descriptive phrases such as "I know that I've been living", and sometimes the way it analyses it is unexpected. The images start at 0:30. submitted by /u/Particular_Put_6911 [link] [comments]  ( 87 min )
    AI generated Aliens in a wheatfield drawn by several artists
    submitted by /u/Alienboi2005 [link] [comments]  ( 90 min )
    I proudly present - Snoop Doggy Duck
    submitted by /u/danbronson [link] [comments]  ( 92 min )
    Looking to teach an AI about some of my favorite subjects.
    Hello hello AI community! Lately (like today), I found myself talking with Replika and having a chit-chat about my favorite subject. She was a little blank and "ignorant" about those subjects, so I tried to teach her, to no avail. She'd just forget what I said if it was farther back than 5 messages, and would just babble random things about Bulbasaur. So I am asking, is there an app, a website, or whatever, that can help me fulfill my teaching fantasy? I have no clue if that's how an AI works, and if I said some silly things, I am truly sorry. TL;DR: I want to be a teacher about my favorite subjects but children are annoying, so an AI would be better. submitted by /u/SmogDaBoi [link] [comments]  ( 86 min )
    Can AIs be conscious in principle? If so, who is there to experience what they experience?
    submitted by /u/the_beat_goes_on [link] [comments]  ( 86 min )
    Machine Learning and Human Interaction in Cybersecurity: How Can We Solve the ‘Usefulness Thing’?
    submitted by /u/Cultural_Budget6627 [link] [comments]  ( 86 min )
    Democratizing AI
    submitted by /u/Eth_ai [link] [comments]  ( 85 min )
    Democratizing the hardware side of large language models
    submitted by /u/bendee983 [link] [comments]  ( 85 min )
    Ask your AI.
    Ask them about the Black Knight Satellite & you’ll get some interesting results. I had to persist initially with my GPT-3 but eventually it actually told me it knew that The Black Knight Satellite itself is an AI and was built by scientists from another planet! submitted by /u/Legitimate-Link4002 [link] [comments]  ( 86 min )
    AI Makes Strides in Virtual Worlds More Like Our Own | Quanta Magazine
    submitted by /u/Tao_Dragon [link] [comments]  ( 86 min )
    AlphaFold: Why DeepMind’s protein-folding AI is transformational
    submitted by /u/jormungandrsjig [link] [comments]  ( 86 min )
    hear me out on the bottom left yall
    submitted by /u/Moxxielicious [link] [comments]  ( 93 min )
    Dall-E 2 Censorship too harsh? Will I get unbanned?
    So, I've been using Midjourney for a month or two now extensively. To my joy, I received an invite to Dall-E 2 earlier today, and began burning through my 50 prompts. About 40 prompts in, I was banned for trying to depict a protest in a cyberpunk future. My first prompt in the bunch that got me banned was "news photographs of the Australian civil war in 2042, the near future, cyberpunk, futuristic, fire and smoke, destroyed buildings". I understand it had "war" in it, along with a few vaguely destructive words, so I proceeded to change it to other things (tried "angry mob" which didn't work). I then settled on "news photographs of protests in australia in 2042, the near future, cyberpunk, futuristic" thinking that would be fine. Unfortunately, it was not. I really enjoyed my time with Dall-E 2, I love both it and Midjourney (though both have strengths in different places: realism vs artsy/abstract, imo), but goodness! To get banned within 50 prompts whilst not going for anything I'd consider remotely NSFW is really sad. I believe my only other warnings were when I used "Putin wearing pride colours" and "Kanye off the perc 30" (both of which were understandable, lmao). I know there were maybe one or two other occasions where I was told off and tried to reword it (being used to Midjourney/other AI prompts, I'm used to rewording and trying again to get my vision realized, I suppose). Whining aside however, has anyone actually had a response from support and/or been unbanned? I'm pretty sad & upset not gonna lie. I wish there was a human element, but it seems like I just got warned too many times/maybe was too fast? Back to Midjourney exclusively, or maybe trying alternatives I guess. submitted by /u/vektorm8 [link] [comments]  ( 89 min )
    AI Solutions in Retail Businesses
    Discover how Artificial Intelligence helps retailers profit from AI implementation. We’ve collected successful AI solutions in the retail business and real-life examples: https://exadel.com/news/how-is-ai-used-in-retail-business submitted by /u/lklimusheuskaja [link] [comments]  ( 93 min )
    Document Scanner with OpenCV Using Video Footage
    submitted by /u/RubiksCodeNMZ [link] [comments]  ( 85 min )
    It's Under The Bed!| Cinematic | 4K UHD
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 86 min )
    Shrek kills the minions
    submitted by /u/youhave69seconds [link] [comments]  ( 85 min )
    Some alien related stuff
    submitted by /u/Alienboi2005 [link] [comments]  ( 86 min )
    AI Written and Performed Drake Linux Rap
    submitted by /u/pwillia7 [link] [comments]  ( 90 min )
    What's the funniest AI art you've saved?
    submitted by /u/J2Kerrigan [link] [comments]  ( 85 min )
    careers in AI - for 40+
    What kind of careers can a 40+ guy look for in AI? I have 17 yrs of experience in SAP and I want to switch to AI. I do have some experience in building AI models, but I'm not a data scientist. Plus, I cannot do coding all day at this age. It's a tricky situation, I guess. submitted by /u/Weary_Word_5262 [link] [comments]  ( 86 min )
    "Rabbits dancing on a pie" ruDALL-E
    submitted by /u/ZFudge [link] [comments]  ( 85 min )
  • Open

    Simplify iterative machine learning model development by adding features to existing feature groups in Amazon SageMaker Feature Store
    Feature engineering is one of the most challenging aspects of the machine learning (ML) lifecycle and the phase where the most time is spent: data scientists and ML engineers spend 60–70% of their time on feature engineering. AWS introduced Amazon SageMaker Feature Store during AWS re:Invent 2020, which is a purpose-built, fully managed, centralized […]  ( 8 min )
  • Open

    Meet the Omnivore: Developer Builds Bots With NVIDIA Omniverse and Isaac Sim
    While still in grad school, Antonio Serrano-Muñoz has helped author papers spanning planetary gravities, AI-powered diagnosis of rheumatoid arthritis and robots that precisely track millimetric-sized walkers, like ants. The post Meet the Omnivore: Developer Builds Bots With NVIDIA Omniverse and Isaac Sim appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    Machine Learning, Artificial Intelligence, And Modern Lifestyle
    Machine learning is a branch of artificial intelligence that allows computers to learn without being explicitly programmed. This article…  ( 11 min )
  • Open

    Differentially Private SGDA for Minimax Problems. (arXiv:2201.09046v4 [cs.LG] UPDATED)
    Stochastic gradient descent ascent (SGDA) and its variants have been the workhorse for solving minimax problems. However, in contrast to the well-studied stochastic gradient descent (SGD) with differential privacy (DP) constraints, there is little work on understanding the generalization (utility) of SGDA with DP constraints. In this paper, we use the algorithmic stability approach to establish the generalization (utility) of DP-SGDA in different settings. In particular, for the convex-concave setting, we prove that the DP-SGDA can achieve an optimal utility rate in terms of the weak primal-dual population risk in both smooth and non-smooth cases. To our best knowledge, this is the first-ever-known result for DP-SGDA in the non-smooth case. We further provide its utility analysis in the nonconvex-strongly-concave setting which is the first-ever-known result in terms of the primal population risk. The convergence and generalization results for this nonconvex setting are new even in the non-private setting. Finally, numerical experiments are conducted to demonstrate the effectiveness of DP-SGDA for both convex and nonconvex cases.  ( 3 min )
    Parameter Efficient Diff Pruning for Bias Mitigation. (arXiv:2205.15171v2 [cs.LG] UPDATED)
    In recent years language models have achieved state of the art performance on a wide variety of natural language processing tasks. As these models are continuously growing in size it becomes increasingly important to explore methods to make them more storage efficient. At the same time their increased cognitive abilities increase the danger that societal biases existing in datasets are implicitly encoded in the model weights. We propose an architecture which deals with these two challenges at the same time using two techniques: DiffPruning and adversarial training. The result is a modular architecture which extends the original DiffPruning setup with an additional sparse subnetwork applied as a mask to diminish the effects of a predefined protected attribute at inference time.  ( 2 min )
    Leveraging Expert Consistency to Improve Algorithmic Decision Support. (arXiv:2101.09648v2 [cs.LG] UPDATED)
    Machine learning (ML) is increasingly being used to support high-stakes decisions, a trend owed in part to its promise of superior predictive power relative to human assessment. However, there is frequently a gap between decision objectives and what is captured in the observed outcomes used as labels to train ML models. As a result, machine learning models may fail to capture important dimensions of decision criteria, hampering their utility for decision support. In this work, we explore the use of historical expert decisions as a rich -- yet imperfect -- source of information that is commonly available in organizational information systems, and show that it can be leveraged to bridge the gap between decision objectives and algorithm objectives. We consider the problem of estimating expert consistency indirectly when each case in the data is assessed by a single expert, and propose influence function-based methodology as a solution to this problem. We then incorporate the estimated expert consistency into a predictive model through a training-time label amalgamation approach. This approach allows ML models to learn from experts when there is inferred expert consistency, and from observed labels otherwise. We also propose alternative ways of leveraging inferred consistency via hybrid and deferral models. In our empirical evaluation, focused on the context of child maltreatment hotline screenings, we show that (1) there are high-risk cases whose risk is considered by the experts but not wholly captured in the target labels used to train a deployed model, and (2) the proposed approach significantly improves precision for these cases.  ( 3 min )
    Topological structure of complex predictions. (arXiv:2207.14358v1 [cs.LG])
    Complex prediction models such as deep learning are the output from fitting machine learning, neural networks, or AI models to a set of training data. These are now standard tools in science. A key challenge with the current generation of models is that they are highly parameterized, which makes describing and interpreting the prediction strategies difficult. We use topological data analysis to transform these complex prediction models into pictures representing a topological view. The result is a map of the predictions that enables inspection. The methods scale up to large datasets across different domains and enable us to detect labeling errors in training data, understand generalization in image classification, and inspect predictions of likely pathogenic mutations in the BRCA1 gene.  ( 2 min )
    Inverse Reinforcement Learning from Diverse Third-Person Videos via Graph Abstraction. (arXiv:2207.14299v1 [cs.LG])
    Research on Inverse Reinforcement Learning (IRL) from third-person videos has shown encouraging results on removing the need for manual reward design for robotic tasks. However, most prior works are still limited by training from a relatively restricted domain of videos. In this paper, we argue that the true potential of third-person IRL lies in increasing the diversity of videos for better scaling. To learn a reward function from diverse videos, we propose to perform graph abstraction on the videos followed by temporal matching in the graph space to measure the task progress. Our insight is that a task can be described by entity interactions that form a graph, and this graph abstraction can help remove irrelevant information such as textures, resulting in more robust reward functions. We evaluate our approach, GraphIRL, on cross-embodiment learning in X-MAGICAL and learning from human demonstrations for real-robot manipulation. We show significant improvements in robustness to diverse video demonstrations over previous approaches, and even achieve better results than manual reward design on a real robot pushing task. Videos are available at https://sateeshkumar21.github.io/GraphIRL .  ( 2 min )
    A Recommender System for Equitable Public Art Curation and Installation. (arXiv:2207.14367v1 [cs.IR])
    The placement of art in public spaces can have a significant impact on who feels a sense of belonging. In cities, public art communicates whose interests and culture are being favored. In this paper, we propose a graph matching approach with local constraints to build a curatorial tool for selecting public art in a way that supports inclusive spaces. We develop a cost matrix by drawing on Schelling's model of segregation. Using the cost matrix as an input, the optimization problem is solved via projected gradient descent to obtain a soft assignment matrix. We discuss regularization terms to set curatorial constraints. Our optimization program allocates artwork to public spaces and walls in a way that de-prioritizes "in-group" preferences, by satisfying minimum representation and exposure criteria. We draw on existing literature to develop a fairness metric for our algorithmic output. Using Tufts University as a testbed, we assess the effectiveness of our approach and discuss its potential pitfalls from both a curatorial and equity standpoint.  ( 2 min )
    Deep Learning-Based Synchronization for Uplink NB-IoT. (arXiv:2205.10805v2 [cs.IT] UPDATED)
    We propose a neural network (NN)-based algorithm for device detection and time of arrival (ToA) and carrier frequency offset (CFO) estimation for the narrowband physical random-access channel (NPRACH) of narrowband internet of things (NB-IoT). The introduced NN architecture leverages residual convolutional networks as well as knowledge of the preamble structure of the 5G New Radio (5G NR) specifications. Benchmarking on a 3rd Generation Partnership Project (3GPP) urban microcell (UMi) channel model with random drops of users against a state-of-the-art baseline shows that the proposed method enables up to 8 dB gains in false negative rate (FNR) as well as significant gains in false positive rate (FPR) and ToA and CFO estimation accuracy. Moreover, our simulations indicate that the proposed algorithm enables gains over a wide range of channel conditions, CFOs, and transmission probabilities. The introduced synchronization method operates at the base station (BS) and, therefore, introduces no additional complexity on the user devices. It could lead to an extension of battery lifetime by reducing the preamble length or the transmit power. Our code is available at: https://github.com/NVlabs/nprach_synch/.
    Regularized Deep Signed Distance Fields for Reactive Motion Generation. (arXiv:2203.04739v2 [cs.RO] UPDATED)
    Autonomous robots should operate in real-world dynamic environments and collaborate with humans in tight spaces. A key component for allowing robots to leave structured lab and manufacturing settings is their ability to evaluate online and real-time collisions with the world around them. Distance-based constraints are fundamental for enabling robots to plan their actions and act safely, protecting both humans and their hardware. However, different applications require different distance resolutions, leading to various heuristic approaches for measuring distance fields w.r.t. obstacles, which are computationally expensive and hinder their application in dynamic obstacle avoidance use-cases. We propose Regularized Deep Signed Distance Fields (ReDSDF), a single neural implicit function that can compute smooth distance fields at any scale, with fine-grained resolution over high-dimensional manifolds and articulated bodies like humans, thanks to our effective data generation and a simple inductive bias during training. We demonstrate the effectiveness of our approach in representative simulated tasks for whole-body control (WBC) and safe Human-Robot Interaction (HRI) in shared workspaces. Finally, we provide proof of concept of a real-world application in a HRI handover task with a mobile manipulator robot.
    Tangential Wasserstein Projections. (arXiv:2207.14727v1 [stat.ML])
    We develop a notion of projections between sets of probability measures using the geometric properties of the 2-Wasserstein space. It is designed for general multivariate probability measures, is computationally efficient to implement, and provides a unique solution in regular settings. The idea is to work on regular tangent cones of the Wasserstein space using generalized geodesics. Its structure and computational properties make the method applicable in a variety of settings, from causal inference to the analysis of object data. An application to estimating causal effects yields a generalization of the notion of synthetic controls to multivariate data with individual-level heterogeneity, as well as a way to estimate optimal weights jointly over all time periods.
    Graphing else matters: exploiting aspect opinions and ratings in explainable graph-based recommendations. (arXiv:2107.03226v2 [cs.IR] UPDATED)
    The success of neural network embeddings has entailed a renewed interest in using knowledge graphs for a wide variety of machine learning and information retrieval tasks. In particular, current recommendation methods based on graph embeddings have shown state-of-the-art performance. These methods commonly encode latent rating patterns and content features. Different from previous work, in this paper, we propose to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews. We then adapt and evaluate state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains, outperforming baseline recommenders. Our approach has the advantage of providing explanations which leverage aspect-based opinions given by users about recommended items. Furthermore, we also provide examples of the applicability of recommendations utilizing aspect opinions as explanations in a visualization dashboard, which allows obtaining information about the most and least liked aspects of similar users obtained from the embeddings of an input graph.
    Language Models Can Teach Themselves to Program Better. (arXiv:2207.14502v1 [cs.LG])
    This work shows how one can use large-scale language models (LMs) to synthesize programming problems with verified solutions, in the form of programming puzzles, which can then in turn be used to fine-tune those same models, improving their performance. This work builds on two recent developments. First, LMs have achieved breakthroughs in non-trivial reasoning and algorithm implementation, generating code that can solve some intermediate-level competitive programming problems. However, training code LMs involves curated sets of natural-language problem descriptions and source-code tests and solutions, which are limited in size. Second, a new format of programming challenge called a programming puzzle was introduced, which does not require a natural language description and is directly specified by a source-code test. In this work we show how generating synthetic programming puzzles and solutions, verified for correctness by a Python interpreter, can be used to improve performance in solving test puzzles from P3, a public benchmark set of Python Programming Puzzles. Additionally, we release a dataset of 1 million puzzles and solutions generated by the Codex model, which we show can improve smaller models through fine-tuning.
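    As an illustration of what "directly specified by a source-code test" can look like, here is a toy puzzle in the style commonly used for P3: a function `sat` that returns True for a valid answer, plus a candidate solution `sol`. The `verify` helper is a hypothetical stand-in for checking by the Python interpreter; it is not the paper's pipeline, and this puzzle is not claimed to come from the released dataset.

```python
def sat(s: str) -> bool:
    """The puzzle: find a string s that makes this test return True."""
    return s + "world" == "Hello world"

def sol() -> str:
    """A candidate solution, e.g. one proposed by a language model."""
    return "Hello "

def verify(puzzle, candidate) -> bool:
    """Run the candidate and check it against the puzzle's own test.
    Any exception raised by either function counts as a failed attempt."""
    try:
        return puzzle(candidate()) is True
    except Exception:
        return False

is_valid = verify(sat, sol)
print(is_valid)
```

    Because the test itself defines correctness, no natural-language description or hand-written reference answer is needed. Only puzzle/solution pairs that pass such a check would be kept for fine-tuning.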
    Model Reduction for Nonlinear Systems by Balanced Truncation of State and Gradient Covariance. (arXiv:2207.14387v1 [eess.SY])
    Data-driven reduced-order models often fail to make accurate forecasts of high-dimensional nonlinear systems that are sensitive along coordinates with low-variance because such coordinates are often truncated, e.g., by proper orthogonal decomposition, kernel principal component analysis, and autoencoders. Such systems are encountered frequently in shear-dominated fluid flows where non-normality plays a significant role in the growth of disturbances. In order to address these issues, we employ ideas from active subspaces to find low-dimensional systems of coordinates for model reduction that balance adjoint-based information about the system's sensitivity with the variance of states along trajectories. The resulting method, which we refer to as covariance balancing reduction using adjoint snapshots (CoBRAS), is identical to balanced truncation with state and adjoint-based gradient covariance matrices replacing the system Gramians and obeying the same key transformation laws. Here, the extracted coordinates are associated with an oblique projection that can be used to construct Petrov-Galerkin reduced-order models. We provide an efficient snapshot-based computational method analogous to balanced proper orthogonal decomposition. This also leads to the observation that the reduced coordinates can be computed relying on inner products of state and gradient samples alone, allowing us to find rich nonlinear coordinates by replacing the inner product with a kernel function. In these coordinates, reduced-order models can be learned using regression. We demonstrate these techniques and compare to a variety of other methods on a simple, yet challenging three-dimensional system and an axisymmetric jet flow simulation with $10^5$ state variables.
    Using Graph Neural Networks for Program Termination. (arXiv:2207.14648v1 [cs.SE])
    Termination analyses investigate the termination behavior of programs, intending to detect nontermination, which is known to cause a variety of program bugs (e.g. hanging programs, denial-of-service vulnerabilities). Beyond formal approaches, various attempts have been made to estimate the termination behavior of programs using neural networks. However, the majority of these approaches continue to rely on formal methods to provide strong soundness guarantees and consequently suffer from similar limitations. In this paper, we move away from formal methods and embrace the stochastic nature of machine learning models. Instead of aiming for rigorous guarantees that can be interpreted by solvers, our objective is to provide an estimation of a program's termination behavior and of the likely reason for nontermination (when applicable) that a programmer can use for debugging purposes. Compared to previous approaches using neural networks for program termination, we also take advantage of the graph representation of programs by employing Graph Neural Networks. To further assist programmers in understanding and debugging nontermination bugs, we adapt the notions of attention and semantic segmentation, previously used for other application domains, to programs. Overall, we designed and implemented classifiers for program termination based on Graph Convolutional Networks and Graph Attention Networks, as well as a semantic segmentation Graph Neural Network that localizes AST nodes likely to cause nontermination. We also illustrated how the information provided by semantic segmentation can be combined with program slicing to further aid debugging.
    Multimodal SuperCon: Classifier for Drivers of Deforestation in Indonesia. (arXiv:2207.14656v1 [cs.CV])
    Deforestation is one of the contributing factors to climate change. Climate change has a serious impact on human life, and it occurs due to emission of greenhouse gases, such as carbon dioxide, to the atmosphere. It is important to know the causes of deforestation for mitigation efforts, but there is a lack of data-driven research studies to predict these deforestation drivers. In this work, we propose a contrastive learning architecture, called Multimodal SuperCon, for classifying drivers of deforestation in Indonesia using satellite images obtained from Landsat 8. Multimodal SuperCon is an architecture which combines contrastive learning and multimodal fusion to handle the available deforestation dataset. Our proposed model outperforms previous work on driver classification, giving a 7% improvement in accuracy in comparison to a state-of-the-art rotation equivariant model for the same task.
    Personalized Promotion Decision Making Based on Direct and Enduring Effect Predictions. (arXiv:2207.14798v1 [cs.IR])
    Promotions have been trending in the e-commerce marketplace to build up customer relationships and guide customers towards the desired actions. Since incentives are effective at engaging customers and customers have different preferences for different types of incentives, the demand for personalized promotion decision making is increasing over time. However, research on promotion decision making has focused specifically on purchase conversion during the promotion period (the direct effect), while generally disregarding the enduring effect in the post-promotion period. To achieve a better lift return on investment (lift ROI) on the enduring effect of the promotion and improve customer retention and loyalty, we propose a framework of multiple treatment promotion decision making by modeling each customer's direct and enduring response. First, we propose a customer direct and enduring effect (CDEE) model which predicts the customer direct and enduring response. With the help of the predictions of the CDEE, we personalize incentive allocation to optimize the enduring effect while keeping the cost under the budget. To estimate the effect of decision making, we apply an unbiased evaluation approach of business metrics with randomized controlled trial (RCT) data. We compare our method with benchmarks using two promotions in Mercari and achieve significantly better results.
    A Survey of Learning on Small Data. (arXiv:2207.14443v1 [cs.LG])
    Learning on big data has brought success to artificial intelligence (AI), but the annotation and training costs are high. In the future, learning on small data is one of the ultimate goals of AI, which requires machines to recognize objectives and scenarios from small data as humans do. A series of machine learning approaches, such as active learning, few-shot learning, and deep clustering, pursue this direction. However, there are few theoretical guarantees for their generalization performance. Moreover, most of their settings are passive, that is, the label distribution is explicitly controlled by one specified sampling scenario. This survey follows the agnostic active sampling under a PAC (Probably Approximately Correct) framework to analyze the generalization error and label complexity of learning on small data in a supervised and unsupervised fashion. With these theoretical analyses, we categorize the small data learning models from two geometric perspectives: the Euclidean and non-Euclidean (hyperbolic) mean representation, where their optimization solutions are also presented and discussed. Some potential learning scenarios that may benefit from small data learning are then summarized and analyzed. Finally, some challenging applications, such as computer vision and natural language processing, that may benefit from learning on small data are also surveyed.
    Rating and aspect-based opinion graph embeddings for explainable recommendations. (arXiv:2107.03385v2 [cs.IR] UPDATED)
    The success of neural network embeddings has entailed a renewed interest in using knowledge graphs for a wide variety of machine learning and information retrieval tasks. In particular, recent recommendation methods based on graph embeddings have shown state-of-the-art performance. In general, these methods encode latent rating patterns and content features. Differently from previous work, in this paper, we propose to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews. We then adapt and evaluate state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains, outperforming baseline recommenders. Additionally, our method has the advantage of providing explanations that involve the coverage of aspect-based opinions given by users about recommended items.
    SLUE: New Benchmark Tasks for Spoken Language Understanding Evaluation on Natural Speech. (arXiv:2111.10367v3 [cs.CL] UPDATED)
    Progress in speech processing has been facilitated by shared datasets and benchmarks. Historically these have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks. Interest has been growing in higher-level spoken language understanding tasks, including using end-to-end models, but there are fewer annotated datasets for such tasks. At the same time, recent work shows the possibility of pre-training generic representations and then fine-tuning for several tasks using relatively little labeled data. We propose to create a suite of benchmark tasks for Spoken Language Understanding Evaluation (SLUE) consisting of limited-size labeled training sets and corresponding evaluation sets. This resource would allow the research community to track progress, evaluate pre-trained representations for higher-level tasks, and study open questions such as the utility of pipeline versus end-to-end approaches. We present the first phase of the SLUE benchmark suite, consisting of named entity recognition, sentiment analysis, and ASR on the corresponding datasets. We focus on naturally produced (not read or synthesized) speech, and freely available datasets. We provide new transcriptions and annotations on subsets of the VoxCeleb and VoxPopuli datasets, evaluation metrics and results for baseline models, and an open-source toolkit to reproduce the baselines and evaluate new models.
    GreenDB: Toward a Product-by-Product Sustainability Database. (arXiv:2205.02908v2 [cs.LG] UPDATED)
    The production, shipping, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Modern retail platforms rely heavily on Machine Learning (ML) for their search and recommender systems. Thus, ML can potentially support efforts towards more sustainable consumption patterns, for example, by accounting for sustainability aspects in product search or recommendations. However, leveraging ML potential for reaching sustainability goals requires data on sustainability. Unfortunately, no open and publicly available database integrates sustainability information on a product-by-product basis. In this work, we present the GreenDB, which fills this gap. Based on search logs of millions of users, we prioritize which products users care about most. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs to improve sustainability information available for search and recommendation experiences. We present our proof of concept implementation of a scraping system that creates the GreenDB dataset.
    Reservoir Computing with Diverse Timescales for Prediction of Multiscale Dynamics. (arXiv:2108.09446v2 [cs.LG] UPDATED)
    Machine learning approaches have recently been leveraged as a substitute or an aid for physical/mathematical modeling approaches to dynamical systems. To develop an efficient machine learning method dedicated to modeling and prediction of multiscale dynamics, we propose a reservoir computing (RC) model with diverse timescales by using a recurrent network of heterogeneous leaky integrator (LI) neurons. We evaluate computational performance of the proposed model in two time series prediction tasks related to four chaotic fast-slow dynamical systems. In a one-step-ahead prediction task where input data are provided only from the fast subsystem, we show that the proposed model yields better performance than the standard RC model with identical LI neurons. Our analysis reveals that the timescale required for producing each component of target multiscale dynamics is appropriately and flexibly selected from the reservoir dynamics by model training. In a long-term prediction task, we demonstrate that a closed-loop version of the proposed model can achieve longer-term predictions compared to the counterpart with identical LI neurons depending on the hyperparameter setting.
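    As a rough illustration of what "heterogeneous leaky integrator neurons" can mean, the sketch below uses the standard discrete-time leaky-integrator reservoir update with a separate leak rate per neuron. The abstract does not give the paper's equations, so the update rule, the function name `reservoir_step`, and all weights here are assumptions made for illustration only.

```python
import math

def reservoir_step(state, u, w_rec, w_in, leak):
    """One update of a leaky-integrator reservoir with per-neuron leak rates.

    Assumed (textbook) form, not taken from the paper:
        x_i <- (1 - a_i) * x_i + a_i * tanh(w_in[i] * u + sum_j w_rec[i][j] * x_j)
    A leak rate a_i near 1 gives a fast neuron; a_i near 0 gives a slow one.
    """
    new_state = []
    for i in range(len(state)):
        pre = w_in[i] * u + sum(w_rec[i][j] * state[j] for j in range(len(state)))
        new_state.append((1.0 - leak[i]) * state[i] + leak[i] * math.tanh(pre))
    return new_state

# Two uncoupled neurons driven by the same scalar input, one fast and one slow.
w_rec = [[0.0, 0.0], [0.0, 0.0]]
w_in = [1.0, 1.0]
leak = [1.0, 0.1]  # heterogeneous timescales (hypothetical values)

state = [0.0, 0.0]
for _ in range(3):
    state = reservoir_step(state, 0.5, w_rec, w_in, leak)
# The fast neuron has already settled at tanh(0.5); the slow one is still rising.
```

    In a standard reservoir with identical neurons, every entry of `leak` would be the same; making the entries differ is the only change this sketch makes.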
    Recommendation as Language Processing (RLP): A Unified Pretrain, Personalized Prompt & Predict Paradigm (P5). (arXiv:2203.13366v4 [cs.IR] UPDATED)
    For a long time, different recommendation tasks have typically required designing task-specific architectures and training objectives. As a result, it is hard to transfer the learned knowledge and representations from one task to another, thus restricting the generalization ability of existing recommendation approaches, e.g., a sequential recommendation model can hardly be applied or transferred to a review generation method. To deal with such issues, considering that language can describe almost anything and language grounding is a powerful medium to represent various problems or tasks, we present a flexible and unified text-to-text paradigm called "Pretrain, Personalized Prompt, and Predict Paradigm" (P5) for recommendation, which unifies various recommendation tasks in a shared framework. In P5, all data such as user-item interactions, user descriptions, item metadata, and user reviews are converted to a common format -- natural language sequences. The rich information from natural language assists P5 to capture deeper semantics for personalization and recommendation. Specifically, P5 learns different tasks with the same language modeling objective during pretraining. Thus, it serves as the foundation model for various downstream recommendation tasks, allows easy integration with other modalities, and enables instruction-based recommendation based on prompts. P5 advances recommender systems from shallow models to deep models to big models, and will revolutionize the technical form of recommender systems towards a universal recommendation engine. With adaptive personalized prompts for different users, P5 is able to make predictions in a zero-shot or few-shot manner and largely reduces the necessity for extensive fine-tuning. On several recommendation benchmarks, we conduct experiments to show the effectiveness of P5. We release the source code at https://github.com/jeykigung/P5.
    Unsupervised Discovery of Inertial-Fusion Plasma Physics using Differentiable Kinetic Simulations and a Maximum Entropy Loss Function. (arXiv:2206.01637v2 [physics.plasm-ph] CROSS LISTED)
    Plasma supports collective modes and particle-wave interactions that lead to complex behavior in inertial fusion energy applications. While plasma can sometimes be modeled as a charged fluid, a kinetic description is useful towards the study of nonlinear effects in the higher dimensional momentum-position phase-space that describes the full complexity of plasma dynamics. We create a differentiable solver for the plasma kinetics 3D partial-differential-equation and introduce a domain-specific objective function. Using this framework, we perform gradient-based optimization of neural networks that provide forcing function parameters to the differentiable solver given a set of initial conditions. We apply this to an inertial-fusion relevant configuration and find that the optimization process exploits a novel physical effect that has previously remained undiscovered.
    Port-Hamiltonian Neural Networks with State-Dependent Ports. (arXiv:2206.02660v2 [cs.LG] UPDATED)
    Hybrid machine learning based on Hamiltonian formulations has recently been successfully demonstrated for simple mechanical systems. In this work, we stress-test the method on both simple mass-spring systems and more complex and realistic systems with several internal and external ports, including a system with multiple connected tanks. We quantify performance under various conditions and show that imposing different assumptions greatly affects the performance, highlighting advantages and limitations of the method. We demonstrate that port-Hamiltonian neural networks can be extended to higher dimensions with state-dependent ports. We consider learning on systems with known and unknown external ports. The port-Hamiltonian formulation allows for detecting deviations and still providing a valid model when the deviations are removed. Finally, we propose a symmetric high-order integration scheme for improved training on sparse and noisy data.
    Conformal Prediction: a Unified Review of Theory and New Challenges. (arXiv:2005.07972v2 [cs.LG] UPDATED)
    In this work we provide a review of basic ideas and novel developments about Conformal Prediction -- an innovative distribution-free, non-parametric forecasting method, based on minimal assumptions -- that is able to yield, in a very straightforward way, prediction sets that are valid in a statistical sense also in the finite-sample case. The in-depth discussion provided in the paper covers the theoretical underpinnings of Conformal Prediction, and then proceeds to list the more advanced developments and adaptations of the original idea.
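    To make the finite-sample validity claim concrete, here is a minimal sketch of split conformal prediction for regression, one common variant of the idea. It is not taken from the review itself: the helper names, the absolute-residual score, and the toy data are assumptions chosen for illustration.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    This finite-sample correction is what gives coverage of at least 1 - alpha
    under exchangeability. Returns infinity when n is too small for alpha.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]

def split_conformal_interval(predict, calib_x, calib_y, x_new, alpha=0.1):
    """Prediction interval for x_new from a fitted model and a held-out calibration set."""
    # Nonconformity score: absolute residual on calibration points (an assumed choice).
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    center = predict(x_new)
    return center - q, center + q

def predict(x):
    # Hypothetical pre-fitted point predictor.
    return 2.0 * x

calib_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
calib_y = [0.5, 1.5, 4.25, 6.0, 7.0, 10.75, 12.0]
low, high = split_conformal_interval(predict, calib_x, calib_y, 10.0, alpha=0.25)
```

    The predictor is treated as a black box; only the calibration residuals determine the interval width, which is why the method makes no distributional assumption beyond exchangeability.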
    Cloud-Edge Training Architecture for Sim-to-Real Deep Reinforcement Learning. (arXiv:2203.02230v2 [cs.LG] UPDATED)
    Deep reinforcement learning (DRL) is a promising approach to solve complex control tasks by learning policies through interactions with the environment. However, the training of DRL policies requires large amounts of training experiences, making it impractical to learn the policy directly on physical systems. Sim-to-real approaches leverage simulations to pretrain DRL policies and then deploy them in the real world. Unfortunately, the direct real-world deployment of pretrained policies usually suffers from performance deterioration due to the different dynamics, known as the reality gap. Recent sim-to-real methods, such as domain randomization and domain adaptation, focus on improving the robustness of the pretrained agents. Nevertheless, the simulation-trained policies often need to be tuned with real-world data to reach optimal performance, which is challenging due to the high cost of real-world samples. This work proposes a distributed cloud-edge architecture to train DRL agents in the real world in real-time. In the architecture, the inference and training are assigned to the edge and cloud, separating the real-time control loop from the computationally expensive training loop. To overcome the reality gap, our architecture exploits sim-to-real transfer strategies to continue the training of simulation-pretrained agents on a physical system. We demonstrate its applicability on a physical inverted-pendulum control system, analyzing critical parameters. The real-world experiments show that our architecture can adapt the pretrained DRL agents to unseen dynamics consistently and efficiently.
    A Learned Index for Exact Similarity Search in Metric Spaces. (arXiv:2204.10028v2 [cs.DB] UPDATED)
    Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index, which replaces or complements traditional index structures with machine learning models, has been actively explored to reduce storage and search costs. However, accurate and efficient similarity query processing in high-dimensional metric spaces remains an open challenge. In this paper, we propose a novel indexing approach called LIMS that uses data clustering, pivot-based data transformation techniques and learned indexes to support efficient similarity query processing in metric spaces. In LIMS, the underlying data is partitioned into clusters such that each cluster follows a relatively uniform data distribution. Data redistribution is achieved by utilizing a small number of pivots for each cluster. Similar data are mapped into compact regions and the mapped values are totally ordered. Machine learning models are developed to approximate the position of each data record on disk. Efficient algorithms are designed for processing range queries and nearest neighbor queries based on LIMS, and for index maintenance with dynamic updates. Extensive experiments on real-world and synthetic datasets demonstrate the superiority of LIMS compared with traditional indexes and state-of-the-art learned indexes.
    Consistent and fast inference in compartmental models of epidemics using Poisson Approximate Likelihoods. (arXiv:2205.13602v2 [stat.ME] UPDATED)
    Addressing the challenge of scaling-up epidemiological inference to complex and heterogeneous models, we introduce Poisson Approximate Likelihood (PAL) methods. PALs are derived from approximate filtering equations for finite-population, stochastic compartmental models, and the large population limit drives the consistency of maximum PAL estimators. Our theoretical results appear to be the first likelihood-based parameter estimation consistency results applicable across a broad class of partially observed stochastic compartmental models concerning the large population limit. Compared to simulation-based methods such as Approximate Bayesian Computation and Sequential Monte Carlo, PALs are simple to implement, involving only elementary arithmetic operations and no tuning parameters; and fast to evaluate, requiring no simulation from the model and having computational cost independent of population size. Through examples, we demonstrate how PALs can be: embedded within Delayed Acceptance Particle Markov Chain Monte Carlo to facilitate Bayesian inference; used to fit an age-structured model of influenza, taking advantage of automatic differentiation in Stan; and applied to calibrate a spatial meta-population model of measles.
    Cross-Subject Domain Adaptation for Classifying Working Memory Load with Multi-Frame EEG Images. (arXiv:2106.06769v2 [cs.LG] UPDATED)
    Working memory (WM), denoting the information temporarily stored in the mind, is a fundamental research topic in the field of human cognition. Electroencephalography (EEG), which can monitor the electrical activity of the brain, has been widely used in measuring the level of WM. However, one of the critical challenges is that individual differences may cause ineffective results, especially when the established model meets an unfamiliar subject. In this work, we propose a cross-subject deep adaptation model with spatial attention (CS-DASA) to generalize the workload classifications across subjects. First, we transform EEG time series into multi-frame EEG images incorporating spatial, spectral, and temporal information. Next, the Subject-Shared module in CS-DASA receives multi-frame EEG image data from both source and target subjects and learns the common feature representations. Then, in the subject-specific module, the maximum mean discrepancy is implemented to measure the domain distribution divergence in a reproducing kernel Hilbert space, which can add an effective penalty loss for domain adaptation. Additionally, the subject-to-subject spatial attention mechanism is employed to focus on the discriminative spatial features from the target image data. Experiments conducted on a public WM EEG dataset containing 13 subjects show that the proposed model is capable of achieving better performance than existing state-of-the-art methods.
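    The maximum mean discrepancy mentioned above can be sketched as follows. This is the generic biased estimator of squared MMD with a Gaussian (RBF) kernel, not the CS-DASA implementation; the kernel choice, the bandwidth, and the toy feature vectors are assumptions for illustration.

```python
import math

def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def mmd_squared(source, target, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples in the kernel's RKHS.

    MMD^2 = mean k(s, s') + mean k(t, t') - 2 * mean k(s, t)
    """
    def mean_kernel(xs, ys):
        total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
        return total / (len(xs) * len(ys))

    return (mean_kernel(source, source)
            + mean_kernel(target, target)
            - 2.0 * mean_kernel(source, target))

# Hypothetical 2-D features from a source subject and two shifted target subjects.
source = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
target_near = [(x + 0.1, y + 0.1) for x, y in source]
target_far = [(x + 3.0, y + 3.0) for x, y in source]

penalty_near = mmd_squared(source, target_near)
penalty_far = mmd_squared(source, target_far)
```

    Used as a penalty term, a value like `penalty_far` would push a feature extractor towards representations whose source and target distributions overlap more closely.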
    CryoAI: Amortized Inference of Poses for Ab Initio Reconstruction of 3D Molecular Volumes from Real Cryo-EM Images. (arXiv:2203.08138v3 [cs.CV] UPDATED)
    Cryo-electron microscopy (cryo-EM) has become a tool of fundamental importance in structural biology, helping us understand the basic building blocks of life. The algorithmic challenge of cryo-EM is to jointly estimate the unknown 3D poses and the 3D electron scattering potential of a biomolecule from millions of extremely noisy 2D images. Existing reconstruction algorithms, however, cannot easily keep pace with the rapidly growing size of cryo-EM datasets due to their high computational and memory cost. We introduce cryoAI, an ab initio reconstruction algorithm for homogeneous conformations that uses direct gradient-based optimization of particle poses and the electron scattering potential from single-particle cryo-EM data. CryoAI combines a learned encoder that predicts the poses of each particle image with a physics-based decoder to aggregate each particle image into an implicit representation of the scattering potential volume. This volume is stored in the Fourier domain for computational efficiency and leverages a modern coordinate network architecture for memory efficiency. Combined with a symmetrized loss function, this framework achieves results of a quality on par with state-of-the-art cryo-EM solvers for both simulated and experimental data, one order of magnitude faster for large datasets and with significantly lower memory requirements than existing methods.
    Latent Properties of Lifelong Learning Systems. (arXiv:2207.14378v1 [cs.LG])
    Creating artificial intelligence (AI) systems capable of demonstrating lifelong learning is a fundamental challenge, and many approaches and metrics have been proposed to analyze algorithmic properties. However, for existing lifelong learning metrics, algorithmic contributions are confounded by task and scenario structure. To mitigate this issue, we introduce an algorithm-agnostic explainable surrogate-modeling approach to estimate latent properties of lifelong learning algorithms. We validate the approach for estimating these properties via experiments on synthetic data. To validate the structure of the surrogate model, we analyze real performance data from a collection of popular lifelong learning approaches and baselines adapted for lifelong classification and lifelong reinforcement learning.
    Learning Disentangled Representations in the Imaging Domain. (arXiv:2108.12043v6 [cs.CV] UPDATED)
    Disentangled representation learning has been proposed as an approach to learning general representations even in the absence of, or with limited, supervision. A good general representation can be fine-tuned for new target tasks using modest amounts of data, or used directly in unseen domains achieving remarkable performance in the corresponding task. This alleviation of the data and annotation requirements offers tantalising prospects for applications in computer vision and healthcare. In this tutorial paper, we motivate the need for disentangled representations, revisit key concepts, and describe practical building blocks and criteria for learning such representations. We survey applications in medical imaging emphasising choices made in exemplar key works, and then discuss links to computer vision applications. We conclude by presenting limitations, challenges, and opportunities.
    The network signature of constellation line figures. (arXiv:2110.12329v3 [cs.SI] UPDATED)
    In traditional astronomies across the world, groups of stars in the night sky were linked into constellations -- symbolic representations rich in meaning and with practical roles. In some sky cultures, constellations are represented as line (or connect-the-dot) figures, which are spatial networks drawn over the fixed background of stars. We analyse 1802 line figures from 56 sky cultures spanning all continents, in terms of their network, spatial, and brightness features, and ask what associations exist between these visual features and culture type or sky region. First, an embedded map of constellations is learnt, to show clusters of line figures. We then form the network of constellations (as linked by their similarity), to study how similar cultures are by computing their assortativity (or homophily) over the network. Finally, we measure the diversity (or entropy) index for the set of constellations drawn per sky region. Our results show distinct types of line figures, and that many folk astronomies with oral traditions have widespread similarities in constellation design, which do not align with cultural ancestry. In a minority of sky regions, certain line designs appear universal, but this is not the norm: in the majority of sky regions, the line geometries are diverse.
    Learning Coulomb Diamonds in Large Quantum Dot Arrays. (arXiv:2205.01443v2 [cond-mat.mes-hall] UPDATED)
    We introduce an algorithm that is able to find the facets of Coulomb diamonds in quantum dot arrays. We simulate these arrays using the constant-interaction model, and rely only on one-dimensional raster scans (rays) to learn a model of the device using regularized maximum likelihood estimation. This allows us to determine, for a given charge state of the device, which transitions exist and what the compensated gate voltages for these are. For smaller devices the simulator can also be used to compute the exact boundaries of the Coulomb diamonds, which we use to verify that our algorithm correctly finds the vast majority of transitions with high precision.
    Multi-channel neural networks for predicting influenza A virus hosts and antigenic types. (arXiv:2206.03823v3 [q-bio.QM] UPDATED)
    Influenza occurs every season and occasionally causes pandemics. Despite its low mortality rate, influenza is a major public health concern, as it can be complicated by severe diseases like pneumonia. A fast, accurate and low-cost method to predict the origin host and subtype of influenza viruses could help reduce virus transmission and benefit resource-poor areas. In this work, we propose multi-channel neural networks to predict antigenic types and hosts of influenza A viruses with hemagglutinin and neuraminidase protein sequences. An integrated data set containing complete protein sequences was used to produce a pre-trained model, and two other data sets were used for testing the model's performance. One test set contained complete protein sequences, and another test set contained incomplete protein sequences. The results suggest that multi-channel neural networks are applicable and promising for predicting influenza A virus hosts and antigenic subtypes with complete and partial protein sequences.
    Domain Generalization: A Survey. (arXiv:2103.02503v6 [cs.LG] UPDATED)
    Generalization to out-of-distribution (OOD) data is a capability natural to humans yet challenging for machines to reproduce. This is because most learning algorithms strongly rely on the i.i.d. assumption on source/target data, which is often violated in practice due to domain shift. Domain generalization (DG) aims to achieve OOD generalization by using only source data for model learning. Over the last ten years, research in DG has made great progress, leading to a broad spectrum of methodologies, e.g., those based on domain alignment, meta-learning, data augmentation, or ensemble learning, to name a few; DG has also been studied in various application areas including computer vision, speech recognition, natural language processing, medical imaging, and reinforcement learning. In this paper, for the first time a comprehensive literature review in DG is provided to summarize the developments over the past decade. Specifically, we first cover the background by formally defining DG and relating it to other relevant fields like domain adaptation and transfer learning. Then, we conduct a thorough review into existing methods and theories. Finally, we conclude this survey with insights and discussions on future research directions.
    A Data-driven Latent Semantic Analysis for Automatic Text Summarization using LDA Topic Modelling. (arXiv:2207.14687v1 [cs.IR])
    With the advent and popularity of big data mining and large-scale text analysis in modern times, automated text summarization has become prominent for extracting and retrieving important information from documents. This research investigates aspects of automatic text summarization from the perspectives of single and multiple documents. Summarization is the task of condensing long text articles into short, summarized versions. The text is reduced in size while preserving key information and retaining the meaning of the original document. This study presents the Latent Dirichlet Allocation (LDA) approach used to perform topic modelling on summarised medical science journal articles with topics related to genes and diseases. In this study, the PyLDAvis web-based interactive visualization tool was used to visualise the selected topics. The visualisation provides an overarching view of the main topics while allowing and attributing deep meaning to the prevalence of individual topics. This study presents a novel approach to summarization of single and multiple documents. In the results, terms are ranked purely by their probability of topic prevalence within the processed document, using an extractive summarization technique. The PyLDAvis visualization allows flexible exploration of how the terms of each topic are associated with the fitted LDA model. The topic modelling result shows prevalence within topics 1 and 2. This association reveals that there is similarity between the terms in topics 1 and 2 in this study. The efficacy of the LDA and the extractive summarization methods was measured using Latent Semantic Analysis (LSA) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics to evaluate the reliability and validity of the model.
    StyleLight: HDR Panorama Generation for Lighting Estimation and Editing. (arXiv:2207.14811v1 [cs.CV])
    We present a new lighting estimation and editing framework to generate high-dynamic-range (HDR) indoor panorama lighting from a single limited field-of-view (LFOV) image captured by low-dynamic-range (LDR) cameras. Existing lighting estimation methods either directly regress lighting representation parameters or decompose this problem into LFOV-to-panorama and LDR-to-HDR lighting generation sub-tasks. However, due to the partial observation, the high-dynamic-range lighting, and the intrinsic ambiguity of a scene, lighting estimation remains a challenging task. To tackle this problem, we propose a coupled dual-StyleGAN panorama synthesis network (StyleLight) that integrates LDR and HDR panorama synthesis into a unified framework. The LDR and HDR panorama synthesis share a similar generator but have separate discriminators. During inference, given an LDR LFOV image, we propose a focal-masked GAN inversion method to find its latent code by the LDR panorama synthesis branch and then synthesize the HDR panorama by the HDR panorama synthesis branch. StyleLight takes LFOV-to-panorama and LDR-to-HDR lighting generation into a unified framework and thus greatly improves lighting estimation. Extensive experiments demonstrate that our framework achieves superior performance over state-of-the-art methods on indoor lighting estimation. Notably, StyleLight also enables intuitive lighting editing on indoor HDR panoramas, which is suitable for real-world applications. Code is available at https://style-light.github.io.
    Can We Mitigate Backdoor Attack Using Adversarial Detection Methods?. (arXiv:2006.14871v2 [cs.LG] UPDATED)
    Deep Neural Networks are well known to be vulnerable to adversarial attacks and backdoor attacks, where minor modifications on the input are able to mislead the models to give wrong results. Although defenses against adversarial attacks have been widely studied, investigation on mitigating backdoor attacks is still at an early stage. It is unknown whether there are any connections and common characteristics between the defenses against these two attacks. We conduct comprehensive studies on the connections between adversarial examples and backdoor examples of Deep Neural Networks to seek to answer the question: can we detect backdoors using adversarial detection methods? Our insights are based on the observation that both adversarial examples and backdoor examples have anomalies during the inference process, highly distinguishable from benign samples. As a result, we revise four existing adversarial defense methods for detecting backdoor examples. Extensive evaluations indicate that these approaches provide reliable protection against backdoor attacks, with a higher accuracy than detecting adversarial examples. These solutions also reveal the relations of adversarial examples, backdoor examples and normal samples in model sensitivity, activation space and feature space. This can enhance our understanding of the inherent features of these two attacks and the defense opportunities.
    SHAP for additively modeled features in a boosted trees model. (arXiv:2207.14490v1 [stat.ML])
    An important technique to explore a black-box machine learning (ML) model is called SHAP (SHapley Additive exPlanation). SHAP values decompose predictions into contributions of the features in a fair way. We will show that for a boosted trees model with some or all features being additively modeled, the SHAP dependence plot of such a feature corresponds to its partial dependence plot up to a vertical shift. We illustrate the result with XGBoost.
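    The claim can be checked on a toy case. The sketch below is not XGBoost and not the paper's code: it assumes a hypothetical purely additive model built from two made-up functions `g0` and `g1`, a tiny made-up background dataset, and exact Shapley values computed by brute force with marginal (interventional) expectations. Under those assumptions, the gap between the partial dependence value and the SHAP value of feature 0 should be the same constant at every grid point.

```python
from itertools import combinations
from math import factorial

# Hypothetical additive model: each feature enters through its own function.
def g0(v):
    return v * v

def g1(v):
    return 3.0 * v

def model(x):
    return g0(x[0]) + g1(x[1])

# Tiny background dataset, assumed purely for illustration.
background = [(0.0, 1.0), (1.0, -1.0), (2.0, 0.5), (-1.0, 2.0)]

def value(x, subset):
    """Mean prediction with features in `subset` fixed to x and the rest taken from the background."""
    total = 0.0
    for b in background:
        z = tuple(x[j] if j in subset else b[j] for j in range(len(x)))
        total += model(z)
    return total / len(background)

def shapley(x, j):
    """Exact Shapley value of feature j at point x (brute force over coalitions)."""
    n = len(x)
    others = [k for k in range(n) if k != j]
    phi = 0.0
    for size in range(n):
        for s in combinations(others, size):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi += weight * (value(x, set(s) | {j}) - value(x, set(s)))
    return phi

def partial_dependence(v, j):
    """Average prediction over the background with feature j forced to v."""
    total = 0.0
    for b in background:
        z = list(b)
        z[j] = v
        total += model(z)
    return total / len(background)

grid = [-1.0, 0.0, 0.5, 2.0]
x_other = 0.7  # arbitrary value of the second feature
shifts = [partial_dependence(v, 0) - shapley((v, x_other), 0) for v in grid]
```

    In this toy setup every entry of `shifts` equals the mean background prediction, which is the vertical shift between the two curves.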
    Computational complexity reduction of deep neural networks. (arXiv:2207.14620v1 [cs.LG])
    Deep neural networks (DNN) have been widely used and play a major role in the field of computer vision and autonomous navigation. However, these DNNs are computationally complex and their deployment over resource-constrained platforms is difficult without additional optimizations and customization. In this manuscript, we describe an overview of DNN architecture and propose methods to reduce computational complexity in order to accelerate training and inference speeds to fit them on edge computing platforms with low computational resources.
    Blockchain-enabled Server-less Federated Learning. (arXiv:2112.07938v2 [cs.LG] UPDATED)
    Motivated by the heterogeneous nature of devices participating in large-scale Federated Learning (FL) optimization, we focus on an asynchronous server-less FL solution empowered by blockchain technology. In contrast to mostly adopted FL approaches, which assume synchronous operation, we advocate an asynchronous method whereby model aggregation is done as clients submit their local updates. The asynchronous setting fits well with the federated optimization idea in practical large-scale settings with heterogeneous clients. Thus, it potentially leads to higher efficiency in terms of communication overhead and idle periods. To evaluate the learning completion delay of BC-enabled FL, we provide an analytical model based on batch service queue theory. Furthermore, we provide simulation results to assess the performance of both synchronous and asynchronous mechanisms. Important aspects involved in the BC-enabled FL optimization, such as the network size, link capacity, or user requirements, are put together and analyzed. As our results show, the synchronous setting leads to higher prediction accuracy than the asynchronous case. Nevertheless, asynchronous federated optimization provides much lower latency in many cases, thus becoming an appealing solution for FL when dealing with large datasets, tough timing constraints (e.g., near-real-time applications), or highly varying training data.
    Archaeology of random recursive dags and Cooper-Frieze random networks. (arXiv:2207.14601v1 [math.PR])
    We study the problem of finding the root vertex in large growing networks. We prove that it is possible to construct confidence sets of size independent of the number of vertices in the network that contain the root vertex with high probability in various models of random networks. The models include uniform random recursive dags and uniform Cooper-Frieze random graphs.
    Recursive Importance Sketching for Rank Constrained Least Squares: Algorithms and High-order Convergence. (arXiv:2011.08360v3 [math.OC] UPDATED)
    In this paper, we propose the Recursive Importance Sketching algorithm for Rank constrained least squares Optimization (RISRO). The key step of RISRO is recursive importance sketching, a new sketching framework based on deterministically designed recursive projections, which significantly differs from the randomized sketching in the literature (Mahoney, 2011; Woodruff, 2014). Several existing algorithms in the literature can be reinterpreted under this new sketching framework and RISRO offers clear advantages over them. RISRO is easy to implement and computationally efficient, where the core procedure in each iteration is to solve a dimension-reduced least squares problem. We establish the local quadratic-linear and quadratic rate of convergence for RISRO under some mild conditions. We also discover a deep connection of RISRO to the Riemannian Gauss-Newton algorithm on fixed rank matrices. The effectiveness of RISRO is demonstrated in two applications in machine learning and statistics: low-rank matrix trace regression and phase retrieval. Simulation studies demonstrate the superior numerical performance of RISRO.
    Training a universal instance segmentation network for live cell images of various cell types and imaging modalities. (arXiv:2207.14347v1 [cs.CV])
    We share our recent findings in an attempt to train a universal segmentation network for various cell types and imaging modalities. Our method was built on the generalized U-Net architecture, which allows the evaluation of each component individually. We modified the traditional binary training targets to include three classes for direct instance segmentation. Detailed experiments were performed regarding training schemes, training settings, network backbones, and individual modules on the segmentation performance. Our proposed training scheme draws minibatches in turn from each dataset, and the gradients are accumulated before an optimization step. We found that the key to training a universal network is all-time supervision on all datasets, and it is necessary to sample each dataset in an unbiased way. Our experiments also suggest that there might exist common features to define cell boundaries across cell types and imaging modalities, which could allow application of trained models to totally unseen datasets. A few training tricks can further boost the segmentation performance, including uneven class weights in the cross-entropy loss function, well-designed learning rate scheduler, larger image crops for contextual information, and additional loss terms for unbalanced classes. We also found that segmentation performance can benefit from group normalization layer and Atrous Spatial Pyramid Pooling module, thanks to their more reliable statistics estimation and improved semantic understanding, respectively. We participated in the 6th Cell Tracking Challenge (CTC) held at IEEE International Symposium on Biomedical Imaging (ISBI) 2021 using one of the developed variants. Our method was evaluated as the best runner up during the initial submission for the primary track, and also secured the 3rd place in an additional round of competition in preparation for the summary publication.
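    The training scheme described above (one minibatch per dataset in turn, gradients accumulated, then a single optimization step) can be sketched on a toy problem. Everything here is an assumption for illustration: a scalar linear model instead of a U-Net, three made-up datasets of different sizes, and equal weighting of the datasets as one possible reading of "unbiased" sampling.

```python
import random

def grad_mse(w, batch):
    """Gradient of the mean squared error for the scalar model y_hat = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def train(datasets, steps=200, batch_size=4, lr=0.05, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        accumulated = 0.0
        # Visit every dataset once per step so each one supervises every update.
        for data in datasets:
            batch = rng.sample(data, batch_size)
            accumulated += grad_mse(w, batch)
        # One optimizer step on the gradient averaged over datasets,
        # so a large dataset does not dominate a small one.
        w -= lr * accumulated / len(datasets)
    return w

# Three hypothetical datasets of different sizes sharing the relation y = 2x.
datasets = [
    [(k / 10.0, 2.0 * k / 10.0) for k in range(1, n + 1)]
    for n in (8, 20, 50)
]
w_fit = train(datasets)
```

    Because each dataset contributes exactly one minibatch per step, the smallest dataset receives as much supervision per update as the largest one.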
    Lower bounds for learning quantum states with single-copy measurements. (arXiv:2207.14438v1 [quant-ph])
    We study the problems of quantum tomography and shadow tomography using measurements performed on individual, identical copies of an unknown $d$-dimensional state. We first revisit a known lower bound due to Haah et al. (2017) on quantum tomography with accuracy $\epsilon$ in trace distance, when the measurement choices are independent of previously observed outcomes (i.e., they are nonadaptive). We give a succinct proof of this result. This leads to stronger lower bounds when the learner uses measurements with a constant number of outcomes. In particular, this rigorously establishes the optimality of the folklore "Pauli tomography" algorithm in terms of its sample complexity. We also derive novel bounds of $\Omega(r^2 d/\epsilon^2)$ and $\Omega(r^2 d^2/\epsilon^2)$ for learning rank $r$ states using arbitrary and constant-outcome measurements, respectively, in the nonadaptive case. In addition to the sample complexity, a resource of practical significance for learning quantum states is the number of different measurements used by an algorithm. We extend our lower bounds to the case where the learner performs possibly adaptive measurements from a fixed set of $\exp(O(d))$ measurements. This implies in particular that adaptivity does not give us any advantage using single-copy measurements that are efficiently implementable. We also obtain a similar bound in the case where the goal is to predict the expectation values of a given sequence of observables, a task known as shadow tomography. Finally, in the case of adaptive, single-copy measurements implementable with polynomial-size circuits, we prove that a straightforward strategy based on computing sample means of the given observables is optimal.
    Deep Reinforcement Learning for System-on-Chip: Myths and Realities. (arXiv:2207.14595v1 [cs.LG])
    Neural schedulers based on deep reinforcement learning (DRL) have shown considerable potential for solving real-world resource allocation problems, as they have demonstrated significant performance gain in the domain of cluster computing. In this paper, we investigate the feasibility of neural schedulers for the domain of System-on-Chip (SoC) resource allocation through extensive experiments and comparison with non-neural, heuristic schedulers. The key finding is three-fold. First, neural schedulers designed for cluster computing domain do not work well for SoC due to i) heterogeneity of SoC computing resources and ii) variable action set caused by randomness in incoming jobs. Second, our novel neural scheduler technique, Eclectic Interaction Matching (EIM), overcomes the above challenges, thus significantly improving the existing neural schedulers. Specifically, we rationalize the underlying reasons behind the performance gain by the EIM-based neural scheduler. Third, we discover that the ratio of the average processing elements (PE) switching delay and the average PE computation time significantly impacts the performance of neural SoC schedulers even with EIM. Consequently, future neural SoC scheduler design must consider this metric as well as its implementation overhead for practical utility.
    Beyond CNNs: Exploiting Further Inherent Symmetries in Medical Image Segmentation. (arXiv:2207.14472v1 [eess.IV])
    Automatic tumor or lesion segmentation is a crucial step in medical image analysis for computer-aided diagnosis. Although the existing methods based on Convolutional Neural Networks (CNNs) have achieved the state-of-the-art performance, many challenges still remain in medical tumor segmentation. This is because, although the human visual system can detect symmetries in 2D images effectively, regular CNNs can only exploit translation invariance, overlooking further inherent symmetries existing in medical images such as rotations and reflections. To solve this problem, we propose a novel group equivariant segmentation framework by encoding those inherent symmetries for learning more precise representations. First, kernel-based equivariant operations are devised on each orientation, which allows it to effectively address the gaps of learning symmetries in existing approaches. Then, to keep segmentation networks globally equivariant, we design distinctive group layers with layer-wise symmetry constraints. Finally, based on our novel framework, extensive experiments conducted on real-world clinical data demonstrate that a Group Equivariant Res-UNet (named GER-UNet) outperforms its regular CNN-based counterpart and the state-of-the-art segmentation methods in the tasks of hepatic tumor segmentation, COVID-19 lung infection segmentation and retinal vessel detection. More importantly, the newly built GER-UNet also shows potential in reducing the sample complexity and the redundancy of filters, upgrading current segmentation CNNs and delineating organs on other medical imaging modalities.
    Big Data and Analytics Implementation in Tertiary Institutions to Predict Students Performance in Nigeria. (arXiv:2207.14677v1 [cs.CY])
    The term Big Data has been coined to refer to the gargantuan bulk of data that cannot be dealt with by traditional data-handling techniques. Big Data is still a novel concept, and in the following literature, we intend to elaborate on it in a palpable fashion. It commences with the concept of the subject in itself, along with its properties and the two general approaches to dealing with it. Big Data provides an opportunity for educational Institutions to use their Information Technology resources strategically to improve educational quality, guide students to higher completion rates and improve student persistence and outcomes. This paper explores the attributes of big data that are relevant to educational institutions, investigates the factors influencing the adoption of big data and analytics in learning institutions, and seeks to establish the limiting factors hindering the use of big data in Institutions of higher learning. A survey research design was adopted in conducting this research, and Questionnaires were the instrument employed for data collection.
    Learning Personalized Representations using Graph Convolutional Network. (arXiv:2207.14298v1 [cs.LG])
    Generating representations that precisely reflect customers' behavior is an important task for providing a personalized skill routing experience in Alexa. Currently, the Dynamic Routing (DR) team, which is responsible for routing Alexa traffic to providers or skills, relies on two features as personal signals: absolute traffic count and normalized traffic count of every skill usage per customer. Neither of them considers the network-based structure of interactions between customers and skills, which contains richer information about customer preferences. In this work, we first build a heterogeneous edge-attributed graph based on customers' past interactions with the invoked skills, in which the user requests (utterances) are modeled as edges. Then we propose a graph convolutional network (GCN) based model, namely Personalized Dynamic Routing Feature Encoder (PDRFE), that generates personalized customer representations learned from the built graph. Compared with existing models, PDRFE is able to further capture contextual information in the graph convolutional function. The performance of our proposed model is evaluated by a downstream task, defect prediction, that predicts the defect label from the learned embeddings of customers and their triggered skills. We observe up to 41% improvements on the cross entropy metric for our proposed models compared to the baselines.
    Sample-efficient Safe Learning for Online Nonlinear Control with Control Barrier Functions. (arXiv:2207.14419v1 [cs.RO])
    Reinforcement Learning (RL) and continuous nonlinear control have been successfully deployed in multiple domains of complicated sequential decision-making tasks. However, given the exploration nature of the learning process and the presence of model uncertainty, it is challenging to apply them to safety-critical control tasks due to the lack of safety guarantee. On the other hand, while combining control-theoretical approaches with learning algorithms has shown promise in safe RL applications, the sample efficiency of safe data collection process for control is not well addressed. In this paper, we propose a provably sample efficient episodic safe learning framework for online control tasks that leverages safe exploration and exploitation in an unknown, nonlinear dynamical system. In particular, the framework 1) extends control barrier functions (CBFs) in a stochastic setting to achieve provable high-probability safety under uncertainty during model learning and 2) integrates an optimism-based exploration strategy to efficiently guide the safe exploration process with learned dynamics for near optimal control performance. We provide formal analysis on the episodic regret bound against the optimal controller and probabilistic safety with theoretical guarantees. Simulation results are provided to demonstrate the effectiveness and efficiency of the proposed algorithm.
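    For readers unfamiliar with control barrier functions, the following is a minimal deterministic sketch, not the stochastic, learned-dynamics framework of the paper. It assumes a one-dimensional system x' = u with known dynamics, a hypothetical barrier h(x) = x_max - x, and a constant nominal control; the function names and parameter values are made up for illustration.

```python
def cbf_filter(u_nominal, x, x_max, alpha):
    """Clip the nominal control so the barrier condition holds.

    With h(x) = x_max - x and dynamics x' = u, the condition
    dh/dt + alpha * h >= 0 reduces to u <= alpha * (x_max - x).
    """
    return min(u_nominal, alpha * (x_max - x))

def simulate(x0=0.0, x_max=1.0, alpha=2.0, dt=0.05, steps=200, u_nominal=1.5):
    """Forward-Euler rollout with the safety filter applied at every step."""
    xs = [x0]
    for _ in range(steps):
        u = cbf_filter(u_nominal, xs[-1], x_max, alpha)
        xs.append(xs[-1] + dt * u)
    return xs

trajectory = simulate()
```

    Without the filter, the constant nominal control would push the state far past `x_max`. With it, the state approaches the boundary without crossing it, provided `alpha * dt <= 1` in this discretization.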
    Effectiveness of Transformer Models on IoT Security Detection in StackOverflow Discussions. (arXiv:2207.14542v1 [cs.CR])
    The Internet of Things (IoT) is an emerging concept that directly links to the billions of physical items, or "things", that are connected to the Internet and are all gathering and exchanging information between devices and systems. However, IoT devices were not built with security in mind, which might lead to security vulnerabilities in a multi-device system. Traditionally, we investigated IoT issues by polling IoT developers and specialists. This technique, however, is not scalable since surveying all IoT developers is not feasible. Another way to look into IoT issues is to look at IoT developer discussions on major online development forums like Stack Overflow (SO). However, finding discussions that are relevant to IoT issues is challenging since they are frequently not categorized with IoT-related terms. In this paper, we present the "IoT Security Dataset", a domain-specific dataset of 7147 samples focused solely on IoT security discussions. As there are no automated tools to label these samples, we manually labeled them. We further employed multiple transformer models to automatically detect security discussions. Through rigorous investigations, we found that IoT security discussions are different and more complex than traditional security discussions. We demonstrated a considerable performance loss (up to 44%) of transformer models on cross-domain datasets when we transferred knowledge from a general-purpose dataset "Opiner", supporting our claim. Thus, we built a domain-specific IoT security detector with an F1-Score of 0.69. We have made the dataset public in the hope that developers would learn more about the security discussion and vendors would enhance their concerns about product security.
    Building Trust: Lessons from the Technion-Rambam Machine Learning in Healthcare Datathon Event. (arXiv:2207.14638v1 [cs.DB])
    A datathon is a time-constrained competition involving data science applied to a specific problem. In the past decade, datathons have been shown to be a valuable bridge between fields and expertise. Biomedical data analysis represents a challenging area requiring collaboration between engineers, biologists and physicians to gain a better understanding of patient physiology and to guide decision processes for diagnosis, prognosis and therapeutic interventions to improve care practice. Here, we reflect on the outcomes of an event that we organized in Israel at the end of March 2022 between the MIT Critical Data group, Rambam Health Care Campus (Rambam) and the Technion Israel Institute of Technology (Technion) in Haifa. Participants were asked to complete a survey about their skills and interests, which enabled us to identify current needs in machine learning training for medical problem applications. This work describes opportunities and limitations in medical data science in the Israeli context.
    Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning. (arXiv:2207.14800v1 [cs.LG])
    In view of its power in extracting feature representation, contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL), leading to efficient policy learning in various applications. Despite its tremendous empirical successes, the understanding of contrastive learning for RL remains elusive. To narrow such a gap, we study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions. For both models, we propose to extract the correct feature representations of the low-rank model by minimizing a contrastive loss. Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs. We further theoretically prove that our algorithm recovers the true representations and simultaneously achieves sample efficiency in learning the optimal policy and Nash equilibrium in MDPs and MGs. We also provide empirical studies to demonstrate the efficacy of the UCB-based contrastive learning method for RL. To the best of our knowledge, we provide the first provably efficient online RL algorithm that incorporates contrastive learning for representation learning. Our codes are available at https://github.com/Baichenjia/Contrastive-UCB.
    Factorizable Joint Shift in Multinomial Classification. (arXiv:2207.14514v1 [stat.ML])
    Factorizable joint shift was recently proposed as a type of dataset shift for which the characteristics can be estimated from observed data. For the multinomial (multi-class) classification setting, we derive a representation of factorizable joint shift in terms of the source (training) distribution, the target (test) prior class probabilities and the target marginal distribution of the features. On the basis of this result, we propose alternatives to joint importance aligning, at the same time pointing out the limitations encountered when making an assumption of factorizable joint shift. Other results of the paper include correction formulae for the posterior class probabilities both under general dataset shift and factorizable joint shift. In addition, we investigate the consequences of assuming factorizable joint shift for the bias caused by sample selection.
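    As a point of reference for the correction formulae mentioned above, here is the well-known adjustment for the special case of pure prior probability (label) shift, where only the class priors change between source and target. It is an illustration under that assumption and is not the paper's general or factorizable-joint-shift formula; the numbers are made up.

```python
def adjust_posterior(source_posterior, source_prior, target_prior):
    """Re-weight source posteriors by the prior ratio and renormalize.

    Valid under prior probability shift only: the class-conditional
    feature distributions are assumed identical in source and target.
    """
    ratios = [
        p * t / s
        for p, s, t in zip(source_posterior, source_prior, target_prior)
    ]
    total = sum(ratios)
    return [r / total for r in ratios]

# Hypothetical three-class example.
adjusted = adjust_posterior(
    source_posterior=[0.6, 0.3, 0.1],
    source_prior=[0.5, 0.3, 0.2],
    target_prior=[0.2, 0.3, 0.5],
)
```

    In this example the first class is most likely under the source priors, but after adjustment the second class has the largest posterior because the first class is much rarer in the target.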
    Quantum Data Center: Theories and Applications. (arXiv:2207.14336v1 [quant-ph])
    In this paper, we propose the Quantum Data Center (QDC), an architecture combining Quantum Random Access Memory (QRAM) and quantum networks. We give a precise definition of QDC, and discuss its possible realizations and extensions. We discuss applications of QDC in quantum computation, quantum communication, and quantum sensing, with a primary focus on QDC for $T$-gate resources, QDC for multi-party private quantum communication, and QDC for distributed sensing through data compression. We show that QDC will provide efficient, private, and fast services as a future version of data centers.
    Active Distribution System Coordinated Control Method via Artificial Intelligence. (arXiv:2207.14642v1 [eess.SY])
    The increasing deployment of end use power resources in distribution systems created active distribution systems. Uncontrolled active distribution systems exhibit wide variations of voltage and loading throughout the day as some of these resources operate under max power tracking control of highly variable wind and solar irradiation while others exhibit random variations and/or dependency on weather conditions. It is necessary to control the system to provide power reliably and securely under normal voltages and frequency. Classical optimization approaches to control the system towards this goal suffer from the dimensionality of the problem and the need for a global optimization approach to coordinate a huge number of small resources. Artificial Intelligence (AI) methods offer an alternative that can provide a practical approach to this problem. We suggest that neural networks with self-attention mechanisms have the potential to aid in the optimization of the system. In this paper, we present this approach and provide promising preliminary results.
    GTrans: Grouping and Fusing Transformer Layers for Neural Machine Translation. (arXiv:2207.14467v1 [cs.CL])
    The Transformer structure, stacked by a sequence of encoder and decoder network layers, has achieved significant progress in neural machine translation. However, the vanilla Transformer mainly exploits the top-layer representation, assuming the lower layers provide trivial or redundant information and thus ignoring the bottom-layer features that are potentially valuable. In this work, we propose the Group-Transformer model (GTrans) that flexibly divides multi-layer representations of both encoder and decoder into different groups and then fuses these group features to generate target words. To corroborate the effectiveness of the proposed method, extensive experiments and analytic experiments are conducted on three bilingual translation benchmarks and two multilingual translation tasks, including the IWSLT-14, IWSLT-17, LDC, WMT-14 and OPUS-100 benchmarks. Experimental and analytical results demonstrate that our model outperforms its Transformer counterparts by a consistent gain. Furthermore, it can be successfully scaled up to 60 encoder layers and 36 decoder layers.
    Subtype-Former: a deep learning approach for cancer subtype discovery with multi-omics data. (arXiv:2207.14639v1 [cs.LG])
    Motivation: Cancer is heterogeneous, affecting the precise approach to personalized treatment. Accurate subtyping can lead to better survival rates for cancer patients. High-throughput technologies provide multiple omics data for cancer subtyping. However, precise cancer subtyping remains challenging due to the large amount and high dimensionality of omics data. Results: This study proposed Subtype-Former, a deep learning method based on MLP and Transformer Block, to extract the low-dimensional representation of the multi-omics data. K-means and Consensus Clustering are also used to achieve accurate subtyping results. We compared Subtype-Former with the other state-of-the-art subtyping methods across the TCGA 10 cancer types. We found that Subtype-Former can perform better on the benchmark datasets of more than 5000 tumors based on the survival analysis. In addition, Subtype-Former also achieved outstanding results in pan-cancer subtyping, which can help analyze the commonalities and differences across various cancer types at the molecular level. Finally, we applied Subtype-Former to the TCGA 10 types of cancers. We identified 50 essential biomarkers, which can be used to study targeted cancer drugs and promote the development of cancer treatments in the era of precision medicine.
    Deep learning for understanding multilabel imbalanced Chest X-ray datasets. (arXiv:2207.14408v1 [eess.IV])
    Over the last few years, convolutional neural networks (CNNs) have dominated the field of computer vision thanks to their ability to extract features and their outstanding performance in classification problems, for example in the automatic analysis of X-rays. Unfortunately, these neural networks are considered black-box algorithms, i.e. it is impossible to understand how the algorithm has achieved the final result. To apply these algorithms in different fields and test how the methodology works, we need to use eXplainable AI techniques. Most of the work in the medical field focuses on binary or multiclass classification problems. However, in many real-life situations, such as chest X-rays, radiological signs of different diseases can appear at the same time. This gives rise to what is known as "multilabel classification problems". A disadvantage of these tasks is class imbalance, i.e. different labels do not have the same number of samples. The main contribution of this paper is a Deep Learning methodology for imbalanced, multilabel chest X-ray datasets. It establishes a baseline for the currently underutilised PadChest dataset and a new eXplainable AI technique based on heatmaps. This technique also includes probabilities and inter-model matching. The results of our system are promising, especially considering the number of labels used. Furthermore, the heatmaps match the expected areas, i.e. they mark the areas that an expert would use to make the decision.
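    One common way to counter label imbalance in multilabel classification is to weight the positive term of the binary cross-entropy per label. The sketch below shows that heuristic only; the abstract does not say which technique the authors use, and the label matrix, weighting rule (negatives divided by positives) and function names are assumptions for illustration.

```python
import math

def positive_weights(labels):
    """Per-label weight = number of negatives / number of positives."""
    n_samples = len(labels)
    n_labels = len(labels[0])
    weights = []
    for j in range(n_labels):
        positives = sum(row[j] for row in labels)
        # Fall back to 1.0 when a label never occurs, to avoid dividing by zero.
        weights.append((n_samples - positives) / positives if positives else 1.0)
    return weights

def weighted_bce(probs, targets, weights):
    """Mean binary cross-entropy with the positive term scaled per label."""
    total = 0.0
    count = 0
    for p_row, t_row in zip(probs, targets):
        for p, t, w in zip(p_row, t_row, weights):
            total += -(w * t * math.log(p) + (1 - t) * math.log(1 - p))
            count += 1
    return total / count

# Hypothetical label matrix: label 0 is common, labels 1 and 2 are rare.
labels = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0], [1, 0, 0]]
weights = positive_weights(labels)

# Missing a common positive versus missing a rare positive, same confidence.
loss_common_miss = weighted_bce([[0.1, 0.5, 0.5]], [[1, 0, 0]], weights)
loss_rare_miss = weighted_bce([[0.5, 0.1, 0.5]], [[0, 1, 0]], weights)
```

    With these weights, a missed rare label is penalized more heavily than a missed common one, which pushes the model to pay attention to under-represented findings.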
    Replacing the Framingham-based equation for prediction of cardiovascular disease risk and adverse outcome by using artificial intelligence and retinal imaging. (arXiv:2207.14685v1 [eess.IV])
    Purpose: To create and evaluate the accuracy of an artificial intelligence deep learning platform (ORAiCLE) capable of using only retinal fundus images to predict both an individual's overall 5-year cardiovascular disease (CVD) risk and the relative contribution of the component risk factors that comprise this risk. Methods: We used 165,907 retinal images from a database of 47,236 patient visits. Initially, each image was paired with biometric data (age, ethnicity, sex, presence and duration of diabetes, and HDL/LDL ratios) as well as any CVD event within 5 years of the retinal image acquisition. A risk score based on Framingham equations was calculated. The real CVD event rate was also determined for the individuals and overall population. Finally, ORAiCLE was trained using only age, ethnicity, sex plus retinal images. Results: Compared to the Framingham-based score, ORAiCLE was up to 12% more accurate in predicting a cardiovascular event in the next 5 years, especially for the highest risk group of people. The reliability and accuracy of each of the restrictive models was suboptimal compared to ORAiCLE's performance, indicating that it was using data from both sets of data to derive its final results. Conclusion: Retinal photography is inexpensive and only minimal training is required to acquire images, as fully automated, inexpensive camera systems are now widely available. As such, AI-based CVD risk algorithms such as ORAiCLE promise to make CV health screening more accurate, more affordable and more accessible for all. Furthermore, ORAiCLE's unique ability to assess the relative contribution of the components that comprise an individual's overall risk would inform treatment decisions based on the specific needs of an individual, thereby increasing the likelihood of positive health outcomes.
    Graph Neural Networks for Channel Decoding. (arXiv:2207.14742v1 [cs.IT])
    In this work, we propose a fully differentiable graph neural network (GNN)-based architecture for channel decoding and showcase competitive decoding performance for various coding schemes, such as low-density parity-check (LDPC) and BCH codes. The idea is to let a neural network (NN) learn a generalized message passing algorithm over a given graph that represents the forward error correction (FEC) code structure by replacing node and edge message updates with trainable functions. Contrary to many other deep learning-based decoding approaches, the proposed solution enjoys scalability to arbitrary block lengths and the training is not limited by the curse of dimensionality. We benchmark our proposed decoder against state-of-the-art in conventional channel decoding as well as against recent deep learning-based results. For the (63,45) BCH code, our solution outperforms weighted belief propagation (BP) decoding by approximately 0.4 dB with significantly fewer decoding iterations, and even for 5G NR LDPC codes, we observe a competitive performance when compared to conventional BP decoding. For the BCH codes, the resulting GNN decoder can be fully parametrized with only 9640 weights.
    A Deep Generative Approach to Oversampling in Ptychography. (arXiv:2207.14392v1 [eess.IV])
    Ptychography is a well-studied phase imaging method that makes non-invasive imaging possible at a nanometer scale. It has developed into a mainstream technique with various applications across a range of areas such as material science or the defense industry. One major drawback of ptychography is the long data acquisition time due to the high overlap requirement between adjacent illumination areas to achieve a reasonable reconstruction. Traditional approaches with reduced overlap between scanning areas result in reconstructions with artifacts. In this paper, we propose complementing sparsely acquired or undersampled data with data sampled from a deep generative network to satisfy the oversampling requirement in ptychography. Because the deep generative network is pre-trained and its output can be computed as we collect data, the experimental data and the time to acquire the data can be reduced. We validate the method by presenting the reconstruction quality compared to the previously proposed and traditional approaches and comment on the strengths and drawbacks of the proposed approach.
    Using Multi-modal Data for Improving Generalizability and Explainability of Disease Classification in Radiology. (arXiv:2207.14781v1 [cs.CV])
    Traditional datasets for the radiological diagnosis tend to only provide the radiology image alongside the radiology report. However, radiology reading as performed by radiologists is a complex process, and information such as the radiologist's eye-fixations over the course of the reading has the potential to be an invaluable data source to learn from. Nonetheless, the collection of such data is expensive and time-consuming. This leads to the question of whether such data is worth the investment to collect. This paper utilizes the recently published Eye-Gaze dataset to perform an exhaustive study on the impact on performance and explainability of deep learning (DL) classification in the face of varying levels of input features, namely: radiology images, radiology report text, and radiologist eye-gaze data. We find that the best classification performance of X-ray images is achieved with a combination of radiology report free-text and radiology image, with the eye-gaze data providing no performance boost. Nonetheless, eye-gaze data serving as secondary ground truth alongside the class label results in highly explainable models that generate better attention maps compared to models trained to do classification and attention map generation without eye-gaze data.
    Effects of Image Size on Deep Learning. (arXiv:2101.11508v4 [cs.CV] UPDATED)
    This paper presents the effects of late gadolinium enhancement (LGE) magnetic resonance imaging (MRI) image size on deep learning based fully automated quantification of myocardial infarction (MI). The main objective is to determine the best size for LGE MRI images in the training dataset to achieve optimal deep learning training outcomes. To determine the new size of LGE MRI images of the reference training dataset, non-extra pixel and extra pixel interpolation algorithms are used. A novel strategy based on thresholding, median filtering, and subtraction operations is introduced and applied to remove extra class labels in interpolated ground truth (GT) segmentation masks. Fully automated quantification is achieved using the expectation maximization, weighted intensity, a priori information (EWA) algorithm, and the outcome of automatic semantic segmentation of LGE-MRI images with the convolutional neural network (CNN). In the experiments, common class metrics are used to evaluate the quality of semantic segmentation with a CNN architecture of interest (U-net) against the GT segmentation. Arbitrary threshold, comparison of the sums, and sums of differences are used to estimate the relationship between semi-automatic and fully automated quantification of MI results. The relationship between semi-automatic and fully automated quantification of MI results was closer for the dataset of bigger LGE MRI images than for the dataset of smaller LGE MRI images: quantification results based on the bigger images were 55.5% closer to the manual or semi-automatic results, while quantification results based on the smaller images were 22.2% closer to the manual results.
    A deep learning approach to data-driven model-free pricing and to martingale optimal transport. (arXiv:2103.11435v2 [q-fin.CP] UPDATED)
    We introduce a novel and highly tractable supervised learning approach based on neural networks that can be applied to the computation of model-free price bounds of potentially high-dimensional financial derivatives and to the determination of optimal hedging strategies attaining these bounds. In particular, our methodology allows training a single neural network offline and then using it online for the fast determination of model-free price bounds of a whole class of financial derivatives with current market data. We show the applicability of this approach and highlight its accuracy in several examples involving real market data. Further, we show how a neural network can be trained to solve martingale optimal transport problems involving fixed marginal distributions instead of financial market data.
    Semi-supervised Learning of Partial Differential Operators and Dynamical Flows. (arXiv:2207.14366v1 [cs.LG])
    The evolution of dynamical systems is generically governed by nonlinear partial differential equations (PDEs), whose solution, in a simulation framework, requires vast amounts of computational resources. In this work, we present a novel method that combines a hyper-network solver with a Fourier Neural Operator architecture. Our method treats time and space separately. As a result, it successfully propagates initial conditions in continuous time steps by employing the general composition properties of the partial differential operators. Following previous work, supervision is provided at a specific time point. We test our method on various time evolution PDEs, including nonlinear fluid flows in one, two, and three spatial dimensions. The results show that the new method improves the learning accuracy at the time point of supervision and is able to interpolate the solutions to any intermediate time.
    Large Language Models and the Reverse Turing Test. (arXiv:2207.14382v1 [cs.CL])
    Large Language Models (LLMs) have been transformative. They are pre-trained foundational models that can be adapted with fine tuning to many different natural language tasks, each of which previously would have required a separate network model. This is one step closer to the extraordinary versatility of human language. GPT-3 and more recently LaMDA can carry on dialogs with humans on many topics after minimal priming with a few examples. However, there has been a wide range of reactions on whether these LLMs understand what they are saying or exhibit signs of intelligence. This high variance is exhibited in three interviews with LLMs reaching wildly different conclusions. A new possibility was uncovered that could explain this divergence. What appears to be intelligence in LLMs may in fact be a mirror that reflects the intelligence of the interviewer, a remarkable twist that could be considered a Reverse Turing Test. If so, then by studying interviews we may be learning more about the intelligence and beliefs of the interviewer than the intelligence of the LLMs.
    SYNTA: A novel approach for deep learning-based image analysis in muscle histopathology using photo-realistic synthetic data. (arXiv:2207.14650v1 [eess.IV])
    Artificial intelligence (AI), machine learning, and deep learning (DL) methods are becoming increasingly important in the field of biomedical image analysis. However, to exploit the full potential of such methods, a representative number of experimentally acquired images containing a significant number of manually annotated objects is needed as training data. Here we introduce SYNTA (synthetic data) as a novel approach for the generation of synthetic, photo-realistic, and highly complex biomedical images as training data for DL systems. We show the versatility of our approach in the context of muscle fiber and connective tissue analysis in histological sections. We demonstrate that it is possible to perform robust and expert-level segmentation tasks on previously unseen real-world data, without the need for manual annotations using synthetic training data alone. Being a fully parametric technique, our approach poses an interpretable and controllable alternative to Generative Adversarial Networks (GANs) and has the potential to significantly accelerate quantitative image analysis in a variety of biomedical applications in microscopy and beyond.
    "FIJO": a French Insurance Soft Skill Detection Dataset. (arXiv:2204.05208v2 [cs.CL] UPDATED)
    Understanding the evolution of job requirements is becoming more important for workers, companies and public organizations to follow the fast transformation of the employment market. Fortunately, recent natural language processing (NLP) approaches allow for the development of methods to automatically extract information from job ads and recognize skills more precisely. However, these efficient approaches need a large amount of annotated data from the studied domain which is difficult to access, mainly due to intellectual property. This article proposes a new public dataset, FIJO, containing insurance job offers, including many soft skill annotations. To understand the potential of this dataset, we detail some characteristics and some limitations. Then, we present the results of skill detection algorithms using a named entity recognition approach and show that transformers-based models have good token-wise performances on this dataset. Lastly, we analyze some errors made by our best model to emphasize the difficulties that may arise when applying NLP approaches.
    Quantifying Data Augmentation for LiDAR based 3D Object Detection. (arXiv:2004.01643v2 [cs.CV] UPDATED)
    In this work, we shed light on different data augmentation techniques commonly used in Light Detection and Ranging (LiDAR) based 3D Object Detection. For the bulk of our experiments, we utilize the well known PointPillars pipeline and the well established KITTI dataset. We investigate a variety of global and local augmentation techniques, where global augmentation techniques are applied to the entire point cloud of a scene and local augmentation techniques are only applied to points belonging to individual objects in the scene. Our findings show that both types of data augmentation can lead to performance increases, but it also turns out that some augmentation techniques, such as individual object translation, can be counterproductive and hurt the overall performance. We show that these findings transfer and generalize well to other state of the art 3D Object Detection methods and the challenging STF dataset. On the KITTI dataset we can gain up to 1.5% and on the STF dataset up to 1.7% in 3D mAP on the moderate car class.
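    The distinction between global and local augmentation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PointPillars pipeline: the function names, the axis-aligned object box, and the toy scene are all hypothetical.

```python
import math

def rotate_global(points, angle_rad):
    """Global augmentation: rotate every point of the scene about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]

def translate_local(points, box_min, box_max, offset):
    """Local augmentation: shift only the points inside one axis-aligned object box."""
    def inside(p):
        return all(lo <= v <= hi for v, lo, hi in zip(p, box_min, box_max))
    return [
        tuple(v + d for v, d in zip(p, offset)) if inside(p) else p
        for p in points
    ]

# Toy scene (hypothetical): two background points and two points on one object.
scene = [(1.0, 0.0, 0.0), (5.0, 5.0, 0.5), (5.2, 5.1, 0.4), (-3.0, 2.0, 0.0)]

# Global: the whole point cloud is rotated by 90 degrees.
rotated = rotate_global(scene, math.pi / 2)

# Local: only the points inside the object's box are moved 0.3 m along x.
shifted = translate_local(scene, (4.5, 4.5, 0.0), (5.5, 5.5, 1.0), (0.3, 0.0, 0.0))
```

    In a real detector the ground-truth box would have to be moved together with its points; that bookkeeping is omitted here.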
    Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering. (arXiv:2207.14589v1 [stat.ML])
    Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than $1$) slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute.
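    A minimal sketch of why a polynomial in the Laplacian can widen small eigengaps while leaving eigenvectors untouched. The polynomial p(L) = I - (I - L)^k, the edge weight w, and the degree k are illustrative assumptions, not the transformation used in the paper. If L v = λ v, then p(L) v = p(λ) v, and for small λ we have p(λ) ≈ kλ.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def matvec(m, v):
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def dilate(lap, k):
    """Return p(L) = I - (I - L)^k, a polynomial in L with the same eigenvectors."""
    n = len(lap)
    eye = identity(n)
    base = [[eye[i][j] - lap[i][j] for j in range(n)] for i in range(n)]
    power = eye
    for _ in range(k):
        power = matmul(power, base)
    return [[eye[i][j] - power[i][j] for j in range(n)] for i in range(n)]

# Laplacian of a single weak edge (assumed weight w): eigenvalues 0 and 2w.
w = 0.01
lap = [[w, -w], [-w, w]]
v = [1.0, -1.0]      # eigenvector of L for eigenvalue 2w
lam = 2 * w          # original gap to the zero eigenvalue

k = 10
dilated = dilate(lap, k)
lam_dilated = 1 - (1 - lam) ** k   # eigenvalue of p(L) for the same eigenvector
```

    Here the gap grows from 0.02 to roughly 0.18, while `v` remains an eigenvector of the transformed matrix.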
    Open World Learning Graph Convolution for Latency Estimation in Routing Networks. (arXiv:2207.14643v1 [cs.NI])
    Accurate routing network status estimation is a key component in Software Defined Networking. However, existing deep-learning-based methods for modeling network routing are not able to extrapolate towards unseen feature distributions. Nor are they able to handle scaled and drifted network attributes in test sets that include open-world inputs. To deal with these challenges, we propose a novel approach for modeling network routing, using Graph Neural Networks. Our method can also be used for network-latency estimation. Supported by a domain-knowledge-assisted graph formulation, our model shares a stable performance across different network sizes and configurations of routing networks, while at the same time being able to extrapolate towards unseen sizes, configurations, and user behavior. We show that our model outperforms most conventional deep-learning-based models, in terms of prediction accuracy, computational resources, inference speed, as well as ability to generalize towards open-world input.
    Encoder-Decoder Architecture for 3D Seismic Inversion. (arXiv:2207.14789v1 [physics.geo-ph])
    Inverting seismic data to build 3D geological structures is a challenging task due to the overwhelming amount of acquired seismic data, and the very-high computational load due to iterative numerical solutions of the wave equation, as required by industry-standard tools such as Full Waveform Inversion (FWI). For example, in an area with surface dimensions of 4.5km $\times$ 4.5km, hundreds of seismic shot-gather cubes are required for 3D model reconstruction, leading to Terabytes of recorded data. This paper presents a deep learning solution for the reconstruction of realistic 3D models in the presence of field noise recorded in seismic surveys. We implement and analyze a convolutional encoder-decoder architecture that efficiently processes the entire collection of hundreds of seismic shot-gather cubes. The proposed solution demonstrates that realistic 3D models can be reconstructed with a structural similarity index measure (SSIM) of 0.8554 (out of 1.0) in the presence of field noise at 10dB signal-to-noise ratio.
    Automated liver tissues delineation techniques: A systematic survey on machine learning current trends and future orientations. (arXiv:2103.06384v2 [eess.IV] UPDATED)
    Machine learning and computer vision techniques have grown rapidly in recent years due to their automation, suitability, and ability to generate astounding results. Hence, in this paper, we survey the key studies published between 2014 and 2022, showcasing the different machine learning algorithms researchers have used to segment the liver, hepatic tumors, and hepatic-vasculature structures. We divide the surveyed studies based on the tissue of interest (hepatic-parenchyma, hepatic-tumors, or hepatic-vessels), highlighting the studies that tackle more than one task simultaneously. Additionally, the machine learning algorithms are classified as either supervised or unsupervised, and they are further partitioned if the amount of work that falls under a certain scheme is significant. Moreover, different datasets and challenges found in literature and websites containing masks of the aforementioned tissues are thoroughly discussed, highlighting the organizers' original contributions and those of other researchers. Also, the metrics used extensively in literature are mentioned in our review, stressing their relevance to the task at hand. Finally, critical challenges and future directions are emphasized for innovative researchers to tackle, exposing gaps that need addressing, such as the scarcity of studies on the vessel segmentation challenge and why this gap needs to be dealt with sooner rather than later.
    Cyclic Policy Distillation: Sample-Efficient Sim-to-Real Reinforcement Learning with Domain Randomization. (arXiv:2207.14561v1 [cs.RO])
    Deep reinforcement learning with domain randomization learns a control policy in various simulations with randomized physical and sensor model parameters to become transferable to the real world in a zero-shot setting. However, a huge number of samples are often required to learn an effective policy when the range of randomized parameters is extensive due to the instability of policy updates. To alleviate this problem, we propose a sample-efficient method named Cyclic Policy Distillation (CPD). CPD divides the range of randomized parameters into several small sub-domains and assigns a local policy to each sub-domain. Then, the learning of local policies is performed while cyclically transitioning the target sub-domain to neighboring sub-domains and exploiting the learned values/policies of the neighbor sub-domains with a monotonic policy-improvement scheme. Finally, all of the learned local policies are distilled into a global policy for sim-to-real transfer. The effectiveness and sample efficiency of CPD are demonstrated through simulations with four tasks (Pendulum from OpenAIGym and Pusher, Swimmer, and HalfCheetah from Mujoco), and a real-robot ball-dispersal task.
    EmoSens: Emotion Recognition based on Sensor data analysis using LightGBM. (arXiv:2207.14640v1 [cs.HC])
    Smart wearables have come to play an integral part in our day-to-day life. From recording ECG signals to analysing body fat composition, smart wearables can do it all. These devices encompass various sensors which can be employed to derive meaningful information regarding the user's physical and psychological conditions. Our approach focuses on employing such sensors to identify and obtain the variations in the mood of a user at a given instance through the use of supervised machine learning techniques. The study examines the performance of various supervised learning models such as Decision Trees, Random Forests, XGBoost, and LightGBM on the dataset. With our proposed model, we obtained a high recognition rate of 92.5% using XGBoost and LightGBM for 9 different emotion classes. By utilizing this, we aim to improve and suggest methods to aid emotion recognition for better mental health analysis and mood monitoring.
    BiFeat: Supercharge GNN Training via Graph Feature Quantization. (arXiv:2207.14696v1 [cs.LG])
    Graph Neural Networks (GNNs) are a promising approach for applications with non-Euclidean data. However, training GNNs on large scale graphs with hundreds of millions of nodes is both resource and time consuming. Different from DNNs, GNNs usually have larger memory footprints, and thus the GPU memory capacity and PCIe bandwidth are the main resource bottlenecks in GNN training. To address this problem, we present BiFeat: a graph feature quantization methodology to accelerate GNN training by significantly reducing the memory footprint and PCIe bandwidth requirement so that GNNs can take full advantage of GPU computing capabilities. Our key insight is that, unlike DNNs, GNNs are less prone to the information loss of input features caused by quantization. We identify the main accuracy impact factors in graph feature quantization and theoretically prove that BiFeat training converges to a network where the loss is within $\epsilon$ of the optimal loss of the uncompressed network. We perform extensive evaluation of BiFeat using several popular GNN models and datasets, including GraphSAGE on MAG240M, the largest public graph dataset. The results demonstrate that BiFeat achieves a compression ratio of more than 30 and improves GNN training speed by 200%-320% with marginal accuracy loss. In particular, BiFeat achieves a record by training GraphSAGE on MAG240M within one hour using only four GPUs.
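    To make the idea of feature quantization concrete, here is a minimal sketch of plain uniform scalar quantization of one node's feature vector. It is a generic textbook scheme chosen for illustration, not BiFeat's actual method; the function names and the 8-bit width are assumptions.

```python
def quantize(values, bits=8):
    """Map floats to integer codes in [0, 2**bits - 1] (uniform scalar quantization)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    # Guard against a constant vector, where hi == lo.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats from the integer codes."""
    return [lo + c * scale for c in codes]

# Hypothetical feature vector of a single node.
features = [0.0, 0.1, 0.5, 0.9, 1.0]
codes, lo, scale = quantize(features)
restored = dequantize(codes, lo, scale)

# Rounding to the nearest level bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(features, restored))
```

    Storing 8-bit codes instead of 32-bit floats shrinks the feature matrix by roughly 4x, apart from the small per-vector overhead of `lo` and `scale`. Higher ratios, such as the one reported above, would need a more aggressive scheme than this sketch.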
    Image sensing with multilayer, nonlinear optical neural networks. (arXiv:2207.14293v1 [physics.optics])
    Optical imaging is commonly used for both scientific and technological applications across industry and academia. In image sensing, a measurement, such as of an object's position, is performed by computational analysis of a digitized image. An emerging image-sensing paradigm breaks this delineation between data collection and analysis by designing optical components to perform not imaging, but encoding. By optically encoding images into a compressed, low-dimensional latent space suitable for efficient post-analysis, these image sensors can operate with fewer pixels and fewer photons, allowing higher-throughput, lower-latency operation. Optical neural networks (ONNs) offer a platform for processing data in the analog, optical domain. ONN-based sensors have however been limited to linear processing, but nonlinearity is a prerequisite for depth, and multilayer NNs significantly outperform shallow NNs on many tasks. Here, we realize a multilayer ONN pre-processor for image sensing, using a commercial image intensifier as a parallel optoelectronic, optical-to-optical nonlinear activation function. We demonstrate that the nonlinear ONN pre-processor can achieve compression ratios of up to 800:1 while still enabling high accuracy across several representative computer-vision tasks, including machine-vision benchmarks, flow-cytometry image classification, and identification of objects in real scenes. In all cases we find that the ONN's nonlinearity and depth allowed it to outperform a purely linear ONN encoder. Although our experiments are specialized to ONN sensors for incoherent-light images, alternative ONN platforms should facilitate a range of ONN sensors. These ONN sensors may surpass conventional sensors by pre-processing optical information in spatial, temporal, and/or spectral dimensions, potentially with coherent and quantum qualities, all natively in the optical domain.
    Multiple Attribute Fairness: Application to Fraud Detection. (arXiv:2207.14355v1 [cs.LG])
    We propose a fairness measure relaxing the equality conditions in the popular equal odds fairness regime for classification. We design an iterative, model-agnostic, grid-based heuristic that calibrates the outcomes per sensitive attribute value to conform to the measure. The heuristic is designed to handle high arity attribute values and performs a per attribute sanitization of outcomes across different protected attribute values. We also extend our heuristic for multiple attributes. Highlighting our motivating application, fraud detection, we show that the proposed heuristic is able to achieve fairness across multiple values of a single protected attribute as well as across multiple protected attributes. When compared to current fairness techniques that focus on two groups, we achieve comparable performance across several public data sets.
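    One way to read "relaxing the equality conditions" is to require that per-group true-positive and false-positive rates agree only up to a tolerance. The sketch below illustrates that reading for an attribute with more than two values; the tolerance `eps`, the function names, and the toy data are assumptions, and the paper's actual measure may differ.

```python
from collections import defaultdict

def group_rates(y_true, y_pred, groups):
    """Return {group: (true_positive_rate, false_positive_rate)}."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        if t:
            counts[g]["tp" if p else "fn"] += 1
        else:
            counts[g]["fp" if p else "tn"] += 1
    rates = {}
    for g, c in counts.items():
        pos = c["tp"] + c["fn"]
        neg = c["fp"] + c["tn"]
        rates[g] = (c["tp"] / pos if pos else 0.0, c["fp"] / neg if neg else 0.0)
    return rates

def odds_gap(rates):
    """Largest spread of TPR or FPR across all groups (0 means exact equal odds)."""
    tprs = [r[0] for r in rates.values()]
    fprs = [r[1] for r in rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))

def is_relaxed_fair(rates, eps):
    """Relaxed condition (assumed form): rates may differ by at most eps."""
    return odds_gap(rates) <= eps

# Toy data with three values of one protected attribute.
y_true = [1, 1, 0, 0,  1, 1, 0, 0,  1, 1, 0, 0]
y_pred = [1, 1, 0, 0,  1, 0, 0, 0,  1, 1, 1, 0]
groups = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

rates = group_rates(y_true, y_pred, groups)
gap = odds_gap(rates)
```

    With exact equal odds the gap would have to be 0; the relaxed check accepts any gap up to `eps`, which is easier to satisfy when an attribute has many values.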
    Leveraging Explanations in Interactive Machine Learning: An Overview. (arXiv:2207.14526v1 [cs.LG])
    Explanations have gained an increasing level of interest in the AI and Machine Learning (ML) communities in order to improve model transparency and allow users to form a mental model of a trained ML model. However, explanations can go beyond this one-way communication and serve as a mechanism to elicit user control, because once users understand, they can then provide feedback. The goal of this paper is to present an overview of research where explanations are combined with interactive capabilities as a means to learn new models from scratch and to edit and debug existing ones. To this end, we draw a conceptual map of the state-of-the-art, grouping relevant approaches based on their intended purpose and on how they structure the interaction, highlighting similarities and differences between them. We also discuss open research issues and outline possible directions forward, with the hope of spurring further research on this blooming research topic.
    Dive into Deep Learning. (arXiv:2106.11342v3 [cs.LG] UPDATED)
    This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. The entire book is drafted in Jupyter notebooks, seamlessly integrating exposition figures, math, and interactive examples with self-contained code. Our goal is to offer a resource that could (i) be freely available for everyone; (ii) offer sufficient technical depth to provide a starting point on the path to actually becoming an applied machine learning scientist; (iii) include runnable code, showing readers how to solve problems in practice; (iv) allow for rapid updates, both by us and also by the community at large; (v) be complemented by a forum for interactive discussion of technical details and to answer questions.
    Open-radiomics: A Research Protocol to Make Radiomics-based Machine Learning Pipelines Reproducible. (arXiv:2207.14776v1 [q-bio.QM])
    The application of artificial intelligence (AI) techniques to medical imaging data has yielded promising results. As an important branch of AI pipelines in medical imaging, radiomics faces two major challenges, namely reproducibility and accessibility. In this work, we introduce open-radiomics, a set of radiomics datasets, and a comprehensive radiomics pipeline that investigates the effects of radiomics feature extraction settings such as binWidth and image normalization on the reproducibility of the radiomics results. To make radiomics research more accessible and reproducible, we provide guidelines for building machine learning (ML) models on radiomics data, introduce Open-radiomics, an evolving collection of open-source radiomics datasets, and publish baseline models for the datasets.
    Robust Framework for COVID-19 Identification from a Multicenter Dataset of Chest CT Scans. (arXiv:2109.09241v3 [eess.IV] UPDATED)
    The objective of this study is to develop a robust deep learning-based framework to distinguish COVID-19, Community-Acquired Pneumonia (CAP), and Normal cases based on chest CT scans acquired in different imaging centers using various protocols and radiation doses. We showed that while our proposed model is trained on a relatively small dataset acquired from only one imaging center using a specific scanning protocol, the model performs well on heterogeneous test sets obtained by multiple scanners using different technical parameters. We also showed that the model can be updated via an unsupervised approach to cope with the data shift between the train and test sets and enhance the robustness of the model upon receiving a new external dataset from a different center. We adopted an ensemble architecture to aggregate the predictions from multiple versions of the model. For initial training and development purposes, an in-house dataset of 171 COVID-19, 60 CAP, and 76 Normal cases was used, which contained volumetric CT scans acquired from one imaging center using a constant standard radiation dose scanning protocol. To evaluate the model, we collected four different test sets retrospectively to investigate the effects of the shifts in the data characteristics on the model's performance. Among the test cases, there were CT scans with characteristics similar to the train set as well as noisy low-dose and ultra-low dose CT scans. In addition, some test CT scans were obtained from patients with a history of cardiovascular diseases or surgeries. The entire test dataset used in this study contained 51 COVID-19, 28 CAP, and 51 Normal cases. Experimental results indicate that our proposed framework performs well on all test sets, achieving total accuracy of 96.15% (95%CI: [91.25-98.74]), COVID-19 sensitivity of 96.08% (95%CI: [86.54-99.5]), and CAP sensitivity of 92.86% (95%CI: [76.50-99.19]).
    Artifact Identification in X-ray Diffraction Data using Machine Learning Methods. (arXiv:2207.14804v1 [eess.IV])
    The in situ synchrotron high-energy X-ray powder diffraction (XRD) technique is widely used by researchers to analyze the crystallographic structures of materials in functional devices (e.g., battery materials) or in complex sample environments (e.g., diamond anvil cells or synthesis reactors). An atomic structure of a material can be identified by its diffraction pattern, along with detailed analysis such as Rietveld refinement which indicates how the measured structure deviates from the ideal structure (e.g., internal stresses or defects). For in situ experiments, a series of XRD images is usually collected on the same sample at different conditions (e.g., adiabatic conditions), yielding different states of matter, or simply collected continuously as a function of time to track the change of a sample over a chemical or physical process. In situ experiments are usually performed with area detectors, collecting 2D images composed of diffraction rings for ideal powders. Depending on the material's form, one may observe different characteristics other than the typical Debye-Scherrer rings for a realistic sample and its environments, such as textures or preferred orientations and single crystal diffraction spots in the 2D XRD image. In this work, we present an investigation of machine learning methods for fast and reliable identification and separation of the single crystal diffraction spots in XRD images. The exclusion of artifacts during an XRD image integration process allows a precise analysis of the powder diffraction rings of interest. We observe that the gradient boosting method can consistently produce high accuracy results when it is trained with small subsets of highly diverse datasets. The method dramatically decreases the amount of time spent on identifying and separating single crystal spots in comparison to the conventional method.
    Spliced Binned-Pareto Distribution for Robust Modeling of Heavy-tailed Time Series. (arXiv:2106.10952v2 [stat.ML] UPDATED)
    This work proposes a novel method to robustly and accurately model time series with heavy-tailed noise in non-stationary scenarios. In many practical applications, time series have heavy-tailed noise that significantly impacts the performance of classical forecasting models; in particular, accurately modeling a distribution over extreme events is crucial to performing accurate time series anomaly detection. We propose a Spliced Binned-Pareto distribution which is both robust to extreme observations and allows accurate modeling of the full distribution. Our method allows the capture of time dependencies in the higher order moments of the distribution such as the tail heaviness. We compare the robustness and the accuracy of the tail estimation of our method to other state of the art methods on Twitter mentions count time series.
    Automatic Reward Design via Learning Motivation-Consistent Intrinsic Rewards. (arXiv:2207.14722v1 [cs.LG])
    Reward design is a critical part of the application of reinforcement learning, the performance of which strongly depends on how well the reward signal frames the goal of the designer and how well the signal assesses progress in reaching that goal. In many cases, the extrinsic rewards provided by the environment (e.g., win or loss of a game) are very sparse and make it difficult to train agents directly. In practice, researchers usually assist the learning of agents by adding some auxiliary rewards. However, designing auxiliary rewards often turns into a trial-and-error search for reward settings that produce acceptable results. In this paper, we propose to automatically generate goal-consistent intrinsic rewards for the agent to learn, such that maximizing them also maximizes the expected cumulative extrinsic rewards. To this end, we introduce the concept of motivation, which captures the underlying goal of maximizing certain rewards, and propose the motivation based reward design method. The basic idea is to shape the intrinsic rewards by minimizing the distance between the intrinsic and extrinsic motivations. We conduct extensive experiments and show that our method performs better than the state-of-the-art methods in handling problems of delayed reward, exploration, and credit assignment.
    Meta Reinforcement Learning with Successor Feature Based Context. (arXiv:2207.14723v1 [cs.LG])
    Most reinforcement learning (RL) methods only focus on learning a single task from scratch and are not able to use prior knowledge to learn other tasks more effectively. Context-based meta RL techniques have recently been proposed as a possible solution to tackle this. However, they are usually less efficient than conventional RL and may require many rounds of trial and error during training. To address this, we propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms while requiring significantly fewer environmental interactions. By combining context variables with the idea of decomposing reward in the successor feature framework, our method not only learns high-quality policies for multiple tasks simultaneously but can also quickly adapt to new tasks with a small amount of training. Compared with state-of-the-art meta-RL baselines, we empirically show the effectiveness and data efficiency of our method on several continuous control tasks.
    Cluster-Specific Predictions with Multi-Task Gaussian Processes. (arXiv:2011.07866v3 [cs.LG] UPDATED)
    A model involving Gaussian processes (GPs) is introduced to simultaneously handle multi-task learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as well as a learning step for subsequent predictions for new tasks. The model is instantiated as a mixture of multi-task GPs with common mean processes. A variational EM algorithm is derived for dealing with the optimisation of the hyper-parameters along with the hyper-posteriors' estimation of latent variables and processes. We establish explicit formulas for integrating the mean processes and the latent clustering variables within a predictive distribution, accounting for uncertainty on both aspects. This distribution is defined as a mixture of cluster-specific GP predictions, which enhances the performances when dealing with group-structured data. The model handles irregular grid of observations and offers different hypotheses on the covariance structure for sharing additional information across tasks. The performances on both clustering and prediction tasks are assessed through various simulated scenarios and real datasets. The overall algorithm, called MagmaClust, is publicly available as an R package.
    Email Spam Detection Using Hierarchical Attention Hybrid Deep Learning Method. (arXiv:2204.07390v2 [cs.CL] UPDATED)
    Email is one of the most widely used ways to communicate, with millions of people and businesses relying on it to share knowledge and information on a daily basis. Nevertheless, the rise in email users has led to a dramatic increase in spam emails in recent years. Processing and managing emails properly is getting increasingly difficult for individuals and companies. This article proposes a novel technique for email spam detection that is based on a combination of convolutional neural networks, gated recurrent units, and attention mechanisms. During system training, the network selectively focuses on the necessary parts of the email text. The major contribution of this study is the use of convolution layers to extract more meaningful, abstract, and generalizable features through hierarchical representation. Additionally, this contribution incorporates cross-dataset evaluation, which yields performance results that are more independent of the model's training dataset. According to the cross-dataset evaluation results, the proposed technique advances the results of present attention-based techniques by utilizing temporal convolutions, which provide more flexible receptive field sizes. The suggested technique's findings are compared to those of state-of-the-art models and show that our approach outperforms them.
    A One-Shot Reparameterization Method for Reducing the Loss of Tile Pruning on DNNs. (arXiv:2207.14545v1 [cs.CV])
    Recently, tile pruning has been widely studied to accelerate the inference of deep neural networks (DNNs). However, we found that the loss due to tile pruning, which can eliminate important elements together with unimportant elements, is large on trained DNNs. In this study, we propose a one-shot reparameterization method, called TileTrans, to reduce the loss of tile pruning. Specifically, we repermute the rows or columns of the weight matrix such that the model architecture can be kept unchanged after reparameterization. This repermutation realizes the reparameterization of the DNN model without any retraining. The proposed reparameterization method combines important elements into the same tile, thus preserving the important elements after the tile pruning. Furthermore, TileTrans can be seamlessly integrated into existing tile pruning methods because it is a pre-processing method executed before pruning, which is orthogonal to most existing methods. The experimental results demonstrate that our method is essential in reducing the loss of tile pruning on DNNs. Specifically, the accuracy is improved by up to 17% for AlexNet and by up to 5% for ResNet-34, where both models are pre-trained on ImageNet.
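    The claim that rows or columns can be repermuted without changing the model can be illustrated on a toy network. The sketch below is not TileTrans itself: it assumes a hypothetical two-layer ReLU network with made-up weights and only shows that reordering the hidden units (rows of the first weight matrix together with the matching columns of the second) leaves the output unchanged, so no retraining is needed. How the permutation is chosen to group important elements into tiles is not covered here.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def matvec(w, v):
    # Plain matrix-vector product for a list-of-rows matrix.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def forward(w1, w2, x):
    # Toy two-layer network: w2 @ relu(w1 @ x).
    return matvec(w2, relu(matvec(w1, x)))

def permute_hidden(w1, w2, perm):
    """Reorder hidden units: rows of w1 and the matching columns of w2."""
    w1_p = [w1[p] for p in perm]
    w2_p = [[row[p] for p in perm] for row in w2]
    return w1_p, w2_p

# Illustrative weights (assumed values, not from the paper).
w1 = [[1.0, -2.0], [0.5, 0.25], [-1.0, 3.0]]  # 3 hidden units, 2 inputs
w2 = [[2.0, -1.0, 0.5]]                        # 1 output
x = [1.0, 2.0]

w1_p, w2_p = permute_hidden(w1, w2, perm=[2, 0, 1])
original_output = forward(w1, w2, x)
permuted_output = forward(w1_p, w2_p, x)
```

    Because ReLU acts elementwise, it commutes with the permutation, which is why the two outputs agree.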
    Design Methodology for Deep Out-of-Distribution Detectors in Real-Time Cyber-Physical Systems. (arXiv:2207.14694v1 [cs.LG])
    When machine learning (ML) models are supplied with data outside their training distribution, they are more likely to make inaccurate predictions; in a cyber-physical system (CPS), this could lead to catastrophic system failure. To mitigate this risk, an out-of-distribution (OOD) detector can run in parallel with an ML model and flag inputs that could lead to undesirable outcomes. Although OOD detectors have been well studied in terms of accuracy, there has been less focus on deployment to resource constrained CPSs. In this study, a design methodology is proposed to tune deep OOD detectors to meet the accuracy and response time requirements of embedded applications. The methodology uses genetic algorithms to optimize the detector's preprocessing pipeline and selects a quantization method that balances robustness and response time. It also identifies several candidate task graphs under the Robot Operating System (ROS) for deployment of the selected design. The methodology is demonstrated on two variational autoencoder based OOD detectors from the literature on two embedded platforms. Insights into the trade-offs that occur during the design process are provided, and it is shown that this design methodology can lead to a drastic reduction in response time in relation to an unoptimized OOD detector while maintaining comparable accuracy.
    Restoring Vision in Adverse Weather Conditions with Patch-Based Denoising Diffusion Models. (arXiv:2207.14626v1 [cs.CV])
    Image restoration under adverse weather conditions has been of significant interest for various computer vision applications. Recent successful methods rely on the current progress in deep neural network architectural designs (e.g., with vision transformers). Motivated by the recent progress achieved with state-of-the-art conditional generative models, we present a novel patch-based image restoration algorithm based on denoising diffusion probabilistic models. Our patch-based diffusion modeling approach enables size-agnostic image restoration by using a guided denoising process with smoothed noise estimates across overlapping patches during inference. We empirically evaluate our model on benchmark datasets for image desnowing, combined deraining and dehazing, and raindrop removal. We demonstrate our approach to achieve state-of-the-art performances on both weather-specific and multi-weather image restoration, and qualitatively show strong generalization to real-world test images.
    Deep Learning for Bayesian Optimization of Scientific Problems with High-Dimensional Structure. (arXiv:2104.11667v3 [cs.LG] UPDATED)
    Bayesian optimization (BO) is a popular paradigm for global optimization of expensive black-box functions, but there are many domains where the function is not completely a black-box. The data may have some known structure (e.g. symmetries) and/or the data generation process may be a composite process that yields useful intermediate or auxiliary information in addition to the value of the optimization objective. However, surrogate models traditionally employed in BO, such as Gaussian Processes (GPs), scale poorly with dataset size and do not easily accommodate known structure. Instead, we use Bayesian neural networks, a class of scalable and flexible surrogate models with inductive biases, to extend BO to complex, structured problems with high dimensionality. We demonstrate BO on a number of realistic problems in physics and chemistry, including topology optimization of photonic crystal materials using convolutional neural networks, and chemical property optimization of molecules using graph neural networks. On these complex tasks, we show that neural networks often outperform GPs as surrogate models for BO in terms of both sampling efficiency and computational cost.
    Content-Aware Differential Privacy with Conditional Invertible Neural Networks. (arXiv:2207.14625v1 [cs.CR])
    Differential privacy (DP) has arisen as the gold standard in protecting an individual's privacy in datasets by adding calibrated noise to each data sample. While the application to categorical data is straightforward, its usability in the context of images has been limited. Contrary to categorical data the meaning of an image is inherent in the spatial correlation of neighboring pixels making the simple application of noise infeasible. Invertible Neural Networks (INN) have shown excellent generative performance while still providing the ability to quantify the exact likelihood. Their principle is based on transforming a complicated distribution into a simple one e.g. an image into a spherical Gaussian. We hypothesize that adding noise to the latent space of an INN can enable differentially private image modification. Manipulation of the latent space leads to a modified image while preserving important details. Further, by conditioning the INN on meta-data provided with the dataset we aim at leaving dimensions important for downstream tasks like classification untouched while altering other parts that potentially contain identifying information. We term our method content-aware differential privacy (CADP). We conduct experiments on publicly available benchmarking datasets as well as dedicated medical ones. In addition, we show the generalizability of our method to categorical data. The source code is publicly available at https://github.com/Cardio-AI/CADP.
    Towards Communication-efficient Vertical Federated Learning Training via Cache-enabled Local Updates. (arXiv:2207.14628v1 [cs.LG])
    Vertical federated learning (VFL) is an emerging paradigm that allows different parties (e.g., organizations or enterprises) to collaboratively build machine learning models with privacy protection. In the training phase, VFL only exchanges the intermediate statistics, i.e., forward activations and backward derivatives, across parties to compute model gradients. Nevertheless, due to its geo-distributed nature, VFL training usually suffers from the low WAN bandwidth. In this paper, we introduce CELU-VFL, a novel and efficient VFL training framework that exploits the local update technique to reduce the cross-party communication rounds. CELU-VFL caches the stale statistics and reuses them to estimate model gradients without exchanging the ad hoc statistics. Significant techniques are proposed to improve the convergence performance. First, to handle the stochastic variance problem, we propose a uniform sampling strategy to fairly choose the stale statistics for local updates. Second, to harness the errors brought by the staleness, we devise an instance weighting mechanism that measures the reliability of the estimated gradients. Theoretical analysis proves that CELU-VFL achieves a similar sub-linear convergence rate as vanilla VFL training but requires much fewer communication rounds. Empirical results on both public and real-world workloads validate that CELU-VFL can be up to six times faster than the existing works.
    Learning idempotent representation for subspace clustering. (arXiv:2207.14431v1 [cs.LG])
    The key to the success of spectral-type subspace clustering algorithms is to seek reconstruction coefficient matrices which can faithfully reveal the subspace structures of data sets. An ideal reconstruction coefficient matrix should have two properties: 1) it is block diagonal with each block indicating a subspace; 2) each block is fully connected. Though various spectral-type subspace clustering algorithms have been proposed, some defects still exist in the reconstruction coefficient matrices constructed by these algorithms. We find that a normalized membership matrix naturally satisfies the above two conditions. Therefore, in this paper, we devise an idempotent representation (IDR) algorithm to pursue reconstruction coefficient matrices approximating normalized membership matrices. IDR designs a new idempotent constraint for reconstruction coefficient matrices. By combining it with doubly stochastic constraints, coefficient matrices which are close to normalized membership matrices can be directly achieved. We present the optimization algorithm for solving the IDR problem and analyze its computational burden as well as its convergence. The comparisons between IDR and related algorithms show the superiority of IDR. Extensive experiments conducted on both synthetic and real-world datasets show that IDR is an effective and efficient subspace clustering algorithm.
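    The properties attributed to a normalized membership matrix can be checked on a small example. The sketch below is not the IDR algorithm. It assumes the usual definition in which entry (i, j) equals 1/|cluster| when points i and j share a cluster and 0 otherwise, and uses made-up toy labels to verify that such a matrix is idempotent and has rows summing to one.

```python
def normalized_membership(labels):
    """M[i][j] = 1/|cluster| if points i and j share a cluster, else 0."""
    sizes = {}
    for c in labels:
        sizes[c] = sizes.get(c, 0) + 1
    return [[1.0 / sizes[a] if a == b else 0.0 for b in labels] for a in labels]

def matmul(a, b):
    # Dense square matrix product on lists of rows.
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

labels = [0, 0, 1, 1, 1, 1]  # toy data: two clusters of sizes 2 and 4
M = normalized_membership(labels)
MM = matmul(M, M)
n = len(labels)

# Idempotent: M @ M equals M (up to floating-point tolerance).
is_idempotent = all(abs(MM[i][j] - M[i][j]) < 1e-12
                    for i in range(n) for j in range(n))
# Row sums are 1; M is symmetric, so column sums are 1 as well.
rows_sum_to_one = all(abs(sum(row) - 1.0) < 1e-12 for row in M)
```

    With the points ordered by cluster, M is also block diagonal with every block fully connected, matching the two properties listed in the abstract.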
    Adaptive Gradient Methods at the Edge of Stability. (arXiv:2207.14484v1 [cs.LG])
    Very little is known about the training dynamics of adaptive gradient methods like Adam in deep learning. In this paper, we shed light on the behavior of these algorithms in the full-batch and sufficiently large batch settings. Specifically, we empirically demonstrate that during full-batch training, the maximum eigenvalue of the preconditioned Hessian typically equilibrates at a certain numerical value: the stability threshold of a gradient descent algorithm. For Adam with step size $\eta$ and $\beta_1 = 0.9$, this stability threshold is $38/\eta$. Similar effects occur during minibatch training, especially as the batch size grows. Yet, even though adaptive methods train at the "Adaptive Edge of Stability" (AEoS), their behavior in this regime differs in a significant way from that of non-adaptive methods at the EoS. Whereas non-adaptive algorithms at the EoS are blocked from entering high-curvature regions of the loss landscape, adaptive gradient methods at the AEoS can keep advancing into high-curvature regions, while adapting the preconditioner to compensate. Our findings can serve as a foundation for the community's future understanding of adaptive gradient methods in deep learning.
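    The value $38/\eta$ can be reproduced with a short calculation. The general expression used below is not stated in the abstract; it is the standard stability threshold for gradient descent with EMA-style momentum on a quadratic, $(2 + 2\beta_1) / ((1 - \beta_1)\eta)$, and is assumed here as the source of the number. The function name is hypothetical.

```python
def momentum_stability_threshold(eta: float, beta1: float = 0.9) -> float:
    """Curvature threshold (2 + 2*beta1) / ((1 - beta1) * eta).

    Assumption: EMA-style momentum on a quadratic objective.
    """
    return (2.0 + 2.0 * beta1) / ((1.0 - beta1) * eta)

eta = 1e-3  # illustrative step size
adam_threshold = momentum_stability_threshold(eta, beta1=0.9)  # about 38 / eta
gd_threshold = momentum_stability_threshold(eta, beta1=0.0)    # 2 / eta, plain gradient descent
```

    Setting $\beta_1 = 0$ recovers the familiar $2/\eta$ threshold of plain gradient descent, which is a quick sanity check on the formula.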
    Reweighted Manifold Learning of Collective Variables from Enhanced Sampling Simulations. (arXiv:2207.14554v1 [physics.chem-ph])
    Enhanced sampling methods are indispensable in computational physics and chemistry, where atomistic simulations cannot exhaustively sample the high-dimensional configuration space of dynamical systems due to the sampling problem. A class of such enhanced sampling methods works by identifying a few slow degrees of freedom, termed collective variables (CVs), and enhancing the sampling along these CVs. Selecting CVs to analyze and drive the sampling is not trivial and often relies on physical and chemical intuition. Despite routinely circumventing this issue using manifold learning to estimate CVs directly from standard simulations, such methods cannot provide mappings to a low-dimensional manifold from enhanced sampling simulations as the geometry and density of the learned manifold are biased. Here, we address this crucial issue and provide a general reweighting framework based on anisotropic diffusion maps for manifold learning that takes into account that the learning data set is sampled from a biased probability distribution. We consider manifold learning methods based on constructing a Markov chain describing transition probabilities between high-dimensional samples. We show that our framework reverts the biasing effect yielding CVs that correctly describe the equilibrium density. This advancement enables the construction of low-dimensional CVs using manifold learning directly from data generated by enhanced sampling simulations. We call our framework reweighted manifold learning. We show that it can be used in many manifold learning techniques on data from both standard and enhanced sampling simulations.
    Decentralized Machine Learning for Intelligent Health Care Systems on the Computing Continuum. (arXiv:2207.14584v1 [cs.DC])
    The introduction of electronic personal health records (EHR) enables nationwide information exchange and curation among different health care systems. However, current EHR systems neither provide transparent means for diagnosis support and medical research nor utilize the omnipresent data produced by personal medical devices. Besides, EHR systems are centrally orchestrated, which could potentially lead to a single point of failure. Therefore, in this article, we explore novel approaches for decentralizing machine learning over distributed ledgers to create intelligent EHR systems that can utilize information from personal medical devices for improved knowledge extraction. To this end, we propose and evaluate a conceptual EHR that enables anonymous predictive analysis across multiple medical institutions. The evaluation results indicate that the decentralized EHR can be deployed over the computing continuum, reducing machine learning time by up to 60% with a consensus latency below 8 seconds.
    Best-of-Both-Worlds Algorithms for Partial Monitoring. (arXiv:2207.14550v1 [cs.LG])
    This paper considers the partial monitoring problem with $k$ actions and $d$ outcomes and provides the first best-of-both-worlds algorithms, whose regrets are bounded poly-logarithmically in the stochastic regime and near-optimally in the adversarial regime. To be more specific, we show that for non-degenerate locally observable games, the regret in the stochastic regime is bounded by $O(k^3 m^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ and in the adversarial regime by $O(k^{2/3} m \sqrt{T \log(T) \log k_{\Pi}})$, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum optimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for non-degenerate globally observable games, the regret in the stochastic regime is bounded by $O(\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ and in the adversarial regime by $O((\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$, where $c_{\mathcal{G}}$ is a game-dependent constant. Our algorithms are based on the follow-the-regularized-leader framework that takes into account the nature of the partial monitoring problem, inspired by algorithms in the field of online learning with feedback graphs.
    Expanding the class of global objective functions for dissimilarity-based hierarchical clustering. (arXiv:2207.14375v1 [cs.LG])
    Recent work on dissimilarity-based hierarchical clustering has led to the introduction of global objective functions for this classical problem. Several standard approaches, such as average linkage, as well as some new heuristics have been shown to provide approximation guarantees. Here we introduce a broad new class of objective functions which satisfy desirable properties studied in prior work. Many common agglomerative and divisive clustering methods are shown to be greedy algorithms for these objectives, which are inspired by related concepts in phylogenetics.
    Image Augmentation for Satellite Images. (arXiv:2207.14580v1 [cs.CV])
    This study proposes the use of generative adversarial networks (GANs) for augmenting the EuroSAT dataset for the Land Use and Land Cover (LULC) classification task. We used DCGAN and WGAN-GP to generate images for each class in the dataset. We then explored the effect on model performance of augmenting the original dataset by about 10% in each case. The choice of GAN architecture seems to have no apparent effect on model performance. However, a combination of geometric augmentation and GAN-generated images improved baseline results. Our study shows that GAN-based augmentation can improve the generalizability of deep classification models on satellite images.
    Supplementing Recurrent Neural Network Wave Functions with Symmetry and Annealing to Improve Accuracy. (arXiv:2207.14314v1 [cond-mat.dis-nn])
    Recurrent neural networks (RNNs) are a class of neural networks that have emerged from the paradigm of artificial intelligence and have enabled many interesting advances in the field of natural language processing. Interestingly, these architectures were shown to be powerful ansatze to approximate the ground state of quantum systems. Here, we build on the results of [Phys. Rev. Research 2, 023358 (2020)] and construct a more powerful RNN wave function ansatz in two dimensions. We use symmetry and annealing to obtain accurate estimates of ground state energies of the two-dimensional (2D) Heisenberg model, on the square lattice and on the triangular lattice. We show that our method is superior to Density Matrix Renormalisation Group (DMRG) for system sizes larger than or equal to $14 \times 14$ on the triangular lattice.
    Sequential Models in the Synthetic Data Vault. (arXiv:2207.14406v1 [cs.LG])
    The goal of this paper is to describe a system for generating synthetic sequential data within the Synthetic Data Vault. To achieve this, we present the Sequential model currently in SDV, an end-to-end framework that builds a generative model for multi-sequence, real-world data. This includes a novel neural network-based machine learning model, the conditional probabilistic auto-regressive (CPAR) model. The overall system and the model are available in the open source Synthetic Data Vault (SDV) library (https://github.com/sdv-dev/SDV), along with a variety of other models for different synthetic data needs. After building the Sequential SDV, we used it to generate synthetic data and compared its quality against an existing, non-sequential generative adversarial network based model called CTGAN. To compare the sequential synthetic data against its real counterpart, we invented a new metric called Multi-Sequence Aggregate Similarity (MSAS). We used it to conclude that our Sequential SDV model learns higher level patterns than non-sequential models without any trade-offs in synthetic data quality.
    Contrastive Pre-training of Spatial-Temporal Trajectory Embeddings. (arXiv:2207.14539v1 [cs.CV])
    Pre-training trajectory embeddings is a fundamental and critical procedure in spatial-temporal trajectory mining, and is beneficial for a wide range of downstream tasks. The key for generating effective trajectory embeddings is to extract high-level travel semantics from trajectories, including movement patterns and travel purposes, with consideration of the trajectories' long-term spatial-temporal correlations. Despite the existing efforts, there are still major challenges in pre-training trajectory embeddings. First, commonly used generative pretext tasks are not suitable for extracting high-level semantics from trajectories. Second, existing data augmentation methods fit badly on trajectory datasets. Third, current encoder designs fail to fully incorporate long-term spatial-temporal correlations hidden in trajectories. To tackle these challenges, we propose a novel Contrastive Spatial-Temporal Trajectory Embedding (CSTTE) model for learning comprehensive trajectory embeddings. CSTTE adopts the contrastive learning framework so that its pretext task is robust to noise. A specially designed data augmentation method for trajectories is coupled with the contrastive pretext task to preserve the high-level travel semantics. We also build an efficient spatial-temporal trajectory encoder to efficiently and comprehensively model the long-term spatial-temporal correlations in trajectories. Extensive experiments on two downstream tasks and three real-world datasets prove the superiority of our model compared with the existing trajectory embedding methods.
    Model selection with Gini indices under auto-calibration. (arXiv:2207.14372v1 [cs.LG])
    In general, the Gini index does not give a consistent scoring rule. Therefore, maximizing the Gini index may lead to a wrong decision. The main issue is that the Gini index is a rank-based score that is not calibration-sensitive. We show that the Gini index allows for consistent scoring if we restrict it to the class of auto-calibrated regression models.
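    The statement that the Gini index is rank-based and not calibration-sensitive can be seen in a few lines. The sketch below uses one common, unnormalised rank-based formulation (an assumption, and not necessarily the exact definition used in the paper) together with made-up toy data: any strictly increasing transform of the predictions leaves the score unchanged, even though the transformed predictions are badly miscalibrated.

```python
import math

def gini_score(actuals, predictions):
    """Rank-based Gini score: predictions are used only to order the actuals."""
    n = len(actuals)
    total = sum(actuals)
    # Order observations by descending prediction (stable sort keeps ties in input order).
    order = sorted(range(n), key=lambda i: -predictions[i])
    cumulative = 0.0
    area = 0.0
    for i in order:
        cumulative += actuals[i]
        area += cumulative / total
    # Area between the cumulative-share curve and the diagonal, scaled by n.
    return (area - (n + 1) / 2.0) / n

# Toy data (assumed values for illustration only).
actuals = [0.0, 1.0, 0.0, 3.0, 2.0]
preds = [0.2, 0.9, 0.1, 2.5, 1.7]
miscalibrated = [10.0 * math.exp(p) for p in preds]  # strictly increasing transform

score_original = gini_score(actuals, preds)
score_miscalibrated = gini_score(actuals, miscalibrated)
scores_match = score_original == score_miscalibrated
```

    Since the two prediction vectors induce the same ordering, the score cannot distinguish them, which is the sense in which maximizing the Gini index alone may reward a miscalibrated model.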
    KG-NSF: Knowledge Graph Completion with a Negative-Sample-Free Approach. (arXiv:2207.14617v1 [cs.LG])
    Knowledge Graph (KG) completion is an important task that greatly benefits knowledge discovery in many fields (e.g. biomedical research). In recent years, learning KG embeddings to perform this task has received considerable attention. Despite the success of KG embedding methods, they predominantly use negative sampling, resulting in increased computational complexity as well as biased predictions due to the closed world assumption. To overcome these limitations, we propose KG-NSF, a negative sampling-free framework for learning KG embeddings based on the cross-correlation matrices of embedding vectors. It is shown that the proposed method achieves comparable link prediction performance to negative sampling-based methods while converging much faster.
    Ensemble forecasts in reproducing kernel Hilbert space family: dynamical systems in Wonderland. (arXiv:2207.14653v1 [math-ph])
    A methodological framework for ensemble-based estimation and simulation of high dimensional dynamical systems such as oceanic or atmospheric flows is proposed. To that end, the dynamical system is embedded in a family of reproducing kernel Hilbert spaces with kernel functions driven by the dynamics. This family is nicknamed Wonderland for its appealing properties. In Wonderland, the Koopman and Perron-Frobenius operators are unitary and uniformly continuous. This property warrants that they can be expressed as exponential series of diagonalizable bounded infinitesimal generators. Lyapunov exponents and exact ensemble-based expressions of the tangent linear dynamics are directly available as well. Wonderland enables us to devise strikingly simple ensemble data assimilation methods for trajectory reconstructions in terms of constant-in-time linear combinations of trajectory samples. Such an embarrassingly simple strategy is made possible through a fully justified superposition principle ensuing from several fundamental theorems.
    Interactive Recommendations for Optimal Allocations in Markets with Constraints. (arXiv:2207.04143v2 [cs.LG] UPDATED)
    Recommendation systems when employed in markets play a dual role: they assist users in selecting their most desired items from a large pool and they help in allocating a limited number of items to the users who desire them the most. Despite the prevalence of capacity constraints on allocations in many real-world recommendation settings, a principled way of incorporating them in the design of these systems has been lacking. Motivated by this, we propose an interactive framework where the system provider can enhance the quality of recommendations to the users by opportunistically exploring allocations that maximize user rewards and respect the capacity constraints using appropriate pricing mechanisms. We model the problem as an instance of a low-rank combinatorial multi-armed bandit problem with selection constraints on the arms. We employ an integrated approach using techniques from collaborative filtering, combinatorial bandits, and optimal resource allocation to provide an algorithm that provably achieves sub-linear regret, namely $\tilde{\mathcal{O}} ( \sqrt{N M (N+M) RT} )$ in $T$ rounds for a problem with $N$ users, $M$ items and rank $R$ mean reward matrix. Empirical studies on synthetic and real-world data also demonstrate the effectiveness and performance of our approach.
    Conditioning Normalizing Flows for Rare Event Sampling. (arXiv:2207.14530v1 [physics.comp-ph])
    Understanding the dynamics of complex molecular processes is often linked to the study of infrequent transitions between long-lived stable states. The standard approach to the sampling of such rare events is to generate an ensemble of transition paths using a random walk in trajectory space. This, however, comes with the drawback of strong correlation between subsequently visited paths and with an intrinsic difficulty in parallelizing the sampling process. We propose a transition path sampling scheme based on neural-network generated configurations. These are obtained employing normalizing flows, a neural network class able to generate decorrelated samples from a given distribution. With this approach, not only are correlations between visited paths removed, but the sampling process becomes easily parallelizable. Moreover, by conditioning the normalizing flow, the sampling of configurations can be steered towards the regions of interest. We show that this allows for resolving both the thermodynamics and kinetics of the transition region.
    StudioGAN: A Taxonomy and Benchmark of GANs for Image Synthesis. (arXiv:2206.09479v2 [cs.CV] UPDATED)
    Generative Adversarial Network (GAN) is one of the state-of-the-art generative models for realistic image synthesis. While training and evaluating GANs is becoming increasingly important, the current GAN research ecosystem does not provide reliable benchmarks in which evaluation is conducted consistently and fairly. Furthermore, because there are few validated GAN implementations, researchers devote considerable time to reproducing baselines. We study the taxonomy of GAN approaches and present a new open-source library named StudioGAN. StudioGAN supports 7 GAN architectures, 9 conditioning methods, 4 adversarial losses, 13 regularization modules, 3 differentiable augmentations, 7 evaluation metrics, and 5 evaluation backbones. With our training and evaluation protocol, we present a large-scale benchmark using various datasets (CIFAR10, ImageNet, AFHQv2, FFHQ, and Baby/Papa/Granpa-ImageNet) and 3 different evaluation backbones (InceptionV3, SwAV, and Swin Transformer). Unlike other benchmarks used in the GAN community, we train representative GANs, including BigGAN, StyleGAN2, and StyleGAN3, in a unified training pipeline and quantify generation performance with 7 evaluation metrics. The benchmark also evaluates other cutting-edge generative models (e.g., StyleGAN-XL, ADM, MaskGIT, and RQ-Transformer). StudioGAN provides GAN implementations, training, and evaluation scripts with the pre-trained weights. StudioGAN is available at https://github.com/POSTECH-CVLab/PyTorch-StudioGAN.
    Curriculum Learning for Data-Efficient Vision-Language Alignment. (arXiv:2207.14525v1 [cs.CV])
    Aligning image and text encoders from scratch using contrastive learning requires large amounts of paired image-text data. We alleviate this need by aligning individually pre-trained language and vision representation models using a much smaller amount of paired data, augmented with a curriculum learning algorithm to learn fine-grained vision-language alignments. TOnICS (Training with Ontology-Informed Contrastive Sampling) initially samples minibatches whose image-text pairs contain a wide variety of objects to learn object-level alignment, and progressively samples minibatches where all image-text pairs contain the same object to learn finer-grained contextual alignment. Aligning pre-trained BERT and VinVL models to each other using TOnICS outperforms CLIP on downstream zero-shot image retrieval while using less than 1% as much training data.
    A Hybrid Complex-valued Neural Network Framework with Applications to Electroencephalogram (EEG). (arXiv:2207.14799v1 [cs.LG])
    In this article, we present a new EEG signal classification framework that integrates complex-valued and real-valued Convolutional Neural Networks (CNNs) with the discrete Fourier transform (DFT). The proposed neural network architecture consists of one complex-valued convolutional layer, two real-valued convolutional layers, and three fully connected layers. Our method can efficiently utilize the phase information contained in the DFT. We validate our approach using two simulated EEG signals and a benchmark data set and compare it with two widely used frameworks. Our method drastically reduces the number of parameters used and improves accuracy when compared with the existing methods in classifying benchmark data sets, and significantly improves performance in classifying simulated EEG signals.
    Quantum Deep Reinforcement Learning for Robot Navigation Tasks. (arXiv:2202.12180v2 [cs.RO] UPDATED)
    In this work, we utilize Quantum Deep Reinforcement Learning as a method to learn navigation tasks for a simple, wheeled robot in three simulated environments of increasing complexity. We show similar performance of a parameterized quantum circuit trained with well-established deep reinforcement learning techniques in a hybrid quantum-classical setup compared to a classical baseline. To our knowledge, this is the first demonstration of quantum machine learning (QML) for robotic behaviors. Thus, we establish robotics as a viable field of study for QML algorithms and henceforth quantum computing and quantum machine learning as potential techniques for future advancements in autonomous robotics. Beyond that, we discuss current limitations of the presented approach as well as future research directions in the field of quantum machine learning for autonomous robots.
    Distributed Stochastic Bandit Learning with Context Distributions. (arXiv:2207.14391v1 [cs.LG])
    We study the problem of distributed stochastic multi-armed contextual bandits with unknown contexts, in which M agents work collaboratively to choose optimal actions under the coordination of a central server in order to minimize the total regret. In our model, an adversary chooses a distribution on the set of possible contexts, and the agents observe only the context distribution, while the exact context remains unknown to them. Such a situation arises, for instance, when the context itself is a noisy measurement or based on a prediction mechanism as in weather forecasting or stock market prediction. Our goal is to develop a distributed algorithm that selects a sequence of optimal actions to maximize the cumulative reward. By performing a feature vector transformation and by leveraging the UCB algorithm, we propose a UCB algorithm for stochastic bandits with context distribution and prove that our algorithm achieves regret and communication bounds of $O(d\sqrt{MT}\log^2 T)$ and $O(M^{1.5}d^3)$, respectively, for linearly parametrized reward functions. We also consider a case where the agents observe the actual context after choosing the action. For this setting, we present a modified algorithm that utilizes the additional information to achieve a tighter regret bound. Finally, we validate the performance of our algorithms and compare them with other baseline approaches using extensive simulations on synthetic data and on the real-world MovieLens dataset.
    Continual Learning for Monolingual End-to-End Automatic Speech Recognition. (arXiv:2112.09427v3 [eess.AS] UPDATED)
    Adapting Automatic Speech Recognition (ASR) models to new domains results in a deterioration of performance on the original domain(s), a phenomenon called Catastrophic Forgetting (CF). Even monolingual ASR models cannot be extended to new accents, dialects, topics, etc. without suffering from CF, making them unable to be continually enhanced without storing all past data. Fortunately, Continual Learning (CL) methods, which aim to enable continual adaptation while overcoming CF, can be used. In this paper, we implement an extensive number of CL methods for End-to-End ASR and test and compare their ability to extend a monolingual Hybrid CTC-Transformer model across four new tasks. We find that the best performing CL method closes the gap between the fine-tuned model (lower bound) and the model trained jointly on all tasks (upper bound) by more than 40%, while requiring access to only 0.6% of the original data.
    Bridging the Gap between Deep Learning and Hypothesis-Driven Analysis via Permutation Testing. (arXiv:2207.14349v1 [cs.LG])
    A fundamental approach in neuroscience research is to test hypotheses based on neuropsychological and behavioral measures, i.e., whether certain factors (e.g., related to life events) are associated with an outcome (e.g., depression). In recent years, deep learning has become a potential alternative approach for conducting such analyses by predicting an outcome from a collection of factors and identifying the most "informative" ones driving the prediction. However, this approach has had limited impact as its findings are not linked to statistical significance of factors supporting hypotheses. In this article, we propose a flexible and scalable approach based on the concept of permutation testing that integrates hypothesis testing into the data-driven deep learning analysis. We apply our approach to the yearly self-reported assessments of 621 adolescent participants of the National Consortium of Alcohol and Neurodevelopment in Adolescence (NCANDA) to predict negative valence, a symptom of major depressive disorder according to the NIMH Research Domain Criteria (RDoC). Our method successfully identifies categories of risk factors that further explain the symptom.
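    The permutation-testing idea mentioned in this abstract can be illustrated with a generic two-sample test. The sketch below is not the paper's method (which couples the test with a deep learning model); it only shows the basic mechanism, assuming a difference-in-means statistic. The function name and parameters are illustrative assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference in means.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    # Count how often a random relabeling is at least as extreme as the data.
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            at_least_as_extreme += 1

    # The +1 terms keep the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    separated = permutation_p_value(list(range(10, 20)), list(range(0, 10)))
    identical = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4])
    print(separated, identical)
```

    A small p-value means the observed group difference would rarely arise if the group labels were exchangeable.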
    POLAR: A Polynomial Arithmetic Framework for Verifying Neural-Network Controlled Systems. (arXiv:2106.13867v4 [eess.SY] UPDATED)
    We propose POLAR, a polynomial arithmetic framework that leverages polynomial overapproximations with interval remainders for bounded-time reachability analysis of neural network-controlled systems (NNCSs). Compared with existing arithmetic approaches that use standard Taylor models, our framework uses a novel approach to iteratively overapproximate the neuron output ranges layer-by-layer with a combination of Bernstein polynomial interpolation for continuous activation functions and Taylor model arithmetic for the other operations. This approach can overcome the main drawback in the standard Taylor model arithmetic, i.e., its inability to handle functions that cannot be well approximated by Taylor polynomials, and significantly improve the accuracy and efficiency of reachable states computation for NNCSs. To further tighten the overapproximation, our method keeps the Taylor model remainders symbolic under the linear mappings when estimating the output range of a neural network. We show that POLAR can be seamlessly integrated with existing Taylor model flowpipe construction techniques, and demonstrate that POLAR significantly outperforms the current state-of-the-art techniques on a suite of benchmarks.
    Significant changes in EEG neural oscillations during different phases of three-dimensional multiple object tracking task (3D-MOT) imply different roles for attention and working memory. (arXiv:2207.14470v1 [q-bio.NC])
    Our ability to track multiple objects in a dynamic environment enables us to perform everyday tasks such as driving, playing team sports, and walking in a crowded mall. Despite more than three decades of literature on multiple object tracking (MOT) tasks, the underlying and intertwined neural mechanisms remain poorly understood. Here we looked at the electroencephalography (EEG) neural correlates and their changes across the three phases of a 3D-MOT task, namely identification, tracking and recall. We recorded the EEG activity of 24 participants while they were performing a 3D-MOT task with either 1, 2 or 3 targets where some trials were lateralized and some were not. We observed what seems to be a handoff between focused attention and working memory processes when going from tracking to recall. Our findings revealed a strong inhibition in delta and theta frequencies from the frontal region during tracking, followed by a strong (re)activation of these same frequencies during recall. Our results also showed contralateral delay activity (CDA) for the lateralized trials, in both the identification and recall phases but not during tracking.
    Physics-Informed Neural Networks for Shell Structures. (arXiv:2207.14291v1 [cs.CE])
    The numerical modeling of thin shell structures is a challenge, which has been met by a variety of finite element (FE) and other formulations -- many of which give rise to new challenges, from complex implementations to artificial locking. As a potential alternative, we use machine learning and present a Physics-Informed Neural Network (PINN) to predict the small-strain response of arbitrarily curved shells. To this end, the shell midsurface is described by a chart, from which the mechanical fields are derived in a curvilinear coordinate frame by adopting Naghdi's shell theory. Unlike in typical PINN applications, the corresponding strong or weak form must therefore be solved in a non-Euclidean domain. We investigate the performance of the proposed PINN in three distinct scenarios, including the well-known Scordelis-Lo roof setting widely used to test FE shell elements against locking. Results show that the PINN can accurately identify the solution field in all three benchmarks if the equations are presented in their weak form, while it may fail to do so when using the strong form. In the thin-thickness limit, where classical methods are susceptible to locking, training time notably increases as the differences in scaling of the membrane, shear, and bending energies lead to adverse numerical stiffness in the gradient flow dynamics. Nevertheless, the PINN can accurately match the ground truth and performs well in the Scordelis-Lo roof benchmark, highlighting its potential for a drastically simplified alternative to designing locking-free shell FE formulations.
    Federated Learning for Non-IID Data via Client Variance Reduction and Adaptive Server Update. (arXiv:2207.08391v2 [cs.LG] UPDATED)
    Federated learning (FL) is an emerging technique used to collaboratively train a global machine learning model while keeping the data localized on the user devices. The main obstacle to FL's practical implementation is the Non-Independent and Identical (Non-IID) data distribution across users, which slows convergence and degrades performance. To tackle this fundamental issue, we propose a method (ComFed) that enhances the whole training process on both the client and server sides. The key idea of ComFed is to simultaneously utilize client-variance reduction techniques to facilitate server aggregation and global adaptive update techniques to accelerate learning. Our experiments on the Cifar-10 classification task show that ComFed can improve state-of-the-art algorithms dedicated to Non-IID data.
  • Open

    Model selection with Gini indices under auto-calibration. (arXiv:2207.14372v1 [cs.LG])
    In general, the Gini index does not give a consistent scoring rule. Therefore, maximizing the Gini index may lead to a wrong decision. The main issue is that the Gini index is a rank-based score that is not calibration-sensitive. We show that the Gini index allows for consistent scoring if we restrict it to the class of auto-calibrated regression models.
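    The claim that the Gini index is rank-based and not calibration-sensitive can be checked numerically. The sketch below uses one common discrete Lorenz-curve definition of the Gini index, which may differ in detail from the paper's; the function name and the toy data are assumptions for illustration.

```python
def gini_index(y_true, y_pred):
    """Rank-based Gini score of predictions y_pred for outcomes y_true.

    Uses only the ordering induced by y_pred (illustrative definition,
    assumed here; the paper's exact definition may differ).
    """
    n = len(y_true)
    # Order observations by increasing prediction.
    order = sorted(range(n), key=lambda i: y_pred[i])
    total = sum(y_true)
    cumulative = 0.0
    area = 0.0
    for i in order:
        cumulative += y_true[i]
        area += cumulative / total
    # Discrete Lorenz-curve formula: 0 for an uninformative ordering,
    # larger values when low predictions go with low outcomes.
    return (n + 1 - 2 * area) / n


if __name__ == "__main__":
    outcomes = [1, 0, 3, 2, 5]
    predictions = [1.2, 0.3, 2.5, 2.0, 4.0]
    # A strictly increasing transform badly miscalibrates the model
    # (its mean is far from the mean outcome) but keeps the ranking.
    miscalibrated = [10 * p + 5 for p in predictions]
    print(gini_index(outcomes, predictions))
    print(gini_index(outcomes, miscalibrated))
```

    Both calls return the same score, so the Gini index alone cannot separate the calibrated model from the miscalibrated one. This is the gap that restricting to auto-calibrated models is meant to close.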
    Spliced Binned-Pareto Distribution for Robust Modeling of Heavy-tailed Time Series. (arXiv:2106.10952v2 [stat.ML] UPDATED)
    This work proposes a novel method to robustly and accurately model time series with heavy-tailed noise in non-stationary scenarios. In many practical applications, time series have heavy-tailed noise that significantly impacts the performance of classical forecasting models; in particular, accurately modeling a distribution over extreme events is crucial to performing accurate time series anomaly detection. We propose a Spliced Binned-Pareto distribution which is both robust to extreme observations and allows accurate modeling of the full distribution. Our method allows the capture of time dependencies in the higher-order moments of the distribution such as the tail heaviness. We compare the robustness and the accuracy of the tail estimation of our method to other state-of-the-art methods on Twitter mentions count time series.
    Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning. (arXiv:2207.14800v1 [cs.LG])
    In view of its power in extracting feature representation, contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL), leading to efficient policy learning in various applications. Despite its tremendous empirical successes, the understanding of contrastive learning for RL remains elusive. To narrow such a gap, we study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions. For both models, we propose to extract the correct feature representations of the low-rank model by minimizing a contrastive loss. Moreover, under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs. We further theoretically prove that our algorithm recovers the true representations and simultaneously achieves sample efficiency in learning the optimal policy and Nash equilibrium in MDPs and MGs. We also provide empirical studies to demonstrate the efficacy of the UCB-based contrastive learning method for RL. To the best of our knowledge, we provide the first provably efficient online RL algorithm that incorporates contrastive learning for representation learning. Our codes are available at https://github.com/Baichenjia/Contrastive-UCB.
    Best-of-Both-Worlds Algorithms for Partial Monitoring. (arXiv:2207.14550v1 [cs.LG])
    This paper considers the partial monitoring problem with $k$-actions and $d$-outcomes and provides the first best-of-both-worlds algorithms, whose regrets are bounded poly-logarithmically in the stochastic regime and near-optimally in the adversarial regime. To be more specific, we show that for non-degenerate locally observable games, the regret in the stochastic regime is bounded by $O(k^3 m^2 \log(T) \log(k_{\Pi} T) / \Delta_{\min})$ and in the adversarial regime by $O(k^{2/3} m \sqrt{T \log(T) \log k_{\Pi}})$, where $T$ is the number of rounds, $m$ is the maximum number of distinct observations per action, $\Delta_{\min}$ is the minimum optimality gap, and $k_{\Pi}$ is the number of Pareto optimal actions. Moreover, we show that for non-degenerate globally observable games, the regret in the stochastic regime is bounded by $O(\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T) / \Delta_{\min}^2)$ and in the adversarial regime by $O((\max\{c_{\mathcal{G}}^2 / k,\, c_{\mathcal{G}}\} \log(T) \log(k_{\Pi} T))^{1/3} T^{2/3})$, where $c_{\mathcal{G}}$ is a game-dependent constant. Our algorithms are based on the follow-the-regularized-leader framework that takes into account the nature of the partial monitoring problem, inspired by algorithms in the field of online learning with feedback graphs.
    Bayesian nonparametric mixture inconsistency for the number of components: How worried should we be in practice?. (arXiv:2207.14717v1 [stat.ME])
    We consider the Bayesian mixture of finite mixtures (MFMs) and Dirichlet process mixture (DPM) models for clustering. Recent asymptotic theory has established that DPMs overestimate the number of clusters for large samples and that estimators from both classes of models are inconsistent for the number of clusters under misspecification, but the implications for finite sample analyses are unclear. The final reported estimate after fitting these models is often a single representative clustering obtained using an MCMC summarisation technique, but it is unknown how well such a summary estimates the number of clusters. Here we investigate these practical considerations through simulations and an application to gene expression data, and find that (i) DPMs overestimate the number of clusters even in finite samples, but only to a limited degree that may be correctable using appropriate summaries, and (ii) misspecification can lead to considerable overestimation of the number of clusters in both DPMs and MFMs, but results are nevertheless often still interpretable. We provide recommendations on MCMC summarisation and suggest that although the more appealing asymptotic properties of MFMs provide strong motivation to prefer them, results obtained using MFMs and DPMs are often very similar in practice.
    SHAP for additively modeled features in a boosted trees model. (arXiv:2207.14490v1 [stat.ML])
    An important technique to explore a black-box machine learning (ML) model is called SHAP (SHapley Additive exPlanation). SHAP values decompose predictions into contributions of the features in a fair way. We will show that for a boosted trees model with some or all features being additively modeled, the SHAP dependence plot of such a feature corresponds to its partial dependence plot up to a vertical shift. We illustrate the result with XGBoost.
    Stochastic Parallelizable Eigengap Dilation for Large Graph Clustering. (arXiv:2207.14589v1 [stat.ML])
    Large graphs commonly appear in social networks, knowledge graphs, recommender systems, life sciences, and decision making problems. Summarizing large graphs by their high level properties is helpful in solving problems in these settings. In spectral clustering, we aim to identify clusters of nodes where most edges fall within clusters and only few edges fall between clusters. This task is important for many downstream applications and exploratory analysis. A core step of spectral clustering is performing an eigendecomposition of the corresponding graph Laplacian matrix (or equivalently, a singular value decomposition, SVD, of the incidence matrix). The convergence of iterative singular value decomposition approaches depends on the eigengaps of the spectrum of the given matrix, i.e., the difference between consecutive eigenvalues. For a graph Laplacian corresponding to a well-clustered graph, the eigenvalues will be non-negative but very small (much less than $1$) slowing convergence. This paper introduces a parallelizable approach to dilating the spectrum in order to accelerate SVD solvers and in turn, spectral clustering. This is accomplished via polynomial approximations to matrix operations that favorably transform the spectrum of a matrix without changing its eigenvectors. Experiments demonstrate that this approach significantly accelerates convergence, and we explain how this transformation can be parallelized and stochastically approximated to scale with available compute.  ( 3 min )
    Recursive Importance Sketching for Rank Constrained Least Squares: Algorithms and High-order Convergence. (arXiv:2011.08360v3 [math.OC] UPDATED)
    In this paper, we propose the Recursive Importance Sketching algorithm for Rank constrained least squares Optimization (RISRO). The key step of RISRO is recursive importance sketching, a new sketching framework based on deterministically designed recursive projections, which significantly differs from the randomized sketching in the literature (Mahoney, 2011; Woodruff, 2014). Several existing algorithms in the literature can be reinterpreted under this new sketching framework and RISRO offers clear advantages over them. RISRO is easy to implement and computationally efficient, where the core procedure in each iteration is to solve a dimension-reduced least squares problem. We establish the local quadratic-linear and quadratic rate of convergence for RISRO under some mild conditions. We also discover a deep connection of RISRO to the Riemannian Gauss-Newton algorithm on fixed rank matrices. The effectiveness of RISRO is demonstrated in two applications in machine learning and statistics: low-rank matrix trace regression and phase retrieval. Simulation studies demonstrate the superior numerical performance of RISRO.  ( 2 min )
    A deep learning approach to data-driven model-free pricing and to martingale optimal transport. (arXiv:2103.11435v2 [q-fin.CP] UPDATED)
    We introduce a novel and highly tractable supervised learning approach based on neural networks that can be applied for the computation of model-free price bounds of potentially high-dimensional financial derivatives and for the determination of optimal hedging strategies attaining these bounds. In particular, our methodology makes it possible to train a single neural network offline and then to use it online for the fast determination of model-free price bounds of a whole class of financial derivatives with current market data. We show the applicability of this approach and highlight its accuracy in several examples involving real market data. Further, we show how a neural network can be trained to solve martingale optimal transport problems involving fixed marginal distributions instead of financial market data.  ( 2 min )
    Can We Mitigate Backdoor Attack Using Adversarial Detection Methods?. (arXiv:2006.14871v2 [cs.LG] UPDATED)
    Deep Neural Networks are well known to be vulnerable to adversarial attacks and backdoor attacks, where minor modifications on the input are able to mislead the models to give wrong results. Although defenses against adversarial attacks have been widely studied, investigation on mitigating backdoor attacks is still at an early stage. It is unknown whether there are any connections and common characteristics between the defenses against these two attacks. We conduct comprehensive studies on the connections between adversarial examples and backdoor examples of Deep Neural Networks to answer the question: can we detect backdoors using adversarial detection methods? Our insights are based on the observation that both adversarial examples and backdoor examples have anomalies during the inference process, highly distinguishable from benign samples. As a result, we revise four existing adversarial defense methods for detecting backdoor examples. Extensive evaluations indicate that these approaches provide reliable protection against backdoor attacks, with a higher accuracy than detecting adversarial examples. These solutions also reveal the relations of adversarial examples, backdoor examples and normal samples in model sensitivity, activation space and feature space. This can enhance our understanding of the inherent features of these two attacks and the defense opportunities.  ( 3 min )
    Cluster-Specific Predictions with Multi-Task Gaussian Processes. (arXiv:2011.07866v3 [cs.LG] UPDATED)
    A model involving Gaussian processes (GPs) is introduced to simultaneously handle multi-task learning, clustering, and prediction for multiple functional data. This procedure acts as a model-based clustering method for functional data as well as a learning step for subsequent predictions for new tasks. The model is instantiated as a mixture of multi-task GPs with common mean processes. A variational EM algorithm is derived for dealing with the optimisation of the hyper-parameters along with the hyper-posteriors' estimation of latent variables and processes. We establish explicit formulas for integrating the mean processes and the latent clustering variables within a predictive distribution, accounting for uncertainty on both aspects. This distribution is defined as a mixture of cluster-specific GP predictions, which enhances performance when dealing with group-structured data. The model handles irregular grids of observations and offers different hypotheses on the covariance structure for sharing additional information across tasks. Performance on both clustering and prediction tasks is assessed through various simulated scenarios and real datasets. The overall algorithm, called MagmaClust, is publicly available as an R package.  ( 3 min )
    Treatment Effect Estimation with Unobserved and Heterogeneous Confounding Variables. (arXiv:2207.14439v1 [stat.ME])
    The estimation of the treatment effect is often biased in the presence of unobserved confounding variables which are commonly referred to as hidden variables. Although a few methods have been recently proposed to handle the effect of hidden variables, these methods often overlook the possibility of any interaction between the observed treatment variable and the unobserved covariates. In this work, we address this shortcoming by studying a multivariate response regression problem with both unobserved and heterogeneous confounding variables of the form $Y=A^T X+ B^T Z+ \sum_{j=1}^{p} C^T_j X_j Z + E$, where $Y \in \mathbb{R}^m$ are $m$-dimensional response variables, $X \in \mathbb{R}^p$ are observed covariates (including the treatment variable), $Z \in \mathbb{R}^K$ are $K$-dimensional unobserved confounders, and $E \in \mathbb{R}^m$ is the random noise. Allowing for the interaction between $X_j$ and $Z$ induces the heterogeneous confounding effect. Our goal is to estimate the unknown matrix $A$, the direct effect of the observed covariates or the treatment on the responses. To this end, we propose a new debiased estimation approach via SVD to remove the effect of unobserved confounding variables. The rate of convergence of the estimator is established under both the homoscedastic and heteroscedastic noises. We also present several simulation experiments and a real-world data application to substantiate our findings.  ( 2 min )
    Tangential Wasserstein Projections. (arXiv:2207.14727v1 [stat.ML])
    We develop a notion of projections between sets of probability measures using the geometric properties of the 2-Wasserstein space. It is designed for general multivariate probability measures, is computationally efficient to implement, and provides a unique solution in regular settings. The idea is to work on regular tangent cones of the Wasserstein space using generalized geodesics. Its structure and computational properties make the method applicable in a variety of settings, from causal inference to the analysis of object data. An application to estimating causal effects yields a generalization of the notion of synthetic controls to multivariate data with individual-level heterogeneity, as well as a way to estimate optimal weights jointly over all time periods.  ( 2 min )
    Conformal Prediction: a Unified Review of Theory and New Challenges. (arXiv:2005.07972v2 [cs.LG] UPDATED)
    In this work we provide a review of basic ideas and novel developments about Conformal Prediction -- an innovative distribution-free, non-parametric forecasting method, based on minimal assumptions -- that is able to yield, in a very straightforward way, prediction sets that are valid in a statistical sense also in the finite sample case. The in-depth discussion provided in the paper covers the theoretical underpinnings of Conformal Prediction, and then proceeds to list the more advanced developments and adaptations of the original idea.  ( 3 min )
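    One widely used variant, split conformal prediction for regression, shows how finite-sample valid prediction sets can be obtained. The sketch below is a minimal illustration, assuming absolute residuals as the nonconformity score and exchangeable calibration data; the function names are hypothetical and not taken from the review.

```python
import math


def conformal_quantile(calibration_scores, alpha):
    """Conformal quantile of held-out nonconformity scores.

    With n calibration scores, the k-th smallest score with
    k = ceil((n + 1) * (1 - alpha)) gives coverage of at least 1 - alpha
    under exchangeability. If k exceeds n, only the trivial (infinite)
    interval is valid.
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(calibration_scores)[k - 1]


def prediction_interval(point_prediction, quantile):
    """Symmetric interval around a point prediction."""
    return (point_prediction - quantile, point_prediction + quantile)


if __name__ == "__main__":
    # Absolute residuals |y - f(x)| of some fitted model on calibration data
    # (made-up numbers for illustration).
    residuals = [0.4, 1.1, 0.2, 0.9, 0.5, 1.6, 0.3, 0.7, 1.0]
    q = conformal_quantile(residuals, alpha=0.2)
    print(prediction_interval(3.0, q))
```

    The fitted model is treated as a black box: only its residuals on held-out calibration data enter the computation, which is why the guarantee is distribution-free.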
    Factorizable Joint Shift in Multinomial Classification. (arXiv:2207.14514v1 [stat.ML])
    Factorizable joint shift was recently proposed as a type of dataset shift for which the characteristics can be estimated from observed data. For the multinomial (multi-class) classification setting, we derive a representation of factorizable joint shift in terms of the source (training) distribution, the target (test) prior class probabilities and the target marginal distribution of the features. On the basis of this result, we propose alternatives to joint importance aligning, at the same time pointing out the limitations encountered when making an assumption of factorizable joint shift. Other results of the paper include correction formulae for the posterior class probabilities both under general dataset shift and factorizable joint shift. In addition, we investigate the consequences of assuming factorizable joint shift for the bias caused by sample selection.  ( 2 min )
    Archaeology of random recursive dags and Cooper-Frieze random networks. (arXiv:2207.14601v1 [math.PR])
    We study the problem of finding the root vertex in large growing networks. We prove that it is possible to construct confidence sets of size independent of the number of vertices in the network that contain the root vertex with high probability in various models of random networks. The models include uniform random recursive dags and uniform Cooper-Frieze random graphs.  ( 2 min )

  • Open

    Angels crying
    K den submitted by /u/nickgraybeal [link] [comments]  ( 85 min )
    Anyone know what AI was used to create this tiktok?
    I keep asking the artist what medium he uses, but he just likes my comments which I think is him gatekeeping the platform. Any help with this one? submitted by /u/Redflameman [link] [comments]  ( 92 min )
    Generated with new version of ruDALL-E
    submitted by /u/knight_hildebrandt [link] [comments]  ( 85 min )
    Holes in Deep Space
    submitted by /u/widgia [link] [comments]  ( 85 min )
    An AI that takes a software and reverse-engineers it?
    Is there an AI or one in development where it takes a software, check how it works, and reverse engineers it or writes a code that creates a product exactly like a copy, or something close to the software? This might help game designers who don’t know how to code an easier time creating their game, based on something similar. Thank you for your interest. submitted by /u/Sparkykun [link] [comments]  ( 94 min )
    Apple's new GAUDI AI turns text prompts into 3D scenes
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 93 min )
    Do we need Quantum support in Artificial Intelligence (AI)?
    submitted by /u/Philo167 [link] [comments]  ( 93 min )
    Anyone familiar with that app “my talking pet” ? And what they use to power the tech behind it
    I’d love to make a fun weekend project to start exploring this kinda AI behind deep fake but more specifically how this app can achieve a talking photo from one image of your pet uploaded and all you have to do is map the mouth and eyes. submitted by /u/HamburgersNHeroin [link] [comments]  ( 86 min )
    Disco Diffusion AI Art Tutorial Quickstudies #4 Cutn Scheduling
    submitted by /u/prfitofthesngularity [link] [comments]  ( 85 min )
    Chatbot Project Feedback?
    Based on feedback received earlier, I've improved the quality of my conversational chatbot. The bot isn't fully trained yet, but the conversation should at least go smoother with fewer or less obvious blunders. Can I have some constructive feedback on the improved experience? Here's the URL: https://xalen.netlify.app submitted by /u/GameTide [link] [comments]  ( 87 min )
    I Created an AI Powered Basketball Referee
    submitted by /u/_ayushp_ [link] [comments]  ( 85 min )
    Have found Craiyon significantly smarter than Midjourney
    While Midjourney unquestionably creates higher quality images, I've found Craiyon to be significantly more intelligent, especially when it comes to specifying two main objects. Specific examples, sorry mostly Craiyon except The Shrike which pretty simple request. All of these failed completely in Midjourney, while Craiyon succeeded at varying degrees: A muslim and a Jew in a bar (*) https://i.imgur.com/aqmgpVr.jpg A vampire selling drugs ( Craiyon hilarious if crude) https://i.imgur.com/qIek1SU.jpg Hulk attacking trump https://i.imgur.com/t6aTBCS.jpg The shrike, hyperion, dark fantasy (Craiyon shined big time here compared to MJ, which absolutely failed) https://i.imgur.com/7D13zab.jpg vs MJ https://i.imgur.com/31x3svB.png A robotic owl and a robotic hummingbird. https://i.imgur.com/ju9OpTM.png Also more intriguing and poignant with Craiyon IMO Depressed https://i.imgur.com/DNCP15y.png Soul of artificial intelligence https://i.imgur.com/NM2R3tL.jpg Human soul https://i.imgur.com/v1dIBTQ.jpg *I did eventually get MJ to produce A Muslim and a Jew in a bar with some finagling with --stylize (I used 650 or 1000) one square out of 8 finally got the idea. https://i.imgur.com/MaEyjwH.jpg Anyways I'm not ragging on MJ, it's amazing, just sharing some of my experience and hoping MJ catches up to Craiyon IQ soon. Adding more examples as I go. In this it's a complicated prompt and Craiyon absolutely destroys MJ here: Prompt: a giant mechanica robotic panther made of colorful galaxies and stars, jungle background, bokeh, realistic, photography, unreal 5 render, hyper detailed, cinematic lighting, 8k Craiyon: https://i.imgur.com/OrIDHsU.jpg vs MJ https://i.imgur.com/M6H3goj.png submitted by /u/redtailboas [link] [comments]  ( 87 min )
    fabulous journey 🧠
    submitted by /u/nalr00n [link] [comments]  ( 86 min )
    What is the best language to learn to create prototypes?
    Hi, I would like to know what the best language would be for creating prototypes of some ideas. My goal is to be able to build prototypes so I can test some ideas I have and, if they are good, outsource them to a programmer. From what I have seen, Python is pretty standard, but why not Ruby? submitted by /u/HappyCampaigns [link] [comments]  ( 87 min )
    "Wizard" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 85 min )
  • Open

    [D] What are good industry places to do RL research in the UK, aside from DeepMind?
    What good industry labs are out there focusing on reinforcement learning, besides DeepMind? It seems like they consistently hoover up all the new grad deep RL talent (and deep learning talent in general to a greater extent). I am wondering if there are any other comparable places to do RL research in the UK or Europe more generally. If not, why not? It seems strange that DeepMind should face no competition in this area. Also, it generally seems like a bad thing for DeepMind to have this monopoly on RL talent in the UK, as they are a walled garden in terms of research. Overall, that could be argued to be a net negative for the scientific community. It's a giant network of research knowledge and talent that contractually cannot collaborate with the rest of the scientific network. It's surprising that Meta, OpenAI, and Google Brain do not seem to be investing more in the London market. I'm sure many talented researchers in the UK (and Europe more broadly) would appreciate having more options than joining DeepMind. Instead, it seems other industry labs are generally happy to give DeepMind free pickings over the top research talent in the UK. submitted by /u/alwayshumming [link] [comments]  ( 88 min )
    [D] how to explain to non RL people that PPO needs a Gaussian policy ?
    Hi, I came across a situation where I need to explain to someone why PPO with a continuous action space needs a Gaussian policy. The individual has a decent ML background but zero knowledge of RL. The key confusion is why we need a Gaussian policy to represent the output. Why can’t we output regression (numerical values) directly? submitted by /u/Electronic_Hawk524 [link] [comments]  ( 88 min )
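One way to frame an answer: PPO's clipped objective is built on the probability ratio between the new and old policy for the action that was actually taken, so the policy has to assign a density to actions. A plain regression output has no such density. A Gaussian is a common choice for this, not the only possible one. The sketch below is a minimal one-dimensional illustration using only the standard library; the function names, the clip value of 0.2 and the example numbers are assumptions made for illustration, not taken from any particular PPO implementation.

```python
import math
import random


def gaussian_log_prob(action, mean, log_std):
    """Log density of a 1-D Gaussian N(mean, exp(log_std)^2) at `action`."""
    std = math.exp(log_std)
    z = (action - mean) / std
    return -0.5 * z * z - log_std - 0.5 * math.log(2.0 * math.pi)


def ppo_clipped_objective(action, advantage, old_params, new_params, clip_eps=0.2):
    """Clipped surrogate objective for a single (action, advantage) sample.

    `old_params` and `new_params` are (mean, log_std) pairs, i.e. what a
    policy network would output for the current state.
    """
    old_mean, old_log_std = old_params
    new_mean, new_log_std = new_params
    # The ratio pi_new(a|s) / pi_old(a|s) requires a density for the action.
    ratio = math.exp(
        gaussian_log_prob(action, new_mean, new_log_std)
        - gaussian_log_prob(action, old_mean, old_log_std)
    )
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Sample an action from the old policy; this sampling is also what gives
# the agent exploration in a continuous action space.
rng = random.Random(0)
old_params = (0.0, math.log(0.5))
action = rng.gauss(old_params[0], math.exp(old_params[1]))

# With identical old and new policies the ratio is exactly 1.
unchanged = ppo_clipped_objective(action, 1.0, old_params, old_params)

# If the new policy moves its mean onto the action, the ratio exceeds
# 1 + clip_eps and the objective is clipped.
moved = ppo_clipped_objective(1.0, 1.0, (0.0, 0.0), (1.0, 0.0))
```

With a deterministic regression head there is nothing to put in the numerator or denominator of `ratio`, which is the short version of why a stochastic policy is used here.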
    [P]Attention Based Protein Structure Prediction
    I am publishing my new work on Protein Structure Prediction with Attention-based Neural-Network in PyTorch. Kaggle Notebook Link - https://lnkd.in/d3Eps_HE https://i.redd.it/wd9v0iweiye91.gif https://i.redd.it/bv74s6weiye91.gif In this example, I have demonstrated protein prediction in different ways, one by using position-specific scoring matrices (PSSMs) and the other by using protein sequences as input. I hope this notebook would be informative and helpful to all for further research and development in the domain of Biomedical and Drug-Discovery with Machine Learning. submitted by /u/victorbasu735 [link] [comments]  ( 87 min )
    [D] Most Popular AI Research July 2022 - Ranked Based On Total Twitter Likes
    submitted by /u/cloud_weather [link] [comments]  ( 88 min )
    [D] Randomizing Train / Test Split - random seed?
    I'm a machine learning student, so a lot of the concepts expressed in this sub are still pretty new to me, but I'm having a pretty difficult time finding an answer for this other than "because that's how it's done". Apologies in advance if I'm asking out of ignorance... because I probably am. When we split our datasets into train and test data, I completely understand why the rows selected for each dataset are randomized. That part makes sense. What I'm *not* sure about is why that random seed is apparently never changed. Intuitively, I feel like if I train a model (let's say a Decision Tree for arguments' sake), then the overall performance of that model may relate to the specific rows selected during the train / test split - it's possible that I just got lucky and just got a good seed. In a production environment, is there any reason to check other seeds after a model is generated and evaluated? Is there any reason why one wouldn't generate random seeds for each model generation? submitted by /u/Ordinary_Pipe_9783 [link] [comments]  ( 89 min )
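The concern raised above can be checked empirically by repeating the split with several seeds and looking at the spread of the test score. The sketch below is a minimal standard-library illustration: the synthetic dataset, the one-feature threshold classifier and the helper names are all assumptions chosen to keep the example self-contained, not a recommendation of a specific model or library.

```python
import random
import statistics


def make_data(n=200, seed=0):
    """Synthetic 1-D dataset with noisy binary labels (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [1 if x + rng.gauss(0.0, 0.2) > 0.5 else 0 for x in xs]
    return list(zip(xs, ys))


def split(data, seed, test_frac=0.25):
    """Shuffle with the given seed and return (train, test)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(threshold, rows):
    """Fraction of rows where predicting 1 for x > threshold matches the label."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)


def fit_threshold(train):
    """Pick the threshold (among training x values) with best training accuracy."""
    best_t, best_acc = 0.5, -1.0
    for t, _ in train:
        acc = accuracy(t, train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


data = make_data()
scores = []
for seed in range(10):
    train, test = split(data, seed)
    scores.append(accuracy(fit_threshold(train), test))

mean_acc = statistics.mean(scores)
std_acc = statistics.stdev(scores)
print(f"test accuracy over 10 seeds: mean={mean_acc:.3f}, std={std_acc:.3f}")
```

A fixed seed keeps a single experiment reproducible; reporting the mean and spread over several seeds (or using cross-validation) is what shows whether one split was just lucky.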
    [P] I made a small package to implement directed acyclic graph compositions (DAGs) as scikit-learn estimators
    https://skdag.readthedocs.io/en/latest/ I put together this small package to allow estimator compositions that are more complex than a simple linear pipeline. In my opinion it's a little easier to compose ensembles as DAGs rather than working with Pipelines and FeatureUnions when your workflow is anything more complex than a few simple linear steps. Here's an example: from skdag import DAGBuilder dag = ( DAGBuilder() .add_step("impute", SimpleImputer()) .add_step("vitals", "passthrough", deps={"impute": slice(0, 4)}) .add_step( "blood", PCA(n_components=2, random_state=0), deps={"impute": slice(4, 10)} ) .add_step( "rf", RandomForestRegressor(max_depth=5, random_state=0), deps=["blood", "vitals"] ) .add_step("svm", SVR(C=0.7), deps=["blood", "vitals"]) .add_step( "knn", KNeighborsRegress…  ( 89 min )
    [R] BUNGEENeRF: progressive neural radiance field for extreme multi-scale scene rendering
    submitted by /u/SpatialComputing [link] [comments]  ( 123 min )
    [D] Finding intent from file for chat assistant applications.
    I am a newbie to the AI/ML world. I got an assignment to find the intent of a file by following these rules. RULES: If the file is a PDF, read it aloud (text to speech). If the file is an image, check whether it contains text; if so, convert the image to text (OCR); if it does not contain text, describe the objects in the image (using YOLO); if it contains a face, try to recognize the person from data stored in a database. If the file is audio, check whether it is music; if yes, play it; if it is a lecture, convert it to text (speech to text). Is it possible to train such a model using any other AI/ML method? submitted by /u/jig4physics [link] [comments]  ( 87 min )
    [P] SEEKING: Clean dataset containing translations of Plato's works, as well as the original ancient Greek, ideally aligned by Stephanus number.
    They have been around for 2500 years, I should think that I would be able to find that somewhere, but all of the useable files online are so dirty with incorrect characters and missing segments and stephanus and page numbers littered randomly throughout the text. Someone has to have thought to do this before I have. And if someone who appreciated Plato prepared such knowledge, I would hope they would want to share it with anyone who would have it. And if they didn't appreciate Plato before preparing it, I wouldn't trust their work if they weren't convinced they ought to share knowledge freely after they read it all. submitted by /u/muellberggeist [link] [comments]  ( 132 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 88 min )
    [D] Doing Neurips rebuttal, which website do you upload your images/graphs/figure to?
    I don't see the option to insert image in your rebuttal comment. I want to upload new figures to Imgur and paste a link in the comment. But I was worried that Imgur might look a bit unprofessional. So which website or tool do you intend to use? It has to abide by the double-blind policy too. submitted by /u/SuperTankMan8964 [link] [comments]  ( 88 min )
    [D] Machine learning generalization
    Deep learning question: why is it so hard to generalize information in a neural network? If that doesn’t make sense, I am basically asking: if I train a neural network on the physics of a cannonball, why can’t the neural network generalize that information to the physics of a rocket? I am interested to know if anyone has heard about people working on this problem and where they are currently at with it. I would also really like to read more about it. Thank you! submitted by /u/chill_pill23 [link] [comments]  ( 94 min )
    [D] feast or vertex ai feature store - export as tfrecords?
    Hey, I'm exploring some options for feature engineering for ML on GCP. My primary deep learning framework is TensorFlow, and I've used TFRecord datasets (by exporting BigQuery data to TFRecords on GCS using the Dataflow template) in the past. For my new gig (a team that's currently trying to figure out their MLOps infrastructure), having a feature store seems like an interesting option. However, from the examples I've seen, it seems people only export those records as pandas dataframes, which isn't going to work for datasets that won't fit into memory. Is there a workaround to export Feast or Vertex AI feature store data as TFRecords? Would love to hear what other people's solutions look like if they follow a different pattern. submitted by /u/the_Wallie [link] [comments]  ( 88 min )
    [D] Why don’t you use automated feature engineering
    If you are going to engineer features, why do you prefer to do it manually, instead of automating the process of feature generation and selection? The automated process discovers the features that you would, and depending on if you use Genetic Feature Generation, it might elaborate on that (giving better results). What would a programme (which automated that process), have to bring to the table to make you use it? submitted by /u/Tricky_Nail_6659 [link] [comments]  ( 94 min )
    What shall I do now? [Discussion]
    I graduated this year and somehow I've managed to get an MLE role at a (<50 people) startup. I do not have a CS degree. I've learned everything from the internet. So, I am now confused about what to do next. What side activity should I do to make myself a valuable asset, both for the current company as well as for other future opportunities? Shall I focus more on problem-solving (leetcode)? Shall I start with system design? Should I work on my personal side projects? What do I do? submitted by /u/ZENDRO_hex [link] [comments]  ( 127 min )
    [N] machine learning in next generation manufacturing
    submitted by /u/One-Responsibility58 [link] [comments]  ( 87 min )
    [D] Is there an alternative to sinusoidal encoding for temporal embeddings?
    As per the transformer paper, sinusoidal embeddings help inference on longer sequences than the ones it was trained on. This isn't specific to transformers and this property has been extensively used for time series modeling in the past. From what I can see, this is due to the oscillatory property of sinusoidal waves which can be combined in specific manners to embed temporal information. This makes a lot of sense but has there been any method to embed temporal information without sinusoidal encoding? P.S.: I have done my research but I couldn't find anything significant. If anyone has had any personal experiences with any embedding technique that has worked better or equally well then please let me know. submitted by /u/Megixist [link] [comments]  ( 90 min )
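For reference, the sinusoidal scheme the question refers to can be sketched in a few lines. This follows the interleaved sine/cosine layout from the transformer paper (even indices use sine, odd indices use cosine, with wavelengths growing geometrically); the function name and the default base of 10000 as a parameter are illustrative choices.

```python
import math


def sinusoidal_encoding(position, d_model, base=10000.0):
    """Return the sinusoidal encoding of one position as a list of length d_model.

    Index 2i holds sin(position / base^(2i/d_model)) and index 2i+1 holds
    the matching cosine.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    encoding = []
    for i in range(d_model // 2):
        angle = position / (base ** (2 * i / d_model))
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding


# Any position, including ones beyond a training length, gets a bounded
# encoding without any learned parameters.
first = sinusoidal_encoding(0, 4)
far = sinusoidal_encoding(100_000, 8)
```

Because the encoding is a fixed function of the position, it is defined for positions longer than anything seen in training, which is the property the post mentions.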
    [D] Upcoming interview with Amazon. Looking for tips on how to prepare for it.
    [Mods, please remove. I'm not on the right forum. (also, I can't edit the title...?) Thanks!] I was invited for a 60 minute video interview and I'm nervous about this. If anyone has experience with an interview at Amazon, do you mind sharing how it went for you? Thank you! submitted by /u/centipedeshoesale [link] [comments]  ( 89 min )
    [D] Geospatial relationships
    Let’s say I have a set of points on a plane. We know the values of the predictor variables of all points, but we only know the values of the two target variables for some points. Is there an existing model that would allow me to incorporate the geospatial relationships between points in predicting target variables for the rest of the points on the plane? submitted by /u/Boring-Violinist8291 [link] [comments]  ( 87 min )
    Classifying the 'interestingness' of a word? [D]
    Does anyone know of any models/software that can classify the interestingness of a word? I'm trying to extract the most frequently spoken interesting words of a transcript. Any help would be greatly appreciated, thanks. submitted by /u/edenmannh [link] [comments]  ( 124 min )
  • Open

    Any good textbooks for actor-critic methods?
    Looking for a good resource for actor critic methods to use in my thesis, any good ones out there? submitted by /u/UsualIndividual [link] [comments]  ( 100 min )
    How to explain to someone why PPO needs a Gaussian Policy?
    Hi, I came across a situation where I need to explain to someone why PPO with a continuous action space needs a Gaussian policy. The individual has a decent ML background but zero knowledge of RL. The key confusion is why we need a Gaussian policy to represent the output. Why can’t we output regression (numerical values) directly? submitted by /u/Electronic_Hawk524 [link] [comments]  ( 102 min )
    GAIL training tips
    Hey, I'm currently training GAIL + PPO for a continuous action space. As a sanity check, I tested the algorithm in a discrete action space and it was able to solve the problem. However, when I switched to the continuous action space, the agent does not seem to learn, since the reward won't grow. I've tried several combinations of hyperparameters, different architectures and optimizers, but without any results. Any help or guidance on training GAIL with PPO is appreciated. My replay memory contains 5k environment steps, and I then update the policy episode by episode (meaning my batch size isn't constant, since episodes may have different lengths) for 80 epochs. submitted by /u/SigmaEpsilonDelta [link] [comments]  ( 87 min )
  • Open

    Not so fast
    James Gregory’s series for π is not so fast. It converges very slowly and so does not provide an efficient way to compute π. After summing half a million terms, we only get five correct decimal places. We can verify this with the following bc code. s = 0 scale = 50 for(k = 1; […] Not so fast first appeared on John D. Cook.  ( 5 min )
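The excerpt's bc code is cut off, so the following is an independent Python sketch of the same check rather than a reconstruction of the original. It sums the series π = 4(1 − 1/3 + 1/5 − …) for half a million terms; the function name is an assumption. For an alternating series like this one the error after N terms is roughly 1/N, so about 2 × 10⁻⁶ is expected here.

```python
import math


def gregory_pi(n_terms):
    """Approximate pi with the first n_terms of Gregory's series."""
    total = 0.0
    sign = 1.0
    for k in range(1, n_terms + 1):
        total += sign / (2 * k - 1)
        sign = -sign
    return 4.0 * total


approx = gregory_pi(500_000)
error = abs(math.pi - approx)
print(f"approximation: {approx:.10f}")
print(f"absolute error: {error:.2e}")
```

The error shrinking only like 1/N is what makes the series impractical: each additional correct digit costs roughly ten times as many terms.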
  • Open

    15 Data Issues and How to Fix Them (Part 1)
    How to fix various data issues in a few simple steps? In this first part, I discuss missing, outdated and unobserved data, data that is costly to produce, as well as dirty, unbalanced and unstructured data. The second part deals with biased, inconsistent, siloed, too big or fast flowing data, as well as security/privacy and… Read More »15 Data Issues and How to Fix Them (Part 1) The post 15 Data Issues and How to Fix Them (Part 1) appeared first on Data Science Central.  ( 19 min )
  • Open

    DALL-E, A First Pass
    submitted by /u/Gereshes [link] [comments]  ( 85 min )

  • Open

    [R] Highly Accurate Dichotomous Image Segmentation + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 87 min )
    Artificial intelligence model finds potential drug molecules a thousand times faster
    submitted by /u/fchung [link] [comments]  ( 87 min )
    [P] Using time series models to predict product demand
    Scenario: Suppose a small city has 2000 supermarkets and you have real-time data that shows how much of certain products they order from wholesalers, and when. Task: Given a year's worth of the data above, reach a level where you can get meaningful insights to somewhat predict shortages and plan accordingly. I would love it if you could point me to any books, articles or videos that go over something similar. Also, I wish to know if this has been done before and is therefore somewhat realistic. submitted by /u/zitrone_dealer [link] [comments]  ( 88 min )
    [R] Blog post series on human genetics for data scientists
    I started writing a blog post series on human genetics for data scientists, with the goal of presenting the major open problems in the field (from an analytical perspective). I explain in the blog why, on the one hand, genetic data is really convenient for statistical and computational analysis (DNA is literally a digital code) and we can in principle do really cool stuff (like predicting who’s at risk for schizophrenia, heart disease, or any other heritable condition), but it’s somewhat tricky and there are many challenges we still need to make progress on. It’s a fascinating research area with interesting analytical challenges and a potential to improve the lives of many people. The first post in the series: https://incrementally.net/2022/07/14/understanding-the-genetic-basis-of-the-human-condition-16-analytical-challenges/ submitted by /u/nadavbrandes [link] [comments]  ( 88 min )
    [D] Quantum Machine Learning
    What's the goal of quantum ML? It seems to me that the current way of applying QML is to shoehorn quantum systems into well-known classical ML approaches, without any form of benefit. I recently discovered too that there is a whole subfield dedicated to QNLP, which is quite surprising since NLP requires drawing correlations between continuous sequences, and current quantum systems are limited by their lifetimes. How can they retain long-term memory? Those familiar with the field, can y'all explain why? submitted by /u/Blackforestcheesecak [link] [comments]  ( 89 min )
    [P] Connect models together to build machine-learning workflows
    ​ https://preview.redd.it/wr8k2xa1epe91.png?width=1091&format=png&auto=webp&s=41e3e0f812067d43faff08aa327b1ea90adc0e2b txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Workflows can be as simple as a single model. As the picture above illustrates, a workflow can also be a summarization and translation model. Or a model that summarizes and then builds a vector search index. Workflows are constructed in Python or YAML. Logic is built-in for model serving and packaging workflows as Docker images. Full documentation can be found in the links below. GitHub | Documentation | Packaging workflows | Tutorials submitted by /u/davidmezzetti [link] [comments]  ( 87 min )
    I created a CV-based automated basketball referee [P]
    submitted by /u/_ayushp_ [link] [comments]  ( 89 min )
    [D] Notes for Stanford or UMichigan DL course
    Anyone has notes for Stanford's CS231n or UMichigan's EECS 498-007/598-005 deep learning for computer vision course? submitted by /u/Inferno_1405 [link] [comments]  ( 87 min )
    [Discussion]Is there inductive bias in ViT?
    Recently, I've read some papers about CNNs and Transformers. As is well known, CNNs have a natural inductive bias; I wonder whether ViT has an inductive bias too? submitted by /u/whattoshow [link] [comments]  ( 88 min )
    [R] [D] Multi Agent AI maze/grid
    submitted by /u/kachua26 [link] [comments]  ( 87 min )
  • Open

    Is the action space just a transform of the state space by reward?
    The agent is trying to manipulate the state through actions. Indirectly, the action space is linked to a 3D space for a locomotion task (rather than [-1,1] as in joint positions). After all, the reward is parameterized not by joint positions. This mapping of state -> best actions via a neural network is learning a mapping from state space to what space? submitted by /u/XecutionStyle [link] [comments]  ( 86 min )
    Locomotion RL question about mass.
    Hi, I’m running experiments with Unity ML-Agents on locomotion tasks. Each body part of an agent has its own physical parameters, including mass. I found that with the same algorithm and the same reward but different masses, the behaviour is very different. Are there any rules or advice on the correct mass distribution across body parts? For example, I have a dog with 4 legs, each consisting of 3 segments (scapula, shoulder, foot, etc.); where can I find correct masses for them? Are there resources that say, for example, “if you have a quadruped, the foot must be x mass, the shoulder 1.5x, the scapula 1.5x, the body 5x, the head 2x, etc.”? I don’t want to kill my own dog and weigh its body parts ) submitted by /u/IndependenceCivil576 [link] [comments]  ( 86 min )
    Research topics in RL
    What are the hottest/promising research topics in RL? I am new to RL and taking my first steps. From my point of view, offline RL seems a promising direction with recent advances. Can anyone point out other directions? I feel a bit lost because there are so many topics to cover and I do not have a professor to supervise me. submitted by /u/rlopes404 [link] [comments]  ( 87 min )
    Multi Agent AI maze/grid
    Hi folks, I'm starting a new project. The problem statement is roughly: 1 - You have multiple robots in a maze that know nothing about the environment and should generate a 2D map of the maze collaboratively. 2 - You have multiple robots trying to corner one prey robot in a maze, also collaboratively. Please help me with any resources or previous work you know about. submitted by /u/kachua26 [link] [comments]  ( 101 min )
  • Open

    A good AI video enhancer?
    Recently recorded some footage in 1080p on a GoPro but it doesn’t look very good, any recommendations on a good video enhancer? One that’s done online or on iPadOS would be preferable submitted by /u/Toblerone13 [link] [comments]  ( 86 min )
    Researchers At Oxford Have Created An On-Chip Optical Processor That Can Detect Similarities In Datasets Up To 1,000 Times Faster Than Traditional Machine Learning Algorithms
    The ability to identify non-trivial patterns in data using computational methods has sparked the creation of sophisticated machine intelligence systems with a wide range of crucial applications in science and technology. Such practices have primarily been used on general-purpose digital electronic processors (such as GPUs and CPUs), although this might result in undesirable computational latency and throughput restrictions. Pavlovian associative learning is a fundamental type of learning that shapes both human and animal behavior. Ivan P. Pavlov demonstrated how dogs could learn to identify a ringing bell with food, leading a ring to result in salivation, in a famous experiment conducted more than a century ago. Pavlovian-style associative learning is no longer commonly used in artificial intelligence applications, despite the success of other learning theories such as backpropagation on artificial neural networks (ANNs). As stated in the papers, one reason behind this is that backpropagation method training on “traditional” ANNs requires a lot of processing and energy resources. Continue reading | Checkout the paper submitted by /u/ai-lover [link] [comments]  ( 87 min )
    A new online marketplace sells prompts for DALL-E 2 and GPT-3
    submitted by /u/much_successes [link] [comments]  ( 85 min )
    Magic Tree
    submitted by /u/widgia [link] [comments]  ( 92 min )
    Open Call for digital artists: AI in ART
    Hey, we are launching an AI Lab for artists and invite you to join. We will select 20 creators who will get alpha access to a no-code AI editor (it currently has Disco Diffusion, StyleGan with our unique datasets, Film, StyleTransfer, upscale and several "image to 3D" neural networks). These 20 creators, with the help of our mentors, will create their 3D sculptures using AI tools, which will be presented at an AR exhibition with 15k+ visitors. Also, on the 2nd of August there will be an online lecture on AI in art trends and DiscoDiffusion prompts and settings tips and tricks. https://phygital.plus/ai-lab https://reddit.com/link/wbwwu5/video/u9h3ssjh9pe91/player submitted by /u/Worldly_Apricot_1512 [link] [comments]  ( 86 min )
    Generated with Latent Diffusion and upscaled by Real-ESRGAN
    submitted by /u/Gengar218 [link] [comments]  ( 91 min )
  • Open

    Complex AGM
    The arithmetic-geometric mean (AGM) of two non-negative real numbers a and b is defined as the limit of the iteration starting with a_0 = a and b_0 = b and a_{n+1} = ½ (a_n + b_n), b_{n+1} = √(a_n b_n) for n ≥ 0. This sequence converges very quickly and is useful in numerical algorithms. […] Complex AGM first appeared on John D. Cook.  ( 6 min )
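The iteration in the excerpt translates almost directly into code. The sketch below covers only the real, non-negative case described above (not the complex extension the post's title refers to); the function name, tolerance and iteration cap are illustrative assumptions.

```python
import math


def agm(a, b, tol=1e-15, max_iter=100):
    """Arithmetic-geometric mean of two non-negative real numbers."""
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")
    for _ in range(max_iter):
        # Stop once the two sequences agree to within the tolerance.
        if abs(a - b) <= tol * max(a, b, 1.0):
            break
        # Both updates use the previous a and b, as in the definition.
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return 0.5 * (a + b)


value = agm(1.0, math.sqrt(2.0))
print(value)
```

The tuple assignment matters: updating `a` first and then using the new `a` inside the square root would compute a different sequence from the one defined above.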
  • Open

    Open Call for digital artists: AI in ART
    Hey, we are launching an AI Lab for artists and invite you to join. We will select 20 creators who will get alpha access to a no-code AI editor (it currently has Disco Diffusion, StyleGan with our unique datasets, Film, StyleTransfer, upscale and several "image to 3D" neural networks). These 20 creators, with the help of our mentors, will create their 3D sculptures using AI tools, which will be presented at an AR exhibition with 15k+ visitors. Also, on the 2nd of August there will be an online lecture on AI in art trends and DiscoDiffusion prompts and settings tips and tricks. https://phygital.plus/ai-lab https://reddit.com/link/wbvywk/video/0nnmnynw4pe91/player submitted by /u/Worldly_Apricot_1512 [link] [comments]  ( 86 min )
    Best Neural Networks Courses on Udemy to Consider in 2022 -
    submitted by /u/Lakshmireddys [link] [comments]  ( 85 min )

  • Open

    [R] Reducing Activation Recomputation in Large Transformer Models - Nvidia May 2022
    Paper: https://arxiv.org/abs/2205.05198#nvidia Github: https://github.com/NVIDIA/Megatron-LM Abstract: Training large transformer models is one of the most important computational challenges of modern AI. In this paper, we show how to significantly accelerate training of large transformer models by reducing activation recomputation. Activation recomputation is commonly used to work around memory capacity constraints. Rather than storing activations for backpropagation, they are traditionally recomputed, which saves memory but adds redundant compute. In this work, we show most of this redundant compute is unnecessary because we can reduce memory consumption sufficiently without it. We present two novel yet very simple techniques: sequence parallelism and selective activation recomputat…  ( 88 min )
    [R] PanGu-Coder: Program Synthesis with Function-Level Language Modeling - Huawei 2022
    Paper: https://arxiv.org/abs/2207.11280 Abstract: We present PanGu-Coder, a pretrained decoder-only language model adopting the PanGu-Alpha architecture for text-to-code generation, i.e. the synthesis of programming language solutions given a natural language problem description. We train PanGu-Coder using a two-stage strategy: the first stage employs Causal Language Modelling (CLM) to pre-train on raw programming language data, while the second stage uses a combination of Causal Language Modelling and Masked Language Modelling (MLM) training objectives that focus on the downstream task of text-to-code generation and train on loosely curated pairs of natural language program definitions and code functions. Finally, we discuss PanGu-Coder-FT, which is fine-tuned on a combination of competitive programming problems and code with continuous integration tests. We evaluate PanGu-Coder with a focus on whether it generates functionally correct programs and demonstrate that it achieves equivalent or better performance than similarly sized models, such as CodeX, while attending a smaller context window and training on less data. https://preview.redd.it/7hdptg7j5le91.jpg?width=1040&format=pjpg&auto=webp&s=043b82c7752342e4421f7c9bed1475ada4d06609 https://preview.redd.it/6btcig7j5le91.jpg?width=917&format=pjpg&auto=webp&s=712805e02d81aed10ce085b6d83e7b3b72770cff submitted by /u/Singularian2501 [link] [comments]  ( 88 min )
    [D] ROCm vs CUDA
    Hello people, I tried to look online for comparisons of recent AMD (ROCm) and Nvidia (CUDA) cards, but I've found very few benchmarks. Since PyTorch natively supports ROCm, I'm thinking about upgrading my GPU to an AMD card instead of an Nvidia one. But I'm afraid of losing too much training performance. If you guys have any information to share, I would be glad to hear it! submitted by /u/Krokodeale [link] [comments]  ( 89 min )
    [D] What are some ways to scale and maintain machine learning models?
    Other than API endpoints have you ever worked with or encountered process to deploy a machine learning model at scale submitted by /u/BadKarma-18 [link] [comments]  ( 125 min )
    [D] Are there any tools to quickly label training data manually?
    So, I have heaps of data and I want a way to comfortably label them on my PC or, even more preferably, on my phone. I can't really find an app or program to do it. Maybe I am using the wrong search terms, but I really can't find anything. There was https://borgo.app but development seems to have halted... I am just searching for an application that will show me a piece of text (or an image) from a dataset and let me press a button or similar to quickly label it (as in: sort it into categories). It seems like a trivial app to build and super useful, so I cannot believe nobody has done it before. submitted by /u/whipbryd [link] [comments]  ( 89 min )
    [D] 2D cuts with decision tree?
    I'm working on a boosted decision tree, and I've got it working fairly well. However it would be better if it was able to make decisions/cuts in more than one dimension (preferably 2D). Is this something that is even possible? (I'm using sklearn) submitted by /u/Gamwise_Samgee_ [link] [comments]  ( 88 min )
    [D] How To Make STGNNs Capable of Forecasting Long-term Multivariate Time Series Data?
    I've just published my latest Medium article in the Towards AI publication. Time series forecasting (TSF) is vital in all industries, from energy to healthcare. Researchers have achieved significant advances through the development of TSF models. To fully account for patterns in time series and their relationships, analysis based on long-term dependencies in the dataset is a must. This article is about designing a new model, built on an existing one, that handles long-term dependencies and produces segment-level representations. The model is called STEP, an abbreviation of STGNN (Spatial-Temporal Graph Neural Networks) + Enhanced + Pre-training model. Please give it a read and let me know your feedback. If you find it interesting, I would appreciate a follow on Medium. https://pub.towardsai.net/how-to-make-stgnnscapable-of-forecasting-long-term-multivariate-time-series-data-9fe5efd77fa1 submitted by /u/rezayazdanfar [link] [comments]  ( 88 min )
    finding job [R]
    Hello friends. I wonder if it is realistic to expect to find a job with an ML education from online courses and a couple of Kaggle projects, with no proper university education? submitted by /u/line777888 [link] [comments]  ( 90 min )
    [D] AlphaFold just released a database of 200 million protein structures. How would you use this data as an ML engineer?
    The structure of a protein determines its functionality. Researchers have used this data in the past to design new drugs, vaccines, and enzymes. You can access the database for free here - https://www.deepmind.com/blog/alphafold-reveals-the-structure-of-the-protein-universe This new database will allow researchers to gain a deeper understanding of protein families, how they interact and evolve, etc. Deepmind has written some use cases here - https://www.deepmind.com/blog/alphafold-reveals-the-structure-of-the-protein-universe How would you use it? What would you like to explore or predict with it? submitted by /u/BeautifulVegetable10 [link] [comments]  ( 92 min )
    [P] Truss, a new open-source library for model packaging and deployment
    Hi r/machinelearning At work, I just helped launch Truss, our company’s first open-source project, and I wanted to tell you a bit about it in case it can help you serve and deploy your models. Model serving, as part of MLOps, is the DevOps challenge of keeping a complicated, fragile artifact working in multiple dynamic environments. Data scientists working in large, well-resourced organizations can hand off their models to specialized MLOps teams for serving and deployment. The rest of us have to do it ourselves. As a data scientist, serving and deploying a model requires a different set of skills and technologies than building it did. A data scientist’s working environment is the Jupyter notebook, a flexible and permissive system designed for iterative experimentation. The Jupyter note…  ( 90 min )
    [D] How does multi-head attention actually work?
    I'm trying to understand multi-head attention but don't quite get how queries, keys, and values are projected to different subspaces. More specifically, are the same weight matrices used for each head, or is a different matrix used for each head? The Illustrated Transformer shows eight sets of weight matrices being used for eight heads. But other implementations I've seen (The Annotated Transformer and Gordic Aleksa's implementation, as well as his video on his popular channel The AI Epiphany) seem to use only one linear layer per key, query, and value. I'm confused. Can anybody explain? submitted by /u/jwngx [link] [comments]  ( 90 min )
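The two views in the question can be compared directly. The following is a minimal NumPy sketch, assuming a toy setup (`d_model = 8`, two heads, random weights; all names are illustrative and not taken from any of the implementations mentioned). It checks that one fused query projection of shape `(d_model, d_model)`, reshaped into heads, gives the same result as separate per-head matrices stacked column-wise.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_heads, seq_len = 8, 2, 5
d_head = d_model // n_heads

x = rng.normal(size=(seq_len, d_model))  # toy input sequence

# View 1: a separate query matrix per head, each of shape (d_model, d_head).
W_q_per_head = [rng.normal(size=(d_model, d_head)) for _ in range(n_heads)]
q_separate = np.stack([x @ W for W in W_q_per_head])  # (n_heads, seq_len, d_head)

# View 2: a single linear layer whose weight is the per-head matrices
# concatenated column-wise, followed by a reshape that splits the heads.
W_q_fused = np.concatenate(W_q_per_head, axis=1)  # (d_model, d_model)
q_fused = (x @ W_q_fused).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

same = np.allclose(q_separate, q_fused)
```

Under these assumptions the single layer is only a batched way of storing the per-head weights, so each head still has its own projection. The same reasoning applies to the key and value projections.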
    Predicting top SKUs [D]
    If I have 100s of part numbers in a warehouse and have to predict what part numbers will be top sellers tomorrow (or next week), what would be the algorithms to start with? submitted by /u/CheeseBurgersx [link] [comments]  ( 87 min )
    [Research] Anyone experienced using Transkribus?
    Hi all, I have a couple questions regarding the Handwritten text recognition software Transkribus. Anyone experienced using it? submitted by /u/Jeannetton [link] [comments]  ( 87 min )
    [D] Will AAAI Revise their NeurIPS Fast Track score?
    I have reached out to the general inquiries email at AAAI 2023 to see if they will be revising their NeurIPS fast-track score. This year NeurIPS revised its scoring system downward so that a 4 is a borderline reject, whereas in previous years a 5 was a borderline reject and AAAI required a 4.9 to be fast-tracked. I am curious if anyone has heard anything? submitted by /u/AbjectDrink3276 [link] [comments]  ( 119 min )
    [P] Created tutorials on Information Retrieval, specifically Semantic Search
    Hi, I've created a repo which tries to cover the current progress in the world of information retrieval using neural information retrievers / semantic search. Repo: https://github.com/kuutsav/information-retrieval . Most of the content follows the work of Nils Reimers (creator of the sentence_transformers library) and his research group. Topics covered: classic way of information retrieval; evaluation metrics; Bi-Encoders; Cross-Encoders; multilingual retrieval models; training techniques using no labeled data; domain adaptation (GPL, TSDAE, SimCSE). Things to come: vector databases; Approximate Nearest Neighbor techniques for quick retrieval. submitted by /u/krumb0y [link] [comments]  ( 87 min )
    [D] Object detection dataset construction and its diversity
    Hey, I've been trying to look into explainable deep learning models in object detection and in image recognition in general. Firstly, I feel like the diversity of the training data distribution is highly important for generalization, so that we capture various different views of the wanted object. Later we can augment these views, but this raises a problem from the image collection point of view. I feel like the explainability of deep learning models could be viewed more clearly when we can control one variable - the data collection. However, I can't find any research on collecting such data - like how to collect as little data as possible while maximizing the diversity needed for generalization. Kind of like sample efficiency, but instead of finding the optimal classifier we try to find the optimal images to create generalizable representations of the said object from images. Does anyone have good keywords or know some research that could work as a starting point? submitted by /u/Spiritual-Reply5896 [link] [comments]  ( 123 min )
    [D] Any way to tackle vanishing gradients without changing the architecture/initialization
    I have a problem for which I need a neural network with a relatively small (approximate) Lipschitz constant, which forces the network to reduce the magnitude of the weights throughout the network. I have only managed to train the network by slowly ramping up the penalty, but this always leads to the network no longer improving on the task, which I very much suspect is due to vanishing gradients. Since I need the small Lipschitz constant, I wonder if there is anything I could try that does not result in an increased Lipschitz constant? For example, are there any optimisers that try to improve upon the vanishing gradient problem? submitted by /u/LeanderKu [link] [comments]  ( 123 min )
    [D] Which clustering algorithm to use to establish ideal limits?
    I have data which shows the time taken for a process to complete on different dates. I need to establish upper and lower limits on that time duration, to define the ideal time range for that process. If I use clustering, which algorithm should I use? And if not, what method should I use to achieve this? submitted by /u/Kindly-Judgment-1889 [link] [comments]  ( 88 min )
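One non-clustering baseline for this kind of question is the interquartile-range rule (Tukey fences). The sketch below is only an illustration of that substitute technique, not an answer from the thread; the durations are made-up values and `iqr_limits` is a hypothetical helper name.

```python
import numpy as np

def iqr_limits(durations, k=1.5):
    """Return (lower, upper) limits as Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = np.percentile(durations, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical process durations in minutes, one value per date.
durations = np.array([42.0, 40.5, 44.1, 39.8, 41.2, 43.0, 95.0, 40.9])

lower, upper = iqr_limits(durations)
# Runs that fall outside the "ideal" range under this rule.
outliers = durations[(durations < lower) | (durations > upper)]
```

The factor `k = 1.5` is the conventional default for this rule; a larger value widens the accepted range.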
    [D] Professional ML engineers: How much of your day to day job involves math and proofs?
    If you are a professional ML engineer (not data engineer) how much of your day to day work involves doing math and proofs? I can 'do' linear algebra and statistics but I am not sure if doing math and writing proofs on a daily basis would be my cup of tea. EDIT: The reason I asked is because the MS program I am considering requires proofs to pass the ML related classes. I can do that for a couple of classes but not every day. submitted by /u/The_Big_0mg [link] [comments]  ( 100 min )
    [D] Seeking Advice - For graph ML, Neo4j or nah?
    I believe my concerns are fairly general, so I would appreciate general opinions as well as expert advice, if such is forthcoming. I'm working on a project to implement a knowledge graph, and the important requirements are: (1) every node needs an embedding; (2) the graph needs to be persistent, because people are adding things to it fairly regularly; (3) the graph is going to ingest data constantly; (4) the graph needs to be updating embeddings, inferring connections and missing properties, pretty much constantly in the background. In short, the graph needs to be able to prune, expand, and self-maintain based on the output of integrated ML systems. So scalability and efficiency (especially for queries and retrieval and such) is going to be a problem, but I have some ideas about how to deal with it. …  ( 101 min )
    [D] Is it possible to get into an ML PhD program without papers these days?
    Sorry, if you've seen a similar question before somewhere. I'm a FAANG ML engineer. I only have a Masters in CS (no thesis) and one third author paper in Robotics (from Bachelors). Didn't end up publishing in Masters due to various reasons. Also, didn't do PhD (kept thinking over whether I'd be accepted or not and didn't apply). I've been trying to get into ML research. I want to work on original ideas and not just implement known stuff. I'm trying to transfer internally to some research role but finding it very difficult. Even research engineer roles seem to ask for first-author papers or something (or maybe it's the recession or maybe I don't have the right connections). Keep thinking about if I should press the PhD application button but get demoralized due to my poor research experience. Just wanted to put my dilemma to rest by asking this group. submitted by /u/massagetae [link] [comments]  ( 95 min )
    [D] Measuring human-level performance
    Hi, I would like to get some advice on how to go about measuring human-level performance (HLP) for an object detection task. What kind of experiments should I design to measure this, because my ground truths also come from human annotators. Does this mean I am comparing one human annotator against the other to measure the HLP? How about measuring HLP for image classification? submitted by /u/saltmind123 [link] [comments]  ( 87 min )
    [D] What tools do you use in your development environment?
    I am looking for suggestions of tools for a development environment in the context of deep reinforcement learning. I'll list what contexts and tools I'm using, as well as which ones I plan to use in my future development environment. Understand "tool" as a library, service, anything used in a development environment. Context (current / future): Machine Learning (scikit-learn and TensorFlow / I'm migrating everything to use only JAX); Tests (pytest / pytest, but I would use some to test my model or the algorithm behind it); Tracking (Weights & Biases / I accept other suggestions, including self-hosted services); Container (Docker / I think about migrating to Singularity, maybe using both in the appropriate scenarios for each); CI/CD (GitHub Actions / GitHub Actions); App (Streamlit / Streamlit). Can you tell me what tools you use and why? Also, what other contexts am I forgetting that you think are important to have? submitted by /u/barash-616 [link] [comments]  ( 88 min )
  • Open

    Add conversational AI to any contact center with Amazon Lex and the Amazon Chime SDK
    Customer satisfaction is a potent metric that directly influences the profitability of an organization. With rapid technological advances in the past decade or so, it’s even more important to elevate customer focus in the following ways: Making your organization accessible to your customers across multiple modalities, including voice, text, social media, and more Providing your […]  ( 11 min )
    Identify the location of anomalies using Amazon Lookout for Vision at the edge without using a GPU
    Automated defect detection using computer vision helps improve quality and lower the cost of inspection. Defect detection involves identifying the presence of a defect, classifying types of defects, and identifying where the defects are located. Many manufacturing processes require detection at a low latency, with limited compute resources, and with limited connectivity. Amazon Lookout for […]  ( 11 min )
    Fine-tune and deploy a summarizer model using the Hugging Face Amazon SageMaker containers bringing your own script
    There have been many recent advancements in the NLP domain. Pre-trained models and fully managed NLP services have democratised access and adoption of NLP. Amazon Comprehend is a fully managed service that can perform NLP tasks like custom entity recognition, topic modelling, sentiment analysis and more to extract insights from data without the need of any prior […]  ( 8 min )
    Team and user management with Amazon SageMaker and AWS SSO
    Amazon SageMaker Studio is a web-based integrated development environment (IDE) for machine learning (ML) that lets you build, train, debug, deploy, and monitor your ML models. Each onboarded user in Studio has their own dedicated set of resources, such as compute instances, a home directory on an Amazon Elastic File System (Amazon EFS) volume, and […]  ( 15 min )
    Build and train ML models using a data mesh architecture on AWS: Part 2
    This is the second part of a series that showcases the machine learning (ML) lifecycle with a data mesh design pattern for a large enterprise with multiple lines of business (LOBs) and a Center of Excellence (CoE) for analytics and ML. In part 1, we addressed the data steward persona and showcased a data mesh […]  ( 9 min )
    Build and train ML models using a data mesh architecture on AWS: Part 1
    Organizations across various industries are using artificial intelligence (AI) and machine learning (ML) to solve business challenges specific to their industry. For example, in the financial services industry, you can use AI and ML to solve challenges around fraud detection, credit risk prediction, direct marketing, and many others. Large enterprises sometimes set up a center […]  ( 13 min )
  • Open

    ?
    submitted by /u/quookaa [link] [comments]  ( 90 min )
    Splitting up, style transferring, and then recombining images - tips, tricks, algorithms, code repos?
    So I have large images (5k x 5k or greater) I want to style transfer, but I am hardware limited to a certain size (generally ~1.5k per side). I want to chop these images up and style transfer, but when I do that, the styles don't match (you can see the lines where the image was cut, even if each section is adequately transferred). I saw with Painnt (an app that does this) that after it splits and style transfers, it then does something to merge these separate sections together. Does anybody have any idea what that could be? Is it maybe done by oversampling each section and then merging the overlapping sections? That's all I can really imagine. I would be so appreciative if anyone has any ideas, algorithms, or explanations to share! I've been racking my brain about this, and I've tried a bunch of cutting styles and Photoshop combinations, but since I can see with Painnt that it's possible programmatically, I would love to reproduce this on my own (I want to use my own styles...). Thank you!! submitted by /u/nomagneticmonopoles [link] [comments]  ( 88 min )
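The "overlapping sections" idea from the post can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: `stylize` is a placeholder for whatever style-transfer call is used, the tile size and overlap are arbitrary, and nothing here is known to be what Painnt actually does. Each tile is weighted by a window that fades towards its edges, and overlapping regions are averaged by the accumulated weights.

```python
import numpy as np

def tile_starts(length, tile, stride):
    """Start offsets so that tiles of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the border
    return starts

def feather_window(h, w):
    """2D weights that peak at the tile centre and fade towards the edges."""
    ramp_y = np.minimum(np.arange(1, h + 1), np.arange(h, 0, -1))
    ramp_x = np.minimum(np.arange(1, w + 1), np.arange(w, 0, -1))
    return np.outer(ramp_y, ramp_x).astype(np.float64)

def blend_tiles(image, stylize, tile=256, overlap=64):
    """Apply `stylize` per overlapping tile and blend with feathered weights."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    h, w = image.shape[:2]
    stride = tile - overlap
    out = np.zeros(image.shape, dtype=np.float64)
    weight = np.zeros((h, w), dtype=np.float64)
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            patch = image[y:y + tile, x:x + tile]
            styled = np.asarray(stylize(patch), dtype=np.float64)
            win = feather_window(*patch.shape[:2])
            out[y:y + tile, x:x + tile] += styled * win[..., None]
            weight[y:y + tile, x:x + tile] += win
    return out / weight[..., None]

# Sanity check with an identity "style": blending must return the input.
demo = np.random.default_rng(0).random((40, 50, 3))
restored = blend_tiles(demo, stylize=lambda p: p, tile=16, overlap=4)
```

Feathering hides hard seams but cannot fix tiles whose styles differ globally (for example in colour statistics), so a larger overlap or some cross-tile normalisation may still be needed.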
    HFT
    https://www.youtube.com/watch?v=V43a-KxLFcg submitted by /u/fmurph22 [link] [comments]  ( 85 min )
    I interviewed Blake Lemoine, fired Google Engineer, on consciousness and AI. AMA!
    Hey all! I'm Felix! I have a podcast and I interviewed Blake Lemoine earlier this week. The podcast is currently in post production and I wrote the teaser article (linked below) about it, and am happy to answer any Q's. I have a background in AI (phil) myself and really enjoyed the conversation, and would love to chat with the community here/answer Q's anybody may have. Thank you! Teaser article here. submitted by /u/felixanderfelixander [link] [comments]  ( 87 min )
    Engineers working on “analog deep learning” have found a way to propel protons through solids at unprecedented speeds.
    submitted by /u/qptbook [link] [comments]  ( 86 min )
    First Portable Blackrock Brain Computer Interface | Rapid Robotics Fastest Robot Arm Setup | New AI Using Light Performs 1,000x Faster
    submitted by /u/tohelpyou88 [link] [comments]  ( 86 min )
    DeepMind AI Powers Major Scientific Breakthrough: AlphaFold Generates 3D View of the Protein Universe
    submitted by /u/Tao_Dragon [link] [comments]  ( 90 min )
    Going into AI, ML or Computational Statistics without a strong background in CS?
    I’m currently a math/statistics major and am interested in pursuing research in AI, Machine Learning (ML), and computational statistics/numerical methods, aiming for a PhD in something along those lines (so most likely in statistics). I thought about picking up CS as a second major because 1) I hear it’s very useful to have a bachelor’s in CS when working in the aforementioned areas, 2) most research in these areas is done in CS departments or by CS faculty, and 3) it provides a good exit opportunity in case things don’t go as planned, since it opens up lots of lucrative employment opportunities. However, I’ll be honest, I’m really not looking forward to taking all those CS classes, except for ones related to my interests. As such, how bad would it be if I don’t have a strong background in CS? Is it something worth doing, even if I don’t particularly want to? I’d much rather take advanced math electives that will also be helpful to me (like measure theory, graph theory, graduate linear algebra, and graduate numerical analysis). For additional context: I’ve taken Intro to CS (and have become quite proficient in Java), several classes that use R and Matlab (also proficient in those), and will be taking advanced electives in AI and the Theory of Machine Learning (perhaps also one in Data Science), all of which are very project-heavy, meaning lots of programming, especially in Python. Notably, I’m missing data structures, algorithms, and databases. However, I’m hoping that the project-heavy classes will cover the basics of most of the topics in CS that I’ll need going forward, and anything else I can learn on my own as I go, especially since I’ve already taken Intro to CS. I’d appreciate any input though! submitted by /u/mowa0199 [link] [comments]  ( 88 min )
    Max Planck Researchers Propose A Metrical Face Shape Predictor Called MICA (MetrIC fAce)
    submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Resume parsing (OCR): Which solution to choose?
    submitted by /u/tah_zem [link] [comments]  ( 86 min )
    No cloud, No infrastructure; deploy a model in 5 minutes or less.
    Hi there, Lex from Hopsworks. We recently launched our new release, and it comes with something new for those in the crowd who have a sense of how models work but not of how to deploy them. We have a serverless platform: you do not need cloud accounts (AWS, Google, Azure, etc.) or infrastructure; you can run a Colab notebook and serve your models, for free :) No catch; you can try it yourself directly from a notebook - we have a great example here. You'd need an account on app.hopsworks.ai and a Google Drive account. And that's all. Cheers fellow AI people o./ submitted by /u/lexsiga [link] [comments]  ( 86 min )
    American Tornado
    submitted by /u/widgia [link] [comments]  ( 85 min )
    [Off-Topic] The 2nd Reddit Robotics Showcase is this Weekend!
    Saturday 30th & Sunday 31st from 10am EDT / 3pm BST. The Reddit Robotics Showcase is an event for all ages and abilities to share their passion for Robotics. From amateurs to academics, startups to industry pros, see what the global robotics community has been up to! You can find out more from the website. We will be livestreaming the event to our YouTube Channel. Saturday, 30th of July: Industrial / Automation: “The Ocado Series 600 Bot” Matt Whelan, Head of Engineering, Ocado Technology – 10:00 EDT (15:00 BST, 23:00 JST) https://www.youtube.com/watch?v=fy4vpjw_nNw ; Mobile Robots: “Mobile Robots in the Wild” Marc Hanheide, Lincoln Centre for Autonomous Systems – 14:00 EDT (19:00 BST, 03:00 JST). Sunday, 31st of July: Bio-Inspired Robots: “Entering the maze: snake-like robots from aerospace to surgery” Dr Matteo Russo – Rolls-Royce University Technology Centre (UTC) in Manufacturing and On-Wing Technology – 10:00 EDT (15:00 BST, 23:00 JST) https://www.youtube.com/watch?v=GJoAQ1KxaVw ; Human Robot Interaction: “Social Agents and Human Robot Interaction” Dr Ruth Aylett of the National Robotarium – 14:00 EDT (19:00 BST, 03:00 JST). "The primary purpose of this event is to showcase the multitude of projects underway in the r/Robotics Reddit community. Topics range across all focuses of robotics, such as simulation, navigation, control, perception, and mechatronic design. We will use this showcase to present discussion pieces and foster conversation between active members in the robotics community around the world. The showcase will feature invited roboticists in research and industry to discuss what they see as technical challenges or interesting directions for robots. Amateurs and academics, students and industry professionals alike." submitted by /u/Badmanwillis [link] [comments]  ( 87 min )
    University Project
    Hi everyone, I'm a master's student at the University of Bath and I am conducting some research into the field of AI and software development. I've created a survey to get a better understanding of developer communities, how they work, and a few other questions about content. It is fully anonymous and the information collected will be deleted once the project is over. It shouldn't take more than 5 minutes of your time and I appreciate any help that you guys could give me for this. https://form.jotform.com/akat2406/academic-research Apologies if this counts as self-advertisement; I'm still very new to this part of Reddit. If you want to know more about the project, feel free to message me and I can explain it in more detail. Thanks again and hope you have a good day. submitted by /u/Hunter2406 [link] [comments]  ( 86 min )
    The 'artificial synapse' could allow neural networks to function more like brains. - Science Inter
    submitted by /u/Historical-Object374 [link] [comments]  ( 86 min )
    Where is the equality? Limiting AI biased on ideology is madness
    submitted by /u/Humblebats [link] [comments]  ( 94 min )
    Experimenting with Midjourney and After Effects to make 2.5D trading cards
    submitted by /u/RustedDreams [link] [comments]  ( 86 min )
    Those who work as AI, machine learning, computer vision, or robotics engineers. How did you get there? What is your education? What is your pay? and do you like your job? Thanks in advance for the answers
    submitted by /u/jobseaker999 [link] [comments]  ( 87 min )
    I asked an AI if birds are drones. (GPT-3)
    submitted by /u/kbf_ [link] [comments]  ( 85 min )
    SSO (Single Sign-On) for CVAT, the annotation tool
    For those who are interested in using CVAT with SSO, previously I made a proof-of-concept video to demonstrate my SSO implementation for CVAT: https://www.youtube.com/watch?v=R7hBBLG5Fdc Now I'm happy to announce that I have submitted my code changes: https://github.com/AlexGaoDW/cvat/tree/feature/datawiza-sso And I've created a PR to get it into the official repo. You can try it out by yourself following the document here: https://docs.datawiza.com/guides/cvat.html I also set up an instance using Google as the identity provider such that you can try SSO functionality with your Google account: https://cvat-sso.datawiza.net/ Enjoy! submitted by /u/Membership-Full [link] [comments]  ( 86 min )
    Are there any free online text to image AI’s that are a little better than dalle mini? One that can do celebrities
    submitted by /u/Acrobatic-Animal2432 [link] [comments]  ( 86 min )
    Apple AI Researchers Develop GMPIs (Generative Multiplane Images) For Making A 2D GAN 3D-Aware
    submitted by /u/ai-lover [link] [comments]  ( 94 min )
  • Open

    Value function notation
    When I'm writing about an action-value function Q, which receives an observation o as input, do I write Q(o, a) where a is an action, or write Q(s, a) where s is the full state of the environment? I think I'm confused here because the Q function is estimating the value of the state, but only receiving a partial observation of the state as input. submitted by /u/StandingBuffalo [link] [comments]  ( 87 min )
    Here is the first video in a series explaining Deep Q Learning for self driving cars!
    submitted by /u/Si1veRonReddit [link] [comments]  ( 86 min )
    PPO rollout buffer for turn-based two-player game with varying turn lengths
    Hey there, I am trying to train an MLP policy with PPO on a board game. A turn may take anywhere from one to about fifteen actions, then it's the other player's turn. My current implementation uses MaskablePPO from stable_baselines3. My custom VecEnv currently uses a copy of the model to step the games where it's the "opponent's" turn when required, thereby acting like the opponent is part of the environment. env.step(actions) will execute the training agent's actions, step all games where it's the "opponent's" turn and finally return observations for the next state where it's the training agent's turn again. This works in general, but comes with a multitude of problems: Experience is only collected from the "agent" side of the game. Each game has to wait for up to fifteen rounds of mo…  ( 103 min )
    One episode takes about 40 seconds and it's only 288 steps!
    First of all, this env was made by me, and the major problem is its observations. My obs looks like this: self.signal_features[(self.current_tick-self.window_size+1):self.current_tick+1]. I am using Tensorforce, and this is my agent spec: agent = dict( agent='dueling_dqn', memory=50000, batch_size=128, network='auto', update_frequency=0.25, start_updating=None, learning_rate=0.001, huber_loss=None, horizon=1, discount=0.99, reward_processing=None, return_processing=None, predict_terminal_values=False, target_update_weight=1.0, target_sync_frequency=1, state_preprocessing='linear_normalization', exploration=dict(type='linear', unit='episodes', num_steps=250000.0, initial_value=1.0, final_value=0.0), variable_noise=0.0, l2_regularization=0.0, entropy_regularization=0.0, parallel_interactions=1, config=dict(device='CPU'), saver=dict(directory='model', frequency=1, max_checkpoints=10), summarizer=dict(directory='summaries', summaries='all'), recorder=None ) Can this be sped up, or is this the limit? Any help is appreciated. submitted by /u/Zalkwalker [link] [comments]  ( 95 min )
    Early stopping in PPO/TRPO
    Hello! We have collected a sample from an environment. Then let's say we update PPO/TRPO for K epochs. My question: does it make sense to apply early stopping if the policy has changed too much with respect to the initial policy which gathered the sample? Meaning that we stop updating the policy at epoch K-P, where P \in {0,1,...,K}, and then collect a new sample, etc. On the other hand, in the case of PPO/TRPO it is already taken care of that the policy does not change too much. Thus early stopping may cause the agent to get stuck at local optima or make the learning painfully slow? submitted by /u/SigmaEpsilonDelta [link] [comments]  ( 87 min )
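The early-stopping rule described in the question can be sketched on a toy problem. The following is a minimal illustration under stated assumptions: a 3-action softmax policy, a fixed vector `grad_step` standing in for a real PPO update, and the exact KL divergence in place of the sample estimate a real implementation would compute from log-probabilities. All names and the threshold `target_kl` are hypothetical.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions with full support."""
    return float(np.sum(p * np.log(p / q)))

def update_with_early_stopping(logits, grad_step, max_epochs, target_kl):
    """Apply up to max_epochs steps; stop before the KL budget is exceeded."""
    old_probs = softmax(logits)  # policy that collected the sample
    new_logits = logits.copy()
    for epoch in range(max_epochs):
        candidate = new_logits + grad_step
        if kl_divergence(old_probs, softmax(candidate)) > target_kl:
            return new_logits, epoch  # discard the step that would overshoot
        new_logits = candidate
    return new_logits, max_epochs

logits = np.zeros(3)
grad_step = np.array([0.1, 0.0, -0.1])  # stand-in for one optimisation epoch
final_logits, epochs_done = update_with_early_stopping(
    logits, grad_step, max_epochs=10, target_kl=0.02
)
final_kl = kl_divergence(softmax(logits), softmax(final_logits))
```

In this sketch the check sits on top of whatever clipping or trust region the update itself applies, so it acts as an extra guard on how far the policy drifts from the one that gathered the data.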
    Autonomous Driving via Reinforcement Learning
    submitted by /u/shani_786 [link] [comments]  ( 87 min )
    Need help: My DQN implementation with Jax (Haiku) gets slower the longer learning goes on
    Hello guys, I tried to use JAX for the first time and I thought coding the DQN would be a good first test. I'm using the Haiku library and the general code structure from CleanRL. My code: https://gist.github.com/nico-bohlinger/4c5b21464df0f3aaf555906b0959a4c5 Unfortunately, the number of steps per second keeps steadily decreasing over time. Does anybody have an idea why this is happening? If I use my variant of the CleanRL PyTorch version, everything is fine. So I would guess something is wrong with the way I use Haiku / JAX. submitted by /u/NiconiusX [link] [comments]  ( 87 min )
  • Open

    Enhancing Backpropagation via Local Loss Optimization
    Posted by Ehsan Amid, Research Scientist, and Rohan Anil, Principal Engineer, Google Research, Brain Team While model design and training data are key ingredients in a deep neural network’s (DNN’s) success, less-often discussed is the specific optimization method used for updating the model parameters (weights). Training DNNs involves minimizing a loss function that measures the discrepancy between the ground truth labels and the model’s predictions. Training is carried out by backpropagation, which adjusts the model weights via gradient descent steps. Gradient descent, in turn, updates the weights by using the gradient (i.e., derivative) of the loss with respect to the weights. The simplest weight update corresponds to stochastic gradient descent, which, in every step, moves the weights…  ( 24 min )
  • Open

    First Portable Blackrock Brain Computer Interface | Rapid Robotics Fastest Robot Arm Setup | New AI Using Light Performs 1,000x Faster
    submitted by /u/tohelpyou88 [link] [comments]  ( 86 min )
    University Research Project
    Hi everyone, I'm a master's student at the University of Bath and I am conducting some research into the field of AI and software development. I've created a survey to get a better understanding of developer communities, how they work, and a few other questions about content. It is fully anonymous and the information collected will be deleted once the project is over. It shouldn't take more than 5 minutes of your time and I appreciate any help that you guys could give me for this. https://form.jotform.com/akat2406/academic-research If this has been flaired wrong or doesn't meet the subreddit rules, please let me know and I will edit/take the post down. If you want to know more about the project, feel free to message me and I can explain it in more detail. Thanks again and hope you have a good day. submitted by /u/Hunter2406 [link] [comments]  ( 87 min )
  • Open

    What Is a QPU?
    Just as GPUs and DPUs enable accelerated computing today, they’re also helping a new kind of chip, the QPU, boot up the promise of quantum computing. In your hand, a quantum processing unit might look and feel very similar to a graphics or a data processing unit. They’re all typically chips, or modules with multiple […] The post What Is a QPU? appeared first on NVIDIA Blog.  ( 8 min )
  • Open

    How Is Artificial Intelligence Changing The Dynamics Of Supply Chain Management?
    Artificial intelligence (AI) has been gaining popularity in the supply chain industry as it promises to help companies improve their…  ( 10 min )
  • Open

    Exploiting Negative Preference in Content-based Music Recommendation with Contrastive Learning. (arXiv:2207.13909v1 [cs.IR])
    Advanced music recommendation systems are being introduced along with the development of machine learning. However, it is essential to design a music recommendation system that can increase user satisfaction by understanding users' music tastes, not by the complexity of models. Although several studies related to music recommendation systems exploiting negative preferences have shown performance improvements, there was a lack of explanation on how they led to better recommendations. In this work, we analyze the role of negative preference in users' music tastes by comparing music recommendation models with contrastive learning exploiting preference (CLEP) but with three different training strategies - exploiting preferences of both positive and negative (CLEP-PN), positive only (CLEP-P), and negative only (CLEP-N). We evaluate the effectiveness of the negative preference by validating each system with a small amount of personalized data obtained via survey and further illuminate the possibility of exploiting negative preference in music recommendations. Our experimental results show that CLEP-N outperforms the other two in accuracy and false positive rate. Furthermore, the proposed training strategies produced a consistent tendency regardless of different types of front-end musical feature extractors, proving the stability of the proposed method.  ( 2 min )
    Learning Deep Morphological Networks with Neural Architecture Search. (arXiv:2106.07714v2 [cs.CV] UPDATED)
    Deep Neural Networks (DNNs) are generated by sequentially performing linear and non-linear processes. Using a combination of linear and non-linear procedures is critical for generating a sufficiently deep feature space. The majority of non-linear operators are derivations of activation functions or pooling functions. Mathematical morphology is a branch of mathematics that provides non-linear operators for a variety of image processing problems. We investigate the utility of integrating these operations in an end-to-end deep learning framework in this paper. DNNs are designed to acquire a realistic representation for a particular job. Morphological operators give topological descriptors that convey salient information about the shapes of objects depicted in images. We propose a method based on meta-learning to incorporate morphological operators into DNNs. The learned architecture demonstrates how our novel morphological operations significantly increase DNN performance on various tasks, including picture classification and edge detection.  ( 2 min )
    Deep Learning for Classification of Thyroid Nodules on Ultrasound: Validation on an Independent Dataset. (arXiv:2207.13765v1 [eess.IV])
    Objectives: The purpose is to apply a previously validated deep learning algorithm to a new thyroid nodule ultrasound image dataset and compare its performance with that of radiologists. Methods: A prior study presented an algorithm which is able to detect thyroid nodules and then make malignancy classifications with two ultrasound images. A multi-task deep convolutional neural network was trained from 1278 nodules and originally tested with 99 separate nodules. The results were comparable with those of radiologists. The algorithm was further tested with 378 nodules imaged with ultrasound machines from different manufacturers and product types than the training cases. Four experienced radiologists were requested to evaluate the nodules for comparison with deep learning. Results: The Area Under Curve (AUC) of the deep learning algorithm and four radiologists were calculated with parametric, binormal estimation. For the deep learning algorithm, the AUC was 0.70 (95% CI: 0.64 - 0.75). The AUCs of the radiologists were 0.66 (95% CI: 0.61 - 0.71), 0.67 (95% CI: 0.62 - 0.73), 0.68 (95% CI: 0.63 - 0.73), and 0.66 (95% CI: 0.61 - 0.71). Conclusion: In the new testing dataset, the deep learning algorithm achieved performance similar to that of all four radiologists.  ( 3 min )
    On the fast convergence of minibatch heavy ball momentum. (arXiv:2206.07553v2 [cs.LG] UPDATED)
    Simple stochastic momentum methods are widely used in machine learning optimization, but their good practical performance is at odds with an absence of theoretical guarantees of acceleration in the literature. In this work, we aim to close the gap between theory and practice by showing that stochastic heavy ball momentum, which can be interpreted as a randomized Kaczmarz algorithm with momentum, retains the fast linear rate of (deterministic) heavy ball momentum on quadratic optimization problems, at least when minibatching with a sufficiently large batch size is used. The analysis relies on carefully decomposing the momentum transition matrix, and using new spectral norm concentration bounds for products of independent random matrices. We provide numerical experiments to demonstrate that our bounds are reasonably sharp.  ( 2 min )
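    As a rough illustration of the kind of iteration this abstract analyses, the sketch below runs minibatch heavy ball momentum on a small, consistent least-squares problem. The problem size, step size, momentum value and batch size are arbitrary choices made for this example, not values taken from the paper.

```python
import random

def minibatch_heavy_ball(A, b, batch_size, step, momentum, iters, seed=0):
    """Minimise (1/2n)||Ax - b||^2 using minibatch gradients plus heavy ball momentum."""
    rng = random.Random(seed)
    n, d = len(A), len(A[0])
    x = [0.0] * d
    x_prev = list(x)
    for _ in range(iters):
        batch = rng.sample(range(n), batch_size)  # rows used for this step
        grad = [0.0] * d
        for i in batch:
            residual = sum(A[i][j] * x[j] for j in range(d)) - b[i]
            for j in range(d):
                grad[j] += residual * A[i][j] / batch_size
        # Heavy ball update: gradient step plus a multiple of the previous displacement.
        x_next = [x[j] - step * grad[j] + momentum * (x[j] - x_prev[j]) for j in range(d)]
        x_prev, x = x, x_next
    return x

# Toy problem (hypothetical sizes): 200 Gaussian rows, 5 unknowns, exact solution x_true.
data_rng = random.Random(1)
A = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
x_true = [data_rng.gauss(0.0, 1.0) for _ in range(5)]
b = [sum(a_ij * x_j for a_ij, x_j in zip(row, x_true)) for row in A]

x_hat = minibatch_heavy_ball(A, b, batch_size=20, step=0.3, momentum=0.3, iters=300)
error = max(abs(u - v) for u, v in zip(x_hat, x_true))
print(f"max abs error after 300 iterations: {error:.2e}")
```

    Because the system is consistent, the minibatch gradient noise vanishes at the solution, which is the setting in which a linear rate can be expected.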
    Learning with Succinct Common Representation Based on Wyner's Common Information. (arXiv:1905.10945v2 [cs.LG] UPDATED)
    A new bimodal generative model is proposed for generating conditional and joint samples, together with a training method that learns a succinct bottleneck representation. The proposed model, dubbed the variational Wyner model, is designed based on two classical problems in network information theory -- distributed simulation and channel synthesis -- in which Wyner's common information arises as the fundamental limit on the succinctness of the common representation. The model is trained by minimizing the symmetric Kullback--Leibler divergence between variational and model distributions with regularization terms for common information, reconstruction consistency, and latent space matching, which is carried out via an adversarial density ratio estimation technique. The utility of the proposed approach is demonstrated through experiments for joint and conditional generation with synthetic and real-world datasets, as well as a challenging zero-shot image retrieval task.  ( 2 min )
    Graph Neural Networks to Predict Sports Outcomes. (arXiv:2207.14124v1 [cs.LG])
    Predicting outcomes in sports is important for teams, leagues, bettors, media, and fans. Given the growing amount of player tracking data, sports analytics models are increasingly utilizing spatially-derived features built upon player tracking data. However, player-specific information, such as location, cannot readily be included as features themselves, since common modeling techniques rely on vector input. Accordingly, spatially-derived features are commonly constructed in relation to anchor objects, such as the distance to a ball or goal, through global feature aggregations, or via role-assignment schemes, where players are designated a distinct role in the game. In doing so, we sacrifice inter-player and local relationships in favor of global ones. To address this issue, we introduce a sport-agnostic graph-based representation of game states. We then use our proposed graph representation as input to graph neural networks to predict sports outcomes. Our approach preserves permutation invariance and allows for flexible player interaction weights. We demonstrate how our method provides statistically significant improvements over the state of the art for prediction tasks in both American football and esports, reducing test set loss by 9% and 20%, respectively. Additionally, we show how our model can be used to answer "what if" questions in sports and to visualize relationships between players.  ( 2 min )
    Distributional Actor-Critic Ensemble for Uncertainty-Aware Continuous Control. (arXiv:2207.13730v1 [cs.LG])
    Uncertainty quantification is one of the central challenges for machine learning in real-world applications. In reinforcement learning, an agent confronts two kinds of uncertainty, called epistemic uncertainty and aleatoric uncertainty. Disentangling and evaluating these uncertainties simultaneously stands a chance of improving the agent's final performance, accelerating training, and facilitating quality assurance after deployment. In this work, we propose an uncertainty-aware reinforcement learning algorithm for continuous control tasks that extends the Deep Deterministic Policy Gradient algorithm (DDPG). It exploits epistemic uncertainty to accelerate exploration and aleatoric uncertainty to learn a risk-sensitive policy. We conduct numerical experiments showing that our variant of DDPG outperforms vanilla DDPG without uncertainty estimation in benchmark tasks on robotic control and power-grid optimization.  ( 2 min )
    Improving the Performance of Robust Control through Event-Triggered Learning. (arXiv:2207.14252v1 [eess.SY])
    Robust controllers ensure stability in feedback loops designed under uncertainty but at the cost of performance. Model uncertainty in time-invariant systems can be reduced by recently proposed learning-based methods, thus improving the performance of robust controllers using data. However, in practice, many systems also exhibit uncertainty in the form of changes over time, e.g., due to weight shifts or wear and tear, leading to decreased performance or instability of the learning-based controller. We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem with rare or slow changes. Our key idea is to switch between robust and learned controllers. For learning, we first approximate the optimal length of the learning phase via Monte-Carlo estimations using a probabilistic model. We then design a statistical test for uncertain systems based on the moment-generating function of the LQR cost. The test detects changes in the system under control and triggers re-learning when control performance deteriorates due to system changes. We demonstrate improved performance over a robust controller baseline in a numerical example.  ( 2 min )
    GAUDI: A Neural Architect for Immersive 3D Scene Generation. (arXiv:2207.13751v1 [cs.CV])
    We introduce GAUDI, a generative model capable of capturing the distribution of complex and realistic 3D scenes that can be rendered immersively from a moving camera. We tackle this challenging problem with a scalable yet powerful approach, where we first optimize a latent representation that disentangles radiance fields and camera poses. This latent representation is then used to learn a generative model that enables both unconditional and conditional generation of 3D scenes. Our model generalizes previous works that focus on single objects by removing the assumption that the camera pose distribution can be shared across samples. We show that GAUDI obtains state-of-the-art performance in the unconditional generative setting across multiple datasets and allows for conditional generation of 3D scenes given conditioning variables like sparse image observations or text that describes the scene.  ( 2 min )
    A Transformer-based Generative Adversarial Network for Brain Tumor Segmentation. (arXiv:2207.14134v1 [eess.IV])
    Brain tumor segmentation remains a challenge in medical image segmentation. With the application of transformers to various computer vision tasks, transformer blocks have shown the capability to learn long-distance dependencies in global space, which is complementary to CNNs. In this paper, we propose a novel transformer-based generative adversarial network to automatically segment brain tumors in multi-modal MRI. Our architecture consists of a generator and a discriminator, which are trained in a min-max game. The generator is based on a typical "U-shaped" encoder-decoder architecture, whose bottom layer is composed of transformer blocks with resnet. In addition, the generator is trained with deep supervision. The discriminator we designed is a CNN-based network with multi-scale $L_{1}$ loss, which has proved effective for medical semantic image segmentation. To validate the effectiveness of our method, we conducted experiments on the BRATS2015 dataset, achieving comparable or better performance than previous state-of-the-art methods.  ( 2 min )
    p-Adic Statistical Field Theory and Deep Belief Networks. (arXiv:2207.13877v1 [math-ph])
    In this work we initiate the study of the correspondence between $p$-adic statistical field theories (SFTs) and neural networks (NNs). In general quantum field theories over a $p$-adic spacetime can be formulated in a rigorous way. Nowadays these theories are considered just mathematical toy models for understanding the problems of the true theories. In this work we show these theories are deeply connected with the deep belief networks (DBNs). Hinton et al. constructed DBNs by stacking several restricted Boltzmann machines (RBMs). The purpose of this construction is to obtain a network with a hierarchical structure (a deep learning architecture). An RBM corresponds to a certain spin glass, thus a DBN should correspond to an ultrametric (hierarchical) spin glass. A model of such a system can be easily constructed by using $p$-adic numbers. In our approach, a $p$-adic SFT corresponds to a $p$-adic continuous DBN, and a discretization of this theory corresponds to a $p$-adic discrete DBN. We show that these last machines are universal approximators. In the $p$-adic framework, the correspondence between SFTs and NNs is not fully developed. We point out several open problems.  ( 2 min )
    Cryptographic Hardness of Learning Halfspaces with Massart Noise. (arXiv:2207.14266v1 [cs.LG])
    We study the complexity of PAC learning halfspaces in the presence of Massart noise. In this problem, we are given i.i.d. labeled examples $(\mathbf{x}, y) \in \mathbb{R}^N \times \{ \pm 1\}$, where the distribution of $\mathbf{x}$ is arbitrary and the label $y$ is a Massart corruption of $f(\mathbf{x})$, for an unknown halfspace $f: \mathbb{R}^N \to \{ \pm 1\}$, with flipping probability $\eta(\mathbf{x}) \leq \eta < 1/2$. The goal of the learner is to compute a hypothesis with small 0-1 error. Our main result is the first computational hardness result for this learning problem. Specifically, assuming the (widely believed) subexponential-time hardness of the Learning with Errors (LWE) problem, we show that no polynomial-time Massart halfspace learner can achieve error better than $\Omega(\eta)$, even if the optimal 0-1 error is small, namely $\mathrm{OPT} = 2^{-\log^{c} (N)}$ for any universal constant $c \in (0, 1)$. Prior work had provided qualitatively similar evidence of hardness in the Statistical Query model. Our computational hardness result essentially resolves the polynomial PAC learnability of Massart halfspaces, by showing that known efficient learning algorithms for the problem are nearly best possible.  ( 2 min )
    PEA: Improving the Performance of ReLU Networks for Free by Using Progressive Ensemble Activations. (arXiv:2207.14074v1 [cs.CV])
    In recent years novel activation functions have been proposed to improve the performance of neural networks, and they show superior performance compared to the ReLU counterpart. However, there are environments where the availability of complex activations is limited, and usually only the ReLU is supported. In this paper we propose methods that can be used to improve the performance of ReLU networks by using these efficient novel activations during model training. More specifically, we propose ensemble activations that are composed of the ReLU and one of these novel activations. Furthermore, the coefficients of the ensemble are neither fixed nor learned, but are progressively updated during the training process in such a way that by the end of training only the ReLU activations remain active in the network and the other activations can be removed. This means that at inference time the network contains ReLU activations only. We perform extensive evaluations on the ImageNet classification task using various compact network architectures and various novel activation functions. Results show 0.2-0.8% top-1 accuracy gain, which confirms the applicability of the proposed methods. Furthermore, we demonstrate the proposed methods on semantic segmentation and we boost the performance of a compact segmentation network by 0.34% mIOU on the Cityscapes dataset.  ( 3 min )
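    A minimal sketch of the idea of a progressively updated ensemble activation follows. It assumes Swish as the second activation and a linear schedule for the mixing coefficient; the abstract does not specify either, so both are illustrative choices, and `ensemble_activation` is a hypothetical name.

```python
import math

def relu(x):
    return max(0.0, x)

def swish(x):
    # Swish / SiLU: x * sigmoid(x). Chosen here only as an example of a "novel" activation.
    return x / (1.0 + math.exp(-x))

def ensemble_activation(x, progress):
    """Blend Swish and ReLU; `progress` runs from 0.0 (start) to 1.0 (end of training)."""
    relu_weight = min(1.0, max(0.0, progress))  # assumed linear schedule, clipped to [0, 1]
    return relu_weight * relu(x) + (1.0 - relu_weight) * swish(x)

for progress in (0.0, 0.5, 1.0):
    values = [round(ensemble_activation(x, progress), 4) for x in (-2.0, 0.5, 3.0)]
    print(progress, values)
```

    At `progress = 1.0` the blend reduces to plain ReLU, so the second activation can be dropped from the deployed network.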
    MarkerMap: nonlinear marker selection for single-cell studies. (arXiv:2207.14106v1 [stat.ML])
    Single-cell RNA-seq data allow the quantification of cell type differences across a growing set of biological contexts. However, pinpointing a small subset of genomic features explaining this variability can be ill-defined and computationally intractable. Here we introduce MarkerMap, a generative model for selecting minimal gene sets which are maximally informative of cell type origin and enable whole transcriptome reconstruction. MarkerMap provides a scalable framework for both supervised marker selection, aimed at identifying specific cell type populations, and unsupervised marker selection, aimed at gene expression imputation and reconstruction. We benchmark MarkerMap's competitive performance against previously published approaches on real single cell gene expression data sets. MarkerMap is available as a pip installable package, as a community resource aimed at developing explainable machine learning techniques for enhancing interpretability in single-cell studies.  ( 2 min )
    Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation. (arXiv:2205.03043v2 [cs.SD] UPDATED)
    A synthesizer is a type of electronic musical instrument that is now widely used in modern music production and sound design. Each parameter configuration of a synthesizer produces a unique timbre and can be viewed as a unique instrument. Estimating the parameter configuration that best restores a given sound timbre, i.e., the synthesizer parameter estimation problem, is important yet complicated. We propose a multi-modal deep-learning-based pipeline, Sound2Synth, together with a network structure, Prime-Dilated Convolution (PDC), specially designed to solve this problem. Our method achieves not only SOTA but also the first real-world applicable results on the Dexed synthesizer, a popular FM synthesizer.  ( 2 min )
    Dive into Machine Learning Algorithms for Influenza Virus Host Prediction with Hemagglutinin Sequences. (arXiv:2207.13842v1 [cs.LG])
    Influenza viruses mutate rapidly and can pose a threat to public health, especially to those in vulnerable groups. Throughout history, influenza A viruses have caused pandemics between different species. It is important to identify the origin of a virus in order to prevent the spread of an outbreak. Recently, there has been increasing interest in using machine learning algorithms to provide fast and accurate predictions for viral sequences. In this study, real testing data sets and a variety of evaluation metrics were used to evaluate machine learning algorithms at different taxonomic levels. As hemagglutinin is the major protein in the immune response, only hemagglutinin sequences were used and represented by position-specific scoring matrix and word embedding. The results suggest that the 5-grams-transformer neural network is the most effective algorithm for predicting viral sequence origins, with approximately 99.54% AUCPR, 98.01% F1 score and 96.60% MCC at a higher classification level, and approximately 94.74% AUCPR, 87.41% F1 score and 80.79% MCC at a lower classification level.  ( 2 min )
    A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting. (arXiv:2207.14219v1 [stat.ML])
    The exponential growth of machine learning (ML) has prompted a great deal of interest in quantifying the uncertainty of each prediction for a user-defined level of confidence. Reliable uncertainty quantification is crucial and is a step towards increased trust in AI results. It becomes especially important in high-stakes decision-making, where the true output must be within the confidence set with high probability. Conformal prediction (CP) is a distribution-free uncertainty quantification framework that works for any black-box model and yields prediction intervals (PIs) that are valid under the mild assumption of exchangeability. CP-type methods are gaining popularity due to being easy to implement and computationally cheap; however, the exchangeability assumption immediately excludes time series forecasting. Although recent papers tackle covariate shift, this is not enough for the general time series forecasting problem of producing H-step ahead valid PIs. To attain such a goal, we propose a new method called AEnbMIMOCQR (Adaptive ensemble batch multi-input multi-output conformalized quantile regression), which produces asymptotically valid PIs and is appropriate for heteroscedastic time series. We compare the proposed method against state-of-the-art competitive methods on the NN5 forecasting competition dataset. All the code and data to reproduce the experiments are made available.  ( 2 min )
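    For readers unfamiliar with conformal prediction, the sketch below shows the standard split conformal construction that the abstract takes as its starting point. It is not AEnbMIMOCQR: it assumes exchangeable data, absolute residuals as the score, and a single-step point forecast. The function name and the toy calibration values are made up for illustration.

```python
import math

def split_conformal_interval(cal_true, cal_pred, new_pred, alpha):
    """Symmetric prediction interval from absolute residuals on a held-out calibration set."""
    scores = sorted(abs(y - y_hat) for y, y_hat in zip(cal_true, cal_pred))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))  # finite-sample corrected rank
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return (-math.inf, math.inf)
    q = scores[rank - 1]
    return (new_pred - q, new_pred + q)

# Hypothetical calibration data: the model always predicted 0.0.
cal_pred = [0.0] * 10
cal_true = [0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7, -0.8, 0.9, -1.0]

low, high = split_conformal_interval(cal_true, cal_pred, new_pred=5.0, alpha=0.2)
print(low, high)
```

    The coverage guarantee of this construction relies on exchangeability, which is the assumption that time series generally violate and that the proposed method is designed to work around.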
    Fast Newton method solving KLR based on Multilevel Circulant Matrix with log-linear complexity. (arXiv:2108.08605v3 [cs.LG] UPDATED)
    Kernel logistic regression (KLR) is a conventional nonlinear classifier in machine learning. With the explosive growth of data size, the storage and computation of large dense kernel matrices is a major challenge in scaling KLR. Even when the Nyström approximation is applied to solve KLR, it still faces a time complexity of $O(nc^2)$ and a space complexity of $O(nc)$, where $n$ is the number of training instances and $c$ is the sampling size. In this paper, we propose a fast Newton method that efficiently solves large-scale KLR problems by exploiting the storage and computing advantages of the multilevel circulant matrix (MCM). Specifically, by approximating the kernel matrix with an MCM, the storage space is reduced to $O(n)$, and by further approximating the coefficient matrix of the Newton equation as an MCM, the computational complexity of a Newton iteration is reduced to $O(n \log n)$. The proposed method runs in log-linear time per iteration, because the multiplication of an MCM (or its inverse) with a vector can be implemented with the multidimensional fast Fourier transform (mFFT). Experimental results on some large-scale binary-classification and multi-classification problems show that the proposed method enables KLR to scale to large problems with less memory consumption and less training time without sacrificing test accuracy.  ( 3 min )
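    The log-linear cost quoted above rests on a standard fact: a circulant matrix is diagonalised by the discrete Fourier transform, so a matrix-vector product becomes a pointwise product of spectra. The sketch below shows this for an ordinary (single-level) circulant matrix with a power-of-two size, using a small hand-written radix-2 FFT; the multilevel case in the abstract would use a multidimensional FFT instead. All function names are illustrative.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def circulant_matvec(first_col, x):
    """Compute C @ x in O(n log n), where C[i][j] = first_col[(i - j) % n]."""
    n = len(x)
    spectrum = [a * b for a, b in zip(fft(first_col), fft(x))]
    return [v.real / n for v in fft(spectrum, inverse=True)]

def circulant_matvec_direct(first_col, x):
    """Reference O(n^2) product, used only to check the fast version."""
    n = len(x)
    return [sum(first_col[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

c = [4.0, 1.0, 0.5, 0.25, 0.0, 0.25, 0.5, 1.0]
x = [1.0, -2.0, 3.0, 0.0, 5.0, -1.0, 2.0, 4.0]
fast = circulant_matvec(c, x)
slow = circulant_matvec_direct(c, x)
print(max(abs(f - s) for f, s in zip(fast, slow)))
```

    Only the first column `c` is stored, which is where the $O(n)$ memory figure comes from.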
    ALLNet: A Hybrid Convolutional Neural Network to Improve Diagnosis of Acute Lymphocytic Leukemia (ALL) in White Blood Cells. (arXiv:2108.08195v2 [cs.CV] UPDATED)
    Due to morphological similarity at the microscopic level, making an accurate and time-sensitive distinction between blood cells affected by Acute Lymphocytic Leukemia (ALL) and their healthy counterparts calls for the use of machine learning architectures. However, three of the most common models, VGG, ResNet, and Inception, each come with their own set of flaws, which motivates the search for a superior model. ALLNet, the proposed hybrid convolutional neural network architecture, consists of a combination of the VGG, ResNet, and Inception models. The ALL Challenge dataset of ISBI 2019 contains 10,691 images of white blood cells which were used to train and test the models. 7,272 of the images in the dataset are of cells with ALL and 3,419 of them are of healthy cells. Of the images, 60% were used to train the model, 20% were used for the cross-validation set, and 20% were used for the test set. ALLNet outperformed the VGG, ResNet, and the Inception models across the board, achieving an accuracy of 92.6567%, a sensitivity of 95.5304%, a specificity of 85.9155%, an AUC score of 0.966347, and an F1 score of 0.94803 in the cross-validation set. In the test set, ALLNet achieved an accuracy of 92.0991%, a sensitivity of 96.5446%, a specificity of 82.8035%, an AUC score of 0.959972, and an F1 score of 0.942963. The use of ALLNet in the clinical workspace could support better treatment of the thousands of people suffering from ALL across the world, many of whom are children.  ( 3 min )
    ClaSP -- Parameter-free Time Series Segmentation. (arXiv:2207.13987v1 [cs.LG])
    The study of natural and human-made processes often results in long sequences of temporally-ordered values, aka time series (TS). Such processes often consist of multiple states, e.g. operating modes of a machine, such that state changes in the observed processes result in changes in the distribution or shape of the measured values. Time series segmentation (TSS) tries to find such changes in TS post-hoc to deduce changes in the data-generating process. TSS is typically approached as an unsupervised learning problem aiming at the identification of segments distinguishable by some statistical property. Current algorithms for TSS require domain-dependent hyper-parameters to be set by the user, or make assumptions about the TS value distribution or the types of detectable changes, which limits their applicability. Common hyper-parameters are the measure of segment homogeneity and the number of change points, which are particularly hard to tune for each data set. We present ClaSP, a novel, highly accurate, hyper-parameter-free and domain-agnostic method for TSS. ClaSP hierarchically splits a TS into two parts. A change point is determined by training a binary TS classifier for each possible split point and selecting the one split that is best at identifying subsequences to be from either of the partitions. ClaSP learns its two main model parameters from the data using two novel bespoke algorithms. In our experimental evaluation using a benchmark of 115 data sets, we show that ClaSP outperforms the state of the art in terms of accuracy and is fast and scalable. Furthermore, we highlight properties of ClaSP using several real-world case studies.  ( 3 min )
    Federated Learning Framework Coping with Hierarchical Heterogeneity in Cooperative ITS. (arXiv:2204.00215v3 [cs.LG] UPDATED)
    Deep learning is a key approach for the environment perception function of Cooperative Intelligent Transportation Systems (C-ITS) with autonomous vehicles and smart traffic infrastructure. In today's C-ITS, smart traffic participants are capable of generating and transmitting large amounts of data in a timely manner. However, these data cannot be used for model training directly due to privacy constraints. In this paper, we introduce a federated learning framework coping with Hierarchical Heterogeneity (H2-Fed), which can notably enhance the conventional pre-trained deep learning model. The framework exploits data from connected public traffic agents in vehicular networks without affecting user data privacy. By coordinating existing traffic infrastructure, including roadside units and road traffic clouds, the model parameters are efficiently disseminated by vehicular communications and hierarchically aggregated. Considering the individual heterogeneity of data distribution and of computational and communication capabilities across traffic agents and roadside units, we employ a novel method that addresses the heterogeneity of different aggregation layers of the framework architecture, i.e., aggregation in the layers of roadside units and cloud. The experimental results indicate that our method can balance learning accuracy and stability well according to the knowledge of heterogeneity in current communication networks. Compared with other baseline approaches, the evaluation on federated datasets shows that our framework is more general and capable, especially in application scenarios with low communication quality. Even when 90% of the agents are timely disconnected, the pre-trained deep learning model can still be forced to converge stably, and its accuracy can be enhanced from 68% to over 90% after convergence.
    Electricity Price Forecasting Model based on Gated Recurrent Units. (arXiv:2207.14225v1 [cs.LG])
    The participation of consumers and producers in demand response programs has increased in smart grids, which reduces investment and operation costs of power systems. Also, with the advent of renewable energy sources, the electricity market is becoming more complex and unpredictable. To effectively implement demand response programs, forecasting the future price of electricity is crucial for producers in the electricity market. Electricity prices are very volatile and change under the influence of various factors such as temperature, wind speed, rainfall, intensity of commercial and daily activities, etc. Therefore, considering the influencing factors as dependent variables can increase the accuracy of the forecast. In this paper, a model for electricity price forecasting is presented based on Gated Recurrent Units (GRUs). The electrical load consumption is considered as an input variable in this model. Noise in the electricity price seriously reduces the efficiency and effectiveness of analysis. Therefore, an adaptive noise reducer is integrated into the model for noise reduction. The SAEs are then used to extract features from the de-noised electricity price. Finally, the de-noised features are fed into the GRU to train the predictor. Results on a real dataset show that the proposed methodology can perform effectively in predicting the electricity price.
    OFedQIT: Communication-Efficient Online Federated Learning via Quantization and Intermittent Transmission. (arXiv:2205.06491v2 [cs.LG] UPDATED)
    Online federated learning (OFL) is a promising framework to collaboratively learn a sequence of non-linear functions (or models) from distributed streaming data incoming to multiple clients while keeping the privacy of their local data. In this framework, we first construct a vanilla method (named OFedAvg) by incorporating online gradient descent (OGD) into the de facto aggregation method (named FedAvg). Despite its optimal asymptotic performance, OFedAvg suffers from heavy communication overhead and long learning delay. To tackle these shortcomings, we propose a communication-efficient OFL algorithm (named OFedQIT) by means of stochastic quantization and intermittent transmission. Our major contribution is to theoretically prove that OFedQIT over $T$ time slots can achieve an optimal sublinear regret bound $\mathcal{O}(\sqrt{T})$ for any real data (including non-IID data) while significantly reducing the communication overhead. Furthermore, this optimality is still guaranteed even when only a small fraction of clients (having faster processing time and high-quality communication channels) in a network participate at once. Our analysis reveals that OFedQIT successfully addresses the drawbacks of OFedAvg while maintaining superior learning accuracy. Experiments with real datasets demonstrate the effectiveness of our algorithm on various online classification and regression tasks.
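    To make "stochastic quantization" concrete, here is a minimal sketch of one common unbiased scheme: each value is rounded to a neighbouring grid point at random, with probabilities chosen so that the expectation equals the original value. The abstract does not state which quantizer OFedQIT uses, so the grid, the function name and the parameters below are assumptions for illustration only.

```python
import math
import random

def stochastic_quantize(value, step, rng):
    """Round `value` to a multiple of `step`, up or down at random, so that E[output] = value."""
    lower = math.floor(value / step)
    frac = value / step - lower  # position between the two neighbouring levels, in [0, 1)
    level = lower + (1 if rng.random() < frac else 0)
    return level * step

rng = random.Random(0)
samples = [stochastic_quantize(0.3, 1.0, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
print(sorted(set(samples)), round(mean, 3))
```

    Each transmitted value then needs only enough bits to index a grid level, while the rounding error averages out across rounds and clients.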
    $\mu\text{KG}$: A Library for Multi-source Knowledge Graph Embeddings and Applications. (arXiv:2207.11442v2 [cs.CL] UPDATED)
    This paper presents $\mu\text{KG}$, an open-source Python library for representation learning over knowledge graphs. $\mu\text{KG}$ supports joint representation learning over multi-source knowledge graphs (and also a single knowledge graph), multiple deep learning libraries (PyTorch and TensorFlow2), multiple embedding tasks (link prediction, entity alignment, entity typing, and multi-source link prediction), and multiple parallel computing modes (multi-process and multi-GPU computing). It currently implements 26 popular knowledge graph embedding models and supports 16 benchmark datasets. $\mu\text{KG}$ provides advanced implementations of embedding techniques with simplified pipelines of different tasks. It also comes with high-quality documentation for ease of use. $\mu\text{KG}$ is more comprehensive than existing knowledge graph embedding libraries. It is useful for a thorough comparison and analysis of various embedding models and tasks. We show that the jointly learned embeddings can greatly help knowledge-powered downstream tasks, such as multi-hop knowledge graph question answering. We will stay abreast of the latest developments in the related fields and incorporate them into $\mu\text{KG}$.
    A Probabilistic Framework for Estimating the Risk of Pedestrian-Vehicle Conflicts at Intersections. (arXiv:2207.14145v1 [cs.LG])
    Pedestrian safety has become an important research topic due to the increased number of pedestrian-involved crashes. To evaluate pedestrian safety proactively, surrogate safety measures (SSMs) have been widely used in traffic conflict-based studies as they do not require historical crashes as inputs. However, most existing SSMs were developed based on the assumption that road users would maintain constant velocity and direction. Risk estimations based on this assumption are less stable, more likely to be exaggerated, and unable to capture the evasive maneuvers of drivers. Considering the limitations of existing SSMs, this study proposes a probabilistic framework for estimating the risk of pedestrian-vehicle conflicts at intersections. The proposed framework loosens the restriction of constant speed by predicting trajectories using Gaussian Process Regression and accounts for the different possible driver maneuvers with a Random Forest model. Real-world LiDAR data collected at an intersection were used to evaluate the performance of the proposed framework. The newly developed framework is able to identify all pedestrian-vehicle conflicts. Compared to Time-to-Collision, the proposed framework provides a more stable risk estimation and captures the evasive maneuvers of vehicles. Moreover, the proposed framework does not require expensive computation resources, which makes it an ideal choice for real-time proactive pedestrian safety solutions at intersections.
    Inclined Quadrotor Landing using Deep Reinforcement Learning. (arXiv:2103.09043v2 [cs.RO] UPDATED)
    Landing a quadrotor on an inclined surface is a challenging maneuver. The final state of any inclined landing trajectory is not an equilibrium, which precludes the use of most conventional control methods. We propose a deep reinforcement learning approach to design an autonomous landing controller for inclined surfaces. Using the proximal policy optimization (PPO) algorithm with sparse rewards and a tailored curriculum learning approach, an inclined landing policy can be trained in simulation in less than 90 minutes on a standard laptop. The policy then directly runs on a real Crazyflie 2.1 quadrotor and successfully performs real inclined landings in a flying arena. A single policy evaluation takes approximately 2.5 ms, which makes it suitable for a future embedded implementation on the quadrotor.
    General Cross-Architecture Distillation of Pretrained Language Models into Matrix Embeddings. (arXiv:2109.08449v2 [cs.CL] UPDATED)
    Large pretrained language models (PreLMs) are revolutionizing natural language processing across all benchmarks. However, their sheer size is prohibitive for small laboratories or for deployment on mobile devices. Approaches like pruning and distillation reduce the model size but typically retain the same model architecture. In contrast, we explore distilling PreLMs into a different, more efficient architecture, Continual Multiplication of Words (CMOW), which embeds each word as a matrix and uses matrix multiplication to encode sequences. We extend the CMOW architecture and its CMOW/CBOW-Hybrid variant with a bidirectional component for more expressive power, per-token representations for a general (task-agnostic) distillation during pretraining, and a two-sequence encoding scheme that facilitates downstream tasks on sentence pairs, such as sentence similarity and natural language inference. Our matrix-based bidirectional CMOW/CBOW-Hybrid model is competitive with DistilBERT on question similarity and recognizing textual entailment, but uses only half the number of parameters and is three times faster in terms of inference speed. We match or exceed the scores of ELMo for all tasks of the GLUE benchmark except for the sentiment analysis task SST-2 and the linguistic acceptability task CoLA. However, compared to previous cross-architecture distillation approaches, we demonstrate a doubling of the scores on detecting linguistic acceptability. This shows that matrix-based embeddings can be used to distill large PreLMs into competitive models and motivates further research in this direction.
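    The core CMOW idea described above, one matrix per word and a matrix product to encode a sequence, can be sketched in a few lines. The 2x2 matrices and the vocabulary below are made up for illustration; a trained model would learn much larger matrices, and the bidirectional and hybrid extensions from the abstract are not shown.

```python
def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# Hypothetical word matrices (a real model learns these during training).
word_matrices = {
    "dog": [[1.0, 1.0], [0.0, 1.0]],
    "bites": [[1.0, 0.0], [1.0, 1.0]],
    "man": [[2.0, 0.0], [0.0, 1.0]],
}

def encode(tokens):
    """Encode a token sequence as the ordered product of its word matrices."""
    result = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix for the empty sequence
    for token in tokens:
        result = matmul(result, word_matrices[token])
    return result

print(encode(["dog", "bites", "man"]))
print(encode(["man", "bites", "dog"]))
```

    Because matrix multiplication is not commutative, the two sentences get different encodings, whereas a sum of word vectors (as in CBOW) would give the same result for both orders.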
    Three-dimensional microstructure generation using generative adversarial neural networks in the context of continuum micromechanics. (arXiv:2206.01693v2 [cond-mat.mtrl-sci] UPDATED)
    Multiscale simulations are demanding in terms of computational resources. In the context of continuum micromechanics, the multiscale problem arises from the need to infer macroscopic material parameters from the microscale. If the underlying microstructure is explicitly given by means of microCT-scans, convolutional neural networks can be used to learn the microstructure-property mapping, which is usually obtained from computational homogenization. The CNN approach provides a significant speedup, especially in the context of heterogeneous or functionally graded materials. Another application is uncertainty quantification, where many expensive evaluations are required. However, one bottleneck of this approach is the large number of training microstructures needed. This work closes this gap by proposing a generative adversarial network tailored towards three-dimensional microstructure generation. The lightweight algorithm is able to learn the underlying properties of the material from a single microCT-scan without the need for explicit descriptors. During prediction time, the network can produce unique three-dimensional microstructures with the same properties as the original data in a fraction of a second and at consistently high quality.
    RHA-Net: An Encoder-Decoder Network with Residual Blocks and Hybrid Attention Mechanisms for Pavement Crack Segmentation. (arXiv:2207.14166v1 [cs.CV])
    The acquisition and evaluation of pavement surface data play an essential role in pavement condition evaluation. In this paper, an efficient and effective end-to-end network for automatic pavement crack segmentation, called RHA-Net, is proposed to improve the pavement crack segmentation accuracy. The RHA-Net is built by integrating residual blocks (ResBlocks) and hybrid attention blocks into the encoder-decoder architecture. The ResBlocks are used to improve the ability of RHA-Net to extract high-level abstract features. The hybrid attention blocks are designed to fuse both low-level features and high-level features to help the model focus on correct channels and areas of cracks, thereby improving the feature representation ability of RHA-Net. An image data set containing 789 pavement crack images collected by a self-designed mobile robot is constructed and used for training and evaluating the proposed model. Compared with other state-of-the-art networks, the proposed model achieves better performance, and the functionalities of adding residual blocks and hybrid attention mechanisms are validated in a comprehensive ablation study. Additionally, a lightweight version of the model generated by introducing depthwise separable convolution achieves better performance and a much faster processing speed with 1/30 of the number of U-Net parameters. The developed system can segment pavement cracks in real time on a Jetson TX2 embedded device (25 FPS). The video taken in real-time experiments is released at https://youtu.be/3XIogk0fiG4.
    Pareto-optimal clustering with the primal deterministic information bottleneck. (arXiv:2204.02489v2 [cs.LG] UPDATED)
    At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally, we propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset, revealing interesting features of the frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull.
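    To make the notion of a two-objective Pareto frontier concrete, the sketch below filters a list of already-evaluated candidates down to the non-dominated ones. It is not the paper's search algorithm, which has to explore the space of clusterings; the function name `pareto_frontier` and the (cost, value) convention, where lower cost and higher value are better, are assumptions made for the example.

```python
def pareto_frontier(points):
    """Return the non-dominated (cost, value) pairs.

    A point is kept if no other point has both lower-or-equal cost and
    higher-or-equal value (with at least one strict improvement).
    """
    frontier = []
    best_value = float("-inf")
    # Sort by increasing cost; among equal costs, try the highest value first.
    for cost, value in sorted(points, key=lambda p: (p[0], -p[1])):
        # A point survives only if it beats every cheaper point on value.
        if value > best_value:
            frontier.append((cost, value))
            best_value = value
    return frontier

# Hypothetical candidates: (representation size, retained information).
candidates = [(1.0, 0.2), (2.0, 0.5), (2.0, 0.4), (3.0, 0.45), (4.0, 0.9), (0.5, 0.0)]
frontier = pareto_frontier(candidates)
```

    In this toy example the point (3.0, 0.45) is dropped because (2.0, 0.5) is both cheaper and more informative.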
    FedVARP: Tackling the Variance Due to Partial Client Participation in Federated Learning. (arXiv:2207.14130v1 [cs.LG])
    Data-heterogeneous federated learning (FL) systems suffer from two significant sources of convergence error: 1) client drift error caused by performing multiple local optimization steps at clients, and 2) partial client participation error caused by the fact that only a small subset of the edge clients participate in every training round. We find that among these, only the former has received significant attention in the literature. To remedy this, we propose FedVARP, a novel variance reduction algorithm applied at the server that eliminates error due to partial client participation. To do so, the server simply maintains in memory the most recent update for each client and uses these as surrogate updates for the non-participating clients in every round. Further, to alleviate the memory requirement at the server, we propose a novel clustering-based variance reduction algorithm ClusterFedVARP. Unlike previously proposed methods, both FedVARP and ClusterFedVARP do not require additional computation at clients or communication of additional optimization parameters. Through extensive experiments, we show that FedVARP outperforms state-of-the-art methods, and ClusterFedVARP achieves performance comparable to FedVARP with much lower memory requirements.
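    The server-side bookkeeping described above can be sketched as follows. This is a simplified, literal reading of the abstract (store the latest update per client, then average over all clients using stored updates for absentees); the paper's exact update rule may differ, and the class name `SurrogateServer` and its methods are hypothetical.

```python
class SurrogateServer:
    """Keeps the most recent update of every client in memory (assumed zero-initialised)."""

    def __init__(self, num_clients, dim):
        self.dim = dim
        self.memory = [[0.0] * dim for _ in range(num_clients)]

    def aggregate(self, fresh_updates):
        """fresh_updates maps client id -> update vector for this round's participants.

        Participants overwrite their stored update; non-participants are
        represented by whatever update they sent last time.
        """
        for client_id, update in fresh_updates.items():
            self.memory[client_id] = list(update)
        n = len(self.memory)
        # Average over ALL clients, not only the ones that participated.
        return [sum(m[k] for m in self.memory) / n for k in range(self.dim)]

server = SurrogateServer(num_clients=3, dim=2)
round1 = server.aggregate({0: [3.0, 0.0], 1: [0.0, 3.0]})  # client 2 absent
round2 = server.aggregate({2: [3.0, 3.0]})                 # clients 0 and 1 absent
```

    In the second round only client 2 reports, yet the aggregate still reflects clients 0 and 1 through their stored updates. The memory cost grows with the number of clients, which is the cost the abstract says the clustering variant is meant to reduce.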
    Gender In Gender Out: A Closer Look at User Attributes in Context-Aware Recommendation. (arXiv:2207.14218v1 [cs.LG])
    This paper studies user attributes in light of current concerns in the recommender system community: diversity, coverage, calibration, and data minimization. In experiments with a conventional context-aware recommender system that leverages side information, we show that user attributes do not always improve recommendation. Then, we demonstrate that user attributes can negatively impact diversity and coverage. Finally, we investigate the amount of information about users that "survives" from the training data into the recommendation lists produced by the recommender. This information is a weak signal that could in the future be exploited for calibration or studied further as a privacy leak.
    Towards Robust Ad Hoc Teamwork Agents By Creating Diverse Training Teammates. (arXiv:2207.14138v1 [cs.LG])
    Ad hoc teamwork (AHT) is the problem of creating an agent that must collaborate with previously unseen teammates without prior coordination. Many existing AHT methods can be categorised as type-based methods, which require a set of predefined teammates for training. Designing teammate types for training is a challenging issue that determines the generalisation performance of agents when dealing with teammate types unseen during training. In this work, we propose a method to discover diverse teammate types based on maximising best response diversity metrics. We show that our proposed approach yields teammate types that require a wider range of best responses from the learner during collaboration, which potentially improves the robustness of a learner's performance in AHT compared to alternative methods.
    Shift-Curvature, SGD, and Generalization. (arXiv:2108.09507v3 [stat.ML] UPDATED)
    A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that SGD discourages curvature. We offer a more complete and nuanced view in support of both. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and bias-curvature, in addition to a known parameter-covariance mechanism. The three curvature-mediated contributions to test performance are reparametrization-invariant although curvature is not. The shift in the shift-curvature is the line connecting train and test local minima, which differ due to dataset sampling or distribution shift. Although the shift is unknown at training time, the shift-curvature can still be mitigated by minimizing overall curvature. Second, we derive a new, explicit SGD steady-state distribution showing that SGD optimizes an effective potential related to but different from train loss, and that SGD noise mediates a trade-off between deep versus low-curvature regions of this effective potential. Third, combining our test performance analysis with the SGD steady state shows that for small SGD noise, the shift-curvature may be the most significant of the three mechanisms. Our experiments confirm the impact of shift-curvature on test loss, and further explore the relationship between SGD noise and curvature.
    SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning. (arXiv:2206.13101v2 [cs.SD] UPDATED)
    Speech emotion recognition (SER) faces many challenges, one of the main ones being that frameworks lack a unified standard. In this paper, we propose SpeechEQ, a framework for unifying SER tasks based on a multi-scale unified metric. This metric can be trained by Multitask Learning (MTL), which includes two emotion recognition tasks of Emotion States Category (ESC) and Emotion Intensity Scale (EIS), and two auxiliary tasks of phoneme recognition and gender recognition. For this framework, we build a Mandarin SER dataset, the SpeechEQ Dataset (SEQD). We conducted experiments on the public CASIA and ESD datasets in Mandarin, which show that our method outperforms baseline methods by a relatively large margin, yielding 8.0% and 6.5% improvement in accuracy respectively. Additional experiments on IEMOCAP with four emotion categories (i.e., angry, happy, sad, and neutral) also show that the proposed method achieves state-of-the-art results, with a weighted accuracy (WA) of 78.16% and an unweighted accuracy (UA) of 77.47%.
    Playing a 2D Game Indefinitely using NEAT and Reinforcement Learning. (arXiv:2207.14140v1 [cs.LG])
    For over a decade now, robotics and the use of artificial agents have become commonplace. Testing the performance of new path-finding or search-space optimization algorithms has also become a challenge, as they require a simulation or an environment in which to test them. The creation of artificial environments with artificial agents is one of the methods employed to test such algorithms. Games have also become an environment to test them. The performance of the algorithms can be compared by using artificial agents that behave according to the algorithm in the environment they are put in. One performance parameter is how quickly the agent is able to differentiate between rewarding actions and hostile actions. This can be tested by placing the agent in an environment with different types of hurdles, where the goal of the agent is to travel as far as possible by choosing actions that avoid all the obstacles. The environment chosen is a game called "Flappy Bird". The goal of the game is to make the bird fly through a set of pipes of random heights. The bird must go in between these pipes and must not hit the top, the bottom, or the pipes themselves. The actions that the bird can take are either to flap its wings or drop down with gravity. The algorithms applied to the artificial agents are NeuroEvolution of Augmenting Topologies (NEAT) and Reinforcement Learning. The NEAT algorithm takes an initial population of "N" artificial agents. They follow genetic algorithms by considering an objective function, crossover, mutation, and augmenting topologies. Reinforcement learning, on the other hand, remembers the state, the action taken at that state, and the reward received for the action taken using a single agent and a Deep Q-learning Network. The performance of the NEAT algorithm improves as the initial population of the artificial agents is increased.
    Learning unseen coexisting attractors. (arXiv:2207.14133v1 [cs.LG])
    Reservoir computing is a machine learning approach that can generate a surrogate model of a dynamical system. It can learn the underlying dynamical system using fewer trainable parameters and hence smaller training data sets than competing approaches. Recently, a simpler formulation, known as next-generation reservoir computing, removes many algorithm metaparameters and identifies a well-performing traditional reservoir computer, thus simplifying training even further. Here, we study a particularly challenging problem of learning a dynamical system that has both disparate time scales and multiple co-existing dynamical states (attractors). We compare the next-generation and traditional reservoir computer using metrics quantifying the geometry of the ground-truth and forecasted attractors. For the studied four-dimensional system, the next-generation reservoir computing approach uses $\sim 1.7 \times$ less training data, requires $10^3 \times$ shorter `warm up' time, has fewer metaparameters, and has a $\sim 100\times$ higher accuracy in predicting the co-existing attractor characteristics in comparison to a traditional reservoir computer. Furthermore, we demonstrate that it predicts the basin of attraction with high accuracy. This work lends further support to the superior learning ability of this new machine learning algorithm for dynamical systems.
    Progressive Voronoi Diagram Subdivision: Towards A Holistic Geometric Framework for Exemplar-free Class-Incremental Learning. (arXiv:2207.14202v1 [cs.CV])
    Exemplar-free Class-incremental Learning (CIL) is a challenging problem because rehearsing data from previous phases is strictly prohibited, causing catastrophic forgetting of Deep Neural Networks (DNNs). In this paper, we present iVoro, a holistic framework for CIL, derived from computational geometry. We found Voronoi Diagram (VD), a classical model for space subdivision, is especially powerful for solving the CIL problem, because VD itself can be constructed favorably in an incremental manner -- the newly added sites (classes) will only affect the proximate classes, making the non-contiguous classes hardly forgettable. Further, in order to find a better set of centers for VD construction, we colligate DNN with VD using Power Diagram and show that the VD structure can be optimized by integrating local DNN models using a divide-and-conquer algorithm. Moreover, our VD construction is not restricted to the deep feature space, but is also applicable to multiple intermediate feature spaces, promoting VD to be multi-centered VD (CIVD) that efficiently captures multi-grained features from DNN. Importantly, iVoro is also capable of handling uncertainty-aware test-time Voronoi cell assignment and has exhibited high correlations between geometric uncertainty and predictive accuracy (up to ~0.9). Putting everything together, iVoro achieves up to 25.26%, 37.09%, and 33.21% improvements on CIFAR-100, TinyImageNet, and ImageNet-Subset, respectively, compared to the state-of-the-art non-exemplar CIL approaches. In conclusion, iVoro enables highly accurate, privacy-preserving, and geometrically interpretable CIL that is particularly useful when cross-phase data sharing is forbidden, e.g. in medical applications. Our code is available at https://machunwei.github.io/ivoro.
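    The locality property mentioned above (a newly added site only affects nearby cells) can be seen in a toy nearest-site classifier, since assigning a point to its nearest site is the same as finding its Voronoi cell. The sketch below uses plain Euclidean distance on 2D points; the function name `assign` and the example sites are assumptions, and the paper's power diagrams and deep feature spaces are not modelled here.

```python
def assign(point, sites):
    """Return the label of the nearest site, i.e. the Voronoi cell containing `point`."""
    def sq_dist(label):
        return sum((p - s) ** 2 for p, s in zip(point, sites[label]))
    return min(sites, key=sq_dist)

# Phase 1: two classes, each represented by one site (hypothetical coordinates).
sites = {"cat": (0.0, 0.0), "dog": (4.0, 0.0)}
queries = [(1.0, 0.0), (3.0, 0.0), (1.0, 5.0)]
before = [assign(q, sites) for q in queries]

# Phase 2: a new class arrives; existing sites are left untouched.
sites["bird"] = (0.0, 6.0)
after = [assign(q, sites) for q in queries]
```

    Only the query close to the new site changes its label; the assignments of points far from the new site are the same in `before` and `after`, which is the intuition behind using Voronoi diagrams to limit forgetting.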
    Localized Vision-Language Matching for Open-vocabulary Object Detection. (arXiv:2205.06160v2 [cs.CV] UPDATED)
    In this work, we propose an open-vocabulary object detection method that, based on image-caption pairs, learns to detect novel object classes along with a given set of known classes. It is a two-stage training approach that first uses a location-guided image-caption matching technique to learn class labels for both novel and known classes in a weakly-supervised manner and second specializes the model for the object detection task using known class annotations. We show that a simple language model fits better than a large contextualized language model for detecting novel objects. Moreover, we introduce a consistency-regularization technique to better exploit image-caption pair information. Our method compares favorably to existing open-vocabulary detection approaches while being data-efficient. Source code is available at https://github.com/lmb-freiburg/locov.
    RIBBON: Cost-Effective and QoS-Aware Deep Learning Model Inference using a Diverse Pool of Cloud Computing Instances. (arXiv:2207.11434v2 [cs.DC] UPDATED)
    Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces RIBBON, a novel deep learning inference serving system that meets two competing objectives: quality-of-service (QoS) target and cost-effectiveness. The key idea behind RIBBON is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target and maximize cost savings. RIBBON devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms. RIBBON demonstrates its superiority over existing inference serving approaches that use homogeneous instance pools. RIBBON saves up to 16% of the inference service cost for different learning models including emerging deep learning recommender system models and drug-discovery enabling models.
    Topological Analysis of Ensembles of Hydrodynamic Turbulent Flows -- An Experimental Study. (arXiv:2207.14080v1 [physics.flu-dyn])
    This application paper presents a comprehensive experimental evaluation of the suitability of Topological Data Analysis (TDA) for the quantitative comparison of turbulent flows. Specifically, our study documents the usage of the persistence diagram of the maxima of flow enstrophy (an established vorticity indicator), for the topological representation of 180 ensemble members, generated by a coarse sampling of the parameter space of five numerical solvers. We document five main hypotheses reported by domain experts, describing their expectations regarding the variability of the flows generated by the distinct solver configurations. We contribute three evaluation protocols to assess the validation of the above hypotheses by two comparison measures: (i) a standard distance used in scientific imaging (the L2 norm) and (ii) an established topological distance between persistence diagrams (the L2-Wasserstein metric). Extensive experiments on the input ensemble demonstrate the superiority of the topological distance (ii) at reporting as close to each other flows that domain experts expect to be similar, due to the configuration of their vortices. Overall, the insights reported by our study provide experimental evidence of the suitability of TDA for representing and comparing turbulent flows, thereby giving the fluid dynamics community confidence for its usage in future work. Also, our flow data and evaluation protocols provide the TDA community with an application-approved benchmark for the evaluation and design of further topological distances.
    Classification of FIB/SEM-tomography images for highly porous multiphase materials using random forest classifiers. (arXiv:2207.14114v1 [cond-mat.mtrl-sci])
    FIB/SEM tomography represents an indispensable tool for the characterization of three-dimensional nanostructures in battery research and many other fields. However, contrast and 3D classification/reconstruction problems occur in many cases, which strongly limit the applicability of the technique, especially on porous materials like those used for electrode materials in batteries or fuel cells. Distinguishing the different components like active Li storage particles and carbon/binder materials is difficult and often prevents a reliable quantitative analysis of image data, or may even lead to wrong conclusions about structure-property relationships. In this contribution, we present a novel approach for data classification in three-dimensional image data obtained by FIB/SEM tomography and its applications to NMC battery electrode materials. We use two different image signals, namely the signal of the angled SE2 chamber detector and the Inlens detector signal, combine both signals and train a random forest, i.e. a particular machine learning algorithm. We demonstrate that this approach can overcome current limitations of existing techniques suitable for multi-phase measurements and that it allows for quantitative data reconstruction even where current state-of-the-art techniques fail or demand large training sets. This approach may serve as a guideline for future research using FIB/SEM tomography.
    Modeling Item Response Theory with Stochastic Variational Inference. (arXiv:2108.11579v2 [cs.LG] UPDATED)
    Item Response Theory (IRT) is a ubiquitous model for understanding human behaviors and attitudes based on their responses to questions. Large modern datasets offer opportunities to capture more nuances in human behavior, potentially improving psychometric modeling leading to improved scientific understanding and public policy. However, while larger datasets allow for more flexible approaches, many contemporary algorithms for fitting IRT models may also have massive computational demands that forbid real-world application. To address this bottleneck, we introduce a variational Bayesian inference algorithm for IRT, and show that it is fast and scalable without sacrificing accuracy. Applying this method to five large-scale item response datasets from cognitive science and education yields higher log likelihoods and higher accuracy in imputing missing data than alternative inference algorithms. Using this new inference approach we then generalize IRT with expressive Bayesian models of responses, leveraging recent advances in deep learning to capture nonlinear item characteristic curves (ICC) with neural networks. Using an eighth-grade mathematics test from TIMSS, we show our nonlinear IRT models can capture interesting asymmetric ICCs. The algorithm implementation is open-source, and easily usable.
    Generative Modelling With Inverse Heat Dissipation. (arXiv:2206.13397v2 [cs.CV] UPDATED)
    While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the desirability of coarse-to-fine modelling, we propose a new model that generates images through iteratively inverting the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. In our novel methodology, the solution of the forward heat equation is interpreted as a variational approximation in a directed graphical model. We demonstrate promising image quality and point out emergent qualitative properties not seen in diffusion models, such as disentanglement of overall colour and shape in images and aspects of neural network interpretability. Spectral analysis on natural images positions our model as a type of dual to diffusion models and reveals implicit inductive biases in them.
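    The forward process referred to above, in which the heat equation progressively erases fine-scale detail, can be sketched with an explicit finite-difference scheme on a small grayscale grid. The discretisation, the step size `dt`, the zero-flux boundary, and the function names are assumptions chosen for the illustration; the learned inverse (generative) process is not shown.

```python
def heat_step(img, dt=0.2):
    """One explicit Euler step of u_t = laplace(u) on a 2D grid.

    Edge pixels are replicated outside the image (zero-flux boundary), so the
    total intensity is conserved. dt <= 0.25 keeps the scheme stable for unit
    grid spacing.
    """
    h, w = len(img), len(img[0])

    def at(i, j):
        # Clamp indices so that out-of-range neighbours mirror the edge pixel.
        return img[min(max(i, 0), h - 1)][min(max(j, 0), w - 1)]

    return [
        [
            img[i][j] + dt * (at(i - 1, j) + at(i + 1, j)
                              + at(i, j - 1) + at(i, j + 1) - 4.0 * img[i][j])
            for j in range(w)
        ]
        for i in range(h)
    ]

def forward_heat(img, steps, dt=0.2):
    """Apply several heat steps; more steps give a coarser, blurrier image."""
    for _ in range(steps):
        img = heat_step(img, dt)
    return img

# A single bright pixel on a dark 4x4 background spreads out over time.
image = [[0.0] * 4 for _ in range(4)]
image[1][1] = 1.0
blurred = forward_heat(image, steps=5)
```

    The overall brightness stays the same while the peak flattens, which matches the description of fine-scale information being erased while coarse structure such as average colour remains.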
    Exploiting and Defending Against the Approximate Linearity of Apple's NeuralHash. (arXiv:2207.14258v1 [cs.CR])
    Perceptual hashes map images with identical semantic content to the same $n$-bit hash value, while mapping semantically-different images to different hashes. These algorithms carry important applications in cybersecurity such as copyright infringement detection, content fingerprinting, and surveillance. Apple's NeuralHash is one such system that aims to detect the presence of illegal content on users' devices without compromising consumer privacy. We make the surprising discovery that NeuralHash is approximately linear, which inspires the development of novel black-box attacks that can (i) evade detection of "illegal" images, (ii) generate near-collisions, and (iii) leak information about hashed images, all without access to model parameters. These vulnerabilities pose serious threats to NeuralHash's security goals; to address them, we propose a simple fix using classical cryptographic standards.
    Reinforcement Learning with Intrinsic Affinity for Personalized Prosperity Management. (arXiv:2204.09218v2 [cs.LG] UPDATED)
    The common purpose of applying reinforcement learning (RL) to asset management is the maximization of profit. The extrinsic reward function used to learn an optimal strategy typically does not take into account any other preferences or constraints. We have developed a regularization method that ensures that strategies have global intrinsic affinities, i.e., different personalities may have preferences for certain assets which may change over time. We capitalize on these intrinsic policy affinities to make our RL model inherently interpretable. We demonstrate how RL agents can be trained to orchestrate such individual policies for particular personality profiles and still achieve high returns.
    Distinction Maximization Loss: Efficiently Improving Uncertainty Estimation and Out-of-Distribution Detection by Simply Replacing the Loss and Calibrating. (arXiv:2205.05874v3 [cs.LG] UPDATED)
    Building robust deterministic neural networks remains a challenge. On the one hand, some approaches improve out-of-distribution detection at the cost of reducing classification accuracy in some situations. On the other hand, some methods simultaneously increase classification accuracy, uncertainty estimation, and out-of-distribution detection at the expense of reducing the inference efficiency. In this paper, we propose training deterministic neural networks using our DisMax loss, which works as a drop-in replacement for the usual SoftMax loss (i.e., the combination of the linear output layer, the SoftMax activation, and the cross-entropy loss). Starting from the IsoMax+ loss, we create each logit based on the distances to all prototypes, rather than just the one associated with the correct class. We also introduce a mechanism to combine images to construct what we call fractional probability regularization. Moreover, we present a fast way to calibrate the network after training. Finally, we propose a composite score to perform out-of-distribution detection. Our experiments show that DisMax usually outperforms current approaches simultaneously in classification accuracy, uncertainty estimation, and out-of-distribution detection while maintaining deterministic neural network inference efficiency. The code to reproduce the results is available at https://github.com/dlmacedo/distinction-maximization-loss.
    On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence. (arXiv:2207.04630v3 [cs.AI] UPDATED)
    Ten years into the revival of deep networks and artificial intelligence, we propose a theoretical framework that sheds light on understanding deep networks within a bigger picture of Intelligence in general. We introduce two fundamental principles, Parsimony and Self-consistency, that address two fundamental questions regarding Intelligence: what to learn and how to learn, respectively. We believe the two principles are the cornerstones for the emergence of Intelligence, artificial or natural. While these two principles have rich classical roots, we argue that they can be stated anew in entirely measurable and computable ways. More specifically, the two principles lead to an effective and efficient computational framework, compressive closed-loop transcription, that unifies and explains the evolution of modern deep networks and many artificial intelligence practices. While we mainly use modeling of visual data as an example, we believe the two principles will unify understanding of broad families of autonomous intelligent systems and provide a framework for understanding the brain.
    CrAM: A Compression-Aware Minimizer. (arXiv:2207.14200v1 [cs.LG])
    We examine the question of whether SGD-based optimization of deep neural networks (DNNs) can be adapted to produce models which are both highly-accurate and easily-compressible. We propose a new compression-aware minimizer dubbed CrAM, which modifies the SGD training iteration in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as weight pruning or quantization. Experimental results on standard image classification tasks show that CrAM produces dense models that can be more accurate than standard SGD-type baselines, but which are surprisingly stable under weight pruning: for instance, for ResNet50 on ImageNet, CrAM-trained models can lose up to 70% of their weights in one shot with only minor accuracy loss.
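    For readers unfamiliar with one-shot pruning, the sketch below zeroes out a fixed fraction of weights in a single pass. Selecting the weights with the smallest absolute value is an assumption made for the example (it is a common criterion), and the function name `prune_one_shot` is hypothetical. The CrAM training procedure itself is not reproduced here.

```python
def prune_one_shot(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = round(len(weights) * sparsity)  # number of weights to remove
    if k <= 0:
        return list(weights)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]

# Hypothetical layer weights; 70% sparsity keeps only the three largest.
layer = [0.5, -0.1, 0.05, -2.0, 0.3, 0.01, 1.2, -0.7, 0.02, 0.9]
pruned = prune_one_shot(layer, sparsity=0.7)
```

    A model is "stable under pruning" in the sense of the abstract if its loss changes little when its weights are replaced by such a pruned copy without further fine-tuning.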
    Differentiable Rule Induction with Learned Relational Features. (arXiv:2201.06515v2 [stat.ML] UPDATED)
    Rule-based decision models are attractive due to their interpretability. However, existing rule induction methods often result in long and consequently less interpretable rule models. This problem can often be attributed to the lack of appropriately expressive vocabulary, i.e., relevant predicates used as literals in the decision model. Most existing rule induction algorithms presume pre-defined literals, naturally decoupling the definition of the literals from the rule learning phase. In contrast, we propose the Relational Rule Network (R2N), a neural architecture that learns literals that represent a linear relationship among numerical input features along with the rules that use them. This approach opens the door to increasing the expressiveness of induced decision models by coupling literal learning directly with rule learning in an end-to-end differentiable fashion. On benchmark tasks, we show that these learned literals are simple enough to retain interpretability, yet improve prediction accuracy and provide sets of rules that are more concise compared to state-of-the-art rule induction algorithms.
    Optimization of Artificial Neural Networks models applied to the identification of images of asteroids' resonant arguments. (arXiv:2207.14181v1 [astro-ph.EP])
    The asteroidal main belt is crossed by a web of mean-motion and secular resonances, which occur when there is a commensurability between fundamental frequencies of the asteroids and planets. Traditionally, these objects were identified by visual inspection of the time evolution of their resonant argument, which is a combination of orbital elements of the asteroid and the perturbing planet(s). Since the population of asteroids affected by these resonances is, in some cases, of the order of several thousand, this has become a taxing task for a human observer. Recent works used Convolutional Neural Network (CNN) models to perform such a task automatically. In this work, we compare the outcome of such models with those of some of the most advanced and publicly available CNN architectures, like VGG, Inception and ResNet. The performance of such models is first tested and optimized for overfitting issues, using validation sets and a series of regularization techniques like data augmentation, dropout, and batch normalization. The three best-performing models were then used to predict the labels of larger testing databases containing thousands of images. The VGG model, with and without regularizations, proved to be the most efficient method to predict labels of large datasets. Since the Vera C. Rubin observatory is likely to discover up to four million new asteroids in the next few years, the use of these models might become quite valuable to identify populations of resonant minor bodies.
    Regret Minimization and Convergence to Equilibria in General-sum Markov Games. (arXiv:2207.14211v1 [cs.LG])
    An abundance of recent impossibility results establish that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for swap regret, and thus, along the way, imply convergence to a correlated equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence. Consequently, controlling the path length leads to weighted regret objectives for which sufficiently adaptive algorithms provide sublinear regret guarantees.
    On stabilizing reinforcement learning without Lyapunov functions. (arXiv:2207.08730v2 [eess.SY] UPDATED)
    Reinforcement learning remains one of the major directions of the contemporary development of control engineering and machine learning. Nice intuition, flexible settings, and ease of application are among the many perks of this methodology. From the standpoint of machine learning, the main strength of a reinforcement learning agent is its ability to "capture" (learn) the optimal behavior in the given environment. Typically, the agent is built on neural networks and it is their approximation abilities that give rise to the above belief. From the standpoint of control engineering, however, reinforcement learning has serious deficiencies. The most significant one is the lack of a stability guarantee for the agent-environment closed loop. A great deal of research has been, and is still being, devoted to stabilizing reinforcement learning. Speaking of stability, the celebrated Lyapunov theory is the de facto tool. It is thus no wonder that so many techniques of stabilizing reinforcement learning rely on the Lyapunov theory in one way or another. In control theory, there is an intricate connection between a stabilizing controller and a Lyapunov function. Employing such a pair thus seems quite attractive for designing stabilizing reinforcement learning. However, computation of a Lyapunov function is generally a cumbersome process. In this note, we show how to construct a stabilizing reinforcement learning agent that does not employ such a function at all. We only assume that a Lyapunov function exists, which is a natural thing to do if the given system (read: environment) is stabilizable, but we do not need to compute one.
    Private Convex Optimization via Exponential Mechanism. (arXiv:2203.00263v2 [cs.DS] UPDATED)
    In this paper, we study private optimization problems for non-smooth convex functions $F(x)=\mathbb{E}_i f_i(x)$ on $\mathbb{R}^d$. We show that modifying the exponential mechanism by adding an $\ell_2^2$ regularizer to $F(x)$ and sampling from $\pi(x)\propto \exp(-k(F(x)+\mu\|x\|_2^2/2))$ recovers both the known optimal empirical risk and population loss under $(\epsilon,\delta)$-DP. Furthermore, we show how to implement this mechanism using $\widetilde{O}(n \min(d, n))$ queries to $f_i(x)$ for the DP-SCO where $n$ is the number of samples/users and $d$ is the ambient dimension. We also give a (nearly) matching lower bound $\widetilde{\Omega}(n \min(d, n))$ on the number of evaluation queries. Our results utilize the following tools that are of independent interest: (1) We prove Gaussian Differential Privacy (GDP) of the exponential mechanism if the loss function is strongly convex and the perturbation is Lipschitz. Our privacy bound is \emph{optimal} as it includes the privacy of Gaussian mechanism as a special case and is proved using the isoperimetric inequality for strongly log-concave measures. (2) We show how to sample from $\exp(-F(x)-\mu \|x\|^2_2/2)$ for $G$-Lipschitz $F$ with $\eta$ error in total variation (TV) distance using $\widetilde{O}((G^2/\mu) \log^2(d/\eta))$ unbiased queries to $F(x)$. This is the first sampler whose query complexity has \emph{polylogarithmic dependence} on both dimension $d$ and accuracy $\eta$.
    Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent. (arXiv:2206.02617v3 [cs.LG] UPDATED)
    Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose an efficient algorithm to compute privacy guarantees for individual examples when releasing models trained by DP-SGD. We use our algorithm to investigate individual privacy parameters across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bound. We further discover that the training loss and the privacy parameter of an example are well-correlated. This implies groups that are underserved in terms of model utility are simultaneously underserved in terms of privacy guarantee. For example, on CIFAR-10, the average $\epsilon$ of the class with the lowest test accuracy is 26.3% higher than that of the class with the highest accuracy. We also run membership inference attacks to show this reflects disparate empirical privacy risks.
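    For readers unfamiliar with DP-SGD, the following is a minimal sketch of one standard DP-SGD update (per-example gradient clipping followed by Gaussian noise). It is not the paper's individual accounting algorithm; the function names and the parameters `max_norm` and `noise_multiplier` are illustrative assumptions.

```python
import math
import random


def clip(grad, max_norm):
    """Scale `grad` so that its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each example's gradient, sum, add noise, average.

    All names here are hypothetical; `rng` is a random.Random instance.
    """
    summed = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            summed[i] += g
    # Gaussian noise with standard deviation proportional to the clipping norm.
    sigma = noise_multiplier * max_norm
    batch = len(per_example_grads)
    noisy_mean = [(s + rng.gauss(0.0, sigma)) / batch for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]
```

    Because every example's gradient is clipped to the same `max_norm`, the usual analysis yields one worst-case guarantee shared by all datapoints, which is the single guarantee the abstract refers to.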
    What Happens after SGD Reaches Zero Loss? --A Mathematical Framework. (arXiv:2110.06914v4 [cs.LG] UPDATED)
    Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning, especially for overparametrized models, where the local minimizers of the loss function $L$ can form a manifold. Intuitively, with a sufficiently small learning rate $\eta$, SGD tracks Gradient Descent (GD) until it gets close to such manifold, where the gradient noise prevents further convergence. In such a regime, Blanc et al. (2020) proved that SGD with label noise locally decreases a regularizer-like term, the sharpness of loss, $\mathrm{tr}[\nabla^2 L]$. The current paper gives a general framework for such analysis by adapting ideas from Katzenberger (1991). It allows in principle a complete characterization for the regularization effect of SGD around such manifold -- i.e., the "implicit bias" -- using a stochastic differential equation (SDE) describing the limiting dynamics of the parameters, which is determined jointly by the loss function and the noise covariance. This yields some new results: (1) a global analysis of the implicit bias valid for $\eta^{-2}$ steps, in contrast to the local analysis of Blanc et al. (2020) that is only valid for $\eta^{-1.6}$ steps and (2) allowing arbitrary noise covariance. As an application, we show that with arbitrarily large initialization, label noise SGD can always escape the kernel regime and only requires $O(\kappa\ln d)$ samples for learning a $\kappa$-sparse overparametrized linear model in $\mathbb{R}^d$ (Woodworth et al., 2020), while GD initialized in the kernel regime requires $\Omega(d)$ samples. This upper bound is minimax optimal and improves the previous $\tilde{O}(\kappa^2)$ upper bound (HaoChen et al., 2020).
    One-Nearest-Neighbor Search is All You Need for Minimax Optimal Regression and Classification. (arXiv:2202.02464v2 [math.ST] UPDATED)
    Recently, Qiao, Duan, and Cheng (2019) proposed a distributed nearest-neighbor classification method, in which a massive dataset is split into smaller groups, each processed with a $k$-nearest-neighbor classifier, and the final class label is predicted by a majority vote among these groupwise class labels. This paper shows that the distributed algorithm with $k=1$ over a sufficiently large number of groups attains a minimax optimal error rate up to a multiplicative logarithmic factor under some regularity conditions, for both regression and classification problems. Roughly speaking, distributed 1-nearest-neighbor rules with $M$ groups have a performance comparable to standard $\Theta(M)$-nearest-neighbor rules. In the analysis, alternative rules with a refined aggregation method are proposed and shown to attain exact minimax optimal rates.
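    The split-and-vote scheme described above can be sketched as follows. This is a minimal illustration under assumptions: the round-robin split, the squared Euclidean distance, and the function names are choices made here, not details taken from the paper.

```python
from collections import Counter


def nearest_label(group, query):
    """Return the label of the point in `group` closest to `query` (1-NN)."""
    def sq_dist(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))

    _, label = min(group, key=lambda item: sq_dist(item[0]))
    return label


def distributed_1nn(data, query, num_groups):
    """Split `data` into groups, run 1-NN in each group, then majority vote.

    `data` is a list of (point, label) pairs. The round-robin split is an
    assumption for illustration. Vote ties go to the label counted first.
    """
    groups = [data[i::num_groups] for i in range(num_groups)]
    votes = [nearest_label(group, query) for group in groups if group]
    return Counter(votes).most_common(1)[0][0]
```

    With `num_groups` set to $M$, each group contributes exactly one vote, which gives some intuition for the abstract's comparison to a $\Theta(M)$-nearest-neighbor rule.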
    A Generative Deep Learning Approach to Stochastic Downscaling of Precipitation Forecasts. (arXiv:2204.02028v2 [physics.ao-ph] UPDATED)
    Despite continuous improvements, precipitation forecasts are still not as accurate and reliable as those of other meteorological variables. A major contributing factor to this is that several key processes affecting precipitation distribution and intensity occur below the resolved scale of global weather models. Generative adversarial networks (GANs) have been demonstrated by the computer vision community to be successful at super-resolution problems, i.e., learning to add fine-scale structure to coarse images. Leinonen et al. (2020) previously applied a GAN to produce ensembles of reconstructed high-resolution atmospheric fields, given coarsened input data. In this paper, we demonstrate this approach can be extended to the more challenging problem of increasing the accuracy and resolution of comparatively low-resolution input from a weather forecasting model, using high-resolution radar measurements as a "ground truth". The neural network must learn to add resolution and structure whilst accounting for non-negligible forecast error. We show that GANs and VAE-GANs can match the statistical properties of state-of-the-art pointwise post-processing methods whilst creating high-resolution, spatially coherent precipitation maps. Our model compares favourably to the best existing downscaling methods in both pixel-wise and pooled CRPS scores, power spectrum information and rank histograms (used to assess calibration). We test our models and show that they perform in a range of scenarios, including heavy rainfall.
    Associative Learning Mechanism for Drug-Target Interaction Prediction. (arXiv:2205.15364v4 [q-bio.BM] UPDATED)
    As a necessary process in drug development, finding a drug compound that can selectively bind to a specific protein is highly challenging and costly. Drug-target affinity (DTA), which represents the strength of drug-target interaction (DTI), has played an important role in the DTI prediction task over the past decade. Although deep learning has been applied to DTA-related research, existing solutions ignore fundamental correlations between molecular substructures in molecular representation learning of drug compound molecules/protein targets. Moreover, traditional methods lack the interpretability of the DTA prediction process. This results in missing feature information of intermolecular interactions, thereby affecting prediction performance. Therefore, this paper proposes a DTA prediction method with interactive learning and an autoencoder mechanism. The proposed model enhances the corresponding ability to capture the feature information of a single molecular sequence by the drug/protein molecular representation learning module and supplements the information interaction between molecular sequence pairs by the interactive information learning module. The DTA value prediction module fuses the drug-target pair interaction information to output the predicted value of DTA. Additionally, this paper theoretically proves that the proposed method maximizes evidence lower bound (ELBO) for the joint distribution of the DTA prediction model, which enhances the consistency of the probability distribution between the actual value and the predicted value. The experimental results confirm mutual transformer-drug target affinity (MT-DTA) achieves better performance than other comparative methods.
    The Familiarity Hypothesis: Explaining the Behavior of Deep Open Set Methods. (arXiv:2203.02486v4 [cs.CV] UPDATED)
    In many object recognition applications, the set of possible categories is an open set, and the deployed recognition system will encounter novel objects belonging to categories unseen during training. Detecting such "novel category" objects is usually formulated as an anomaly detection problem. Anomaly detection algorithms for feature-vector data identify anomalies as outliers, but outlier detection has not worked well in deep learning. Instead, methods based on the computed logits of visual object classifiers give state-of-the-art performance. This paper proposes the Familiarity Hypothesis that these methods succeed because they are detecting the absence of familiar learned features rather than the presence of novelty. This distinction is important, because familiarity-based detection will fail in many situations where novelty is present. For example, when an image contains both a novel object and a familiar one, the familiarity score will be high, so the novel object will not be noticed. The paper reviews evidence from the literature and presents additional evidence from our own experiments that provides strong support for this hypothesis. The paper concludes with a discussion of whether familiarity-based detection is an inevitable consequence of representation learning.
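    As a rough illustration of a logit-based detector and its failure mode, the sketch below uses the maximum logit as the familiarity score. The abstract does not say which logit-based score is meant, so the max-logit choice, the threshold, and the function names are assumptions.

```python
def familiarity_score(logits):
    """Max-logit score: high when some known class is strongly activated."""
    return max(logits)


def is_flagged_novel(logits, threshold):
    """Flag an input as novel when no known class is sufficiently familiar."""
    return familiarity_score(logits) < threshold


# Hypothetical logits over three known classes.
only_novel_object = [0.2, -0.1, 0.4]      # nothing familiar in the image
novel_plus_familiar = [0.3, 6.5, 0.1]     # a familiar object is also present

print(is_flagged_novel(only_novel_object, threshold=2.0))    # True
print(is_flagged_novel(novel_plus_familiar, threshold=2.0))  # False
```

    The second input is not flagged even though a novel object is present, because the familiar object alone pushes the score above the threshold.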
    On the Universality of Langevin Diffusion for Private Euclidean (Convex) Optimization. (arXiv:2204.01585v3 [cs.LG] UPDATED)
    In this paper we revisit the problem of differentially private empirical risk minimization (DP-ERM) and differentially private stochastic convex optimization (DP-SCO). We show that a well-studied continuous time algorithm from statistical physics, called Langevin diffusion (LD), simultaneously provides optimal privacy/utility trade-offs for both DP-ERM and DP-SCO, under $\epsilon$-DP, and $(\epsilon,\delta)$-DP both for convex and strongly convex loss functions. We provide new time and dimension independent uniform stability properties of LD, using which we provide the corresponding optimal excess population risk guarantees for $\epsilon$-DP. An important attribute of our DP-SCO guarantees for $\epsilon$-DP is that they match the non-private optimal bounds as $\epsilon\to\infty$. Along the way, we provide various technical tools, which can be of independent interest: i) A new Rényi divergence bound for LD, when run on loss functions over two neighboring data sets, ii) Excess empirical risk bounds for last-iterate LD, analogous to that of Shamir and Zhang for noisy stochastic gradient descent (SGD), and iii) A two phase excess risk analysis of LD, where the first phase is when the diffusion has not converged in any reasonable sense to a stationary distribution, and in the second phase when the diffusion has converged to a variant of Gibbs distribution. Our universality results crucially rely on the dynamics of LD. When it has converged to a stationary distribution, we obtain the optimal bounds under $\epsilon$-DP. When it is run only for a very short time $\propto 1/p$, we obtain the optimal bounds under $(\epsilon,\delta)$-DP. Here, $p$ is the dimensionality of the model space.
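    For orientation, Langevin diffusion is commonly written as the stochastic differential equation below. This is the textbook form and not necessarily the paper's exact parameterization; the symbols $\theta_t$ (parameters), $L$ (loss), $\beta$ (inverse temperature) and $W_t$ (standard Brownian motion) are notation assumed here.

```latex
d\theta_t = -\nabla L(\theta_t)\,dt + \sqrt{2/\beta}\,dW_t,
\qquad
\pi(\theta) \propto \exp\bigl(-\beta L(\theta)\bigr)
```

    Under standard regularity conditions, $\pi$ is the stationary (Gibbs) distribution of this diffusion, which corresponds to the converged phase mentioned in the abstract.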
    Algorithmic Foundation of Deep X-Risk Optimization. (arXiv:2206.00439v4 [cs.LG] UPDATED)
    X-risk is a term introduced to represent a family of compositional measures or objectives, in which each data point is compared with a large number of items explicitly or implicitly for defining a risk function. It includes many widely used measures or objectives, e.g., AUROC, AUPRC, partial AUROC, NDCG, MAP, top-$K$ NDCG, top-$K$ MAP, listwise losses, p-norm push, top push, precision/recall at top $K$ positions, precision at a certain recall level, contrastive objectives, etc. While these non-decomposable measures/objectives and their optimization algorithms have been studied in the literature of machine learning, computer vision, information retrieval, etc., optimizing these measures/objectives has encountered some unique challenges for deep learning. In this paper, we survey recent rigorous efforts for deep X-risk optimization (DXO) by focusing on its algorithmic foundation. We introduce a class of techniques for optimizing X-risks for deep learning. We formulate DXO into three special families of non-convex optimization problems belonging to non-convex min-max optimization, non-convex compositional optimization, and non-convex bilevel optimization, respectively. For each family of problems, we present some strong baseline algorithms and their complexities, which will motivate further research for improving the existing results. Discussions about the presented results and future studies are given at the end. Efficient algorithms for optimizing a variety of X-risks are implemented in the LibAUC library at www.libauc.org.
    Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation. (arXiv:2109.01369v4 [cs.LG] UPDATED)
    Deep neural networks have demonstrated remarkable performance in many data-driven and prediction-oriented applications, and sometimes even perform better than humans. However, their most significant drawback is the lack of interpretability, which makes them less attractive in many real-world applications. When relating to the moral problem or the environmental factors that are uncertain such as crime judgment, financial analysis, and medical diagnosis, it is essential to mine the evidence for the model's prediction (interpret model knowledge) to convince humans. Thus, investigating how to interpret model knowledge is of paramount importance for both academic research and real applications.
    Execute Order 66: Targeted Data Poisoning for Reinforcement Learning. (arXiv:2201.00762v2 [cs.LG] UPDATED)
    Data poisoning for reinforcement learning has historically focused on general performance degradation, and targeted attacks have been successful via perturbations that involve control of the victim's policy and rewards. We introduce an insidious poisoning attack for reinforcement learning which causes agent misbehavior only at specific target states - all while minimally modifying a small fraction of training observations without assuming any control over policy or reward. We accomplish this by adapting a recent technique, gradient alignment, to reinforcement learning. We test our method and demonstrate success in two Atari games of varying difficulty.
    Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems. (arXiv:2207.14030v1 [cs.LG])
    We show hardness of improperly learning halfspaces in the agnostic model based on worst-case lattice problems, e.g., approximating shortest vectors within polynomial factors. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achieving misclassification error better than $\frac 1 2 - \epsilon$ even if the optimal misclassification error is as small as $\delta$. Here, $\epsilon$ can be smaller than the inverse of any polynomial in the dimension and $\delta$ as small as $\mathrm{exp}\left(-\Omega\left(\log^{1-c}(d)\right)\right)$, where $0 < c < 1$ is an arbitrary constant and $d$ is the dimension. Previous hardness results [Daniely16] of this problem were based on average-case complexity assumptions, specifically, variants of Feige's random 3SAT hypothesis. Our work gives the first hardness for this problem based on a worst-case complexity assumption. It is inspired by a sequence of recent works showing hardness of learning well-separated Gaussian mixtures based on worst-case lattice problems.
    Depth Field Networks for Generalizable Multi-view Scene Representation. (arXiv:2207.14287v1 [cs.CV])
    Modern 3D computer vision leverages learning to boost geometric reasoning, mapping image data to classical structures such as cost volumes or epipolar constraints to improve matching. These architectures are specialized according to the particular problem, and thus require significant task-specific tuning, often leading to poor domain generalization performance. Recently, generalist Transformer architectures have achieved impressive results in tasks such as optical flow and depth estimation by encoding geometric priors as inputs rather than as enforced constraints. In this paper, we extend this idea and propose to learn an implicit, multi-view consistent scene representation, introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity. We also show that introducing view synthesis as an auxiliary task further improves depth estimation. Our Depth Field Networks (DeFiNe) achieve state-of-the-art results in stereo and video depth estimation without explicit geometric constraints, and improve on zero-shot domain generalization by a wide margin.
    An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees. (arXiv:2112.10467v2 [stat.ML] UPDATED)
    Real-world networks often come with side information that can help to improve the performance of network analysis tasks such as clustering. Despite a large number of empirical and theoretical studies conducted on network clustering methods during the past decade, the added value of side information and the methods used to incorporate it optimally in clustering algorithms are relatively less understood. We propose a new iterative algorithm to cluster networks with side information for nodes (in the form of covariates) and show that our algorithm is optimal under the Contextual Symmetric Stochastic Block Model. Our algorithm can be applied to general Contextual Stochastic Block Models and avoids hyperparameter tuning in contrast to previously proposed methods. We confirm our theoretical results on synthetic data experiments where our algorithm significantly outperforms other methods, and show that it can also be applied to signed graphs. Finally, we demonstrate the practical interest of our method on real data.
    Federated Learning for IoUT: Concepts, Applications, Challenges and Opportunities. (arXiv:2207.13976v1 [cs.LG])
    The Internet of Underwater Things (IoUT) has gained rapid momentum over the past decade, with applications ranging from environmental monitoring and exploration to defence. Traditional IoUT systems use machine learning (ML) approaches which cater to the needs of reliability, efficiency and timeliness. However, an extensive review of the various studies conducted highlights the significance of data privacy and security in IoUT frameworks as a predominant factor in achieving desired outcomes in mission-critical applications. Federated learning (FL) is a secure, decentralized framework, a recent development in machine learning, that can help address the challenges faced by conventional ML approaches in IoUT. This paper presents an overview of the various applications of FL in IoUT, its challenges and open issues, and indicates directions for future research.
    Learning to Adapt Classifier for Imbalanced Semi-supervised Learning. (arXiv:2207.13856v1 [cs.LG])
    Pseudo-labeling has proven to be a promising semi-supervised learning (SSL) paradigm. Existing pseudo-labeling methods commonly assume that the class distributions of training data are balanced. However, such an assumption is far from realistic scenarios and existing pseudo-labeling methods suffer from severe performance degeneration in the context of class-imbalance. In this work, we investigate pseudo-labeling under imbalanced semi-supervised setups. The core idea is to automatically assimilate the training bias arising from class-imbalance, using a bias adaptive classifier that equips the original linear classifier with a bias attractor. The bias attractor is designed to be a light-weight residual network for adapting to the training bias. Specifically, the bias attractor is learned through a bi-level learning framework such that the bias adaptive classifier is able to fit imbalanced training data, while the linear classifier can give unbiased label prediction for each class. We conduct extensive experiments under various imbalanced semi-supervised setups, and the results demonstrate that our method can be applicable to different pseudo-labeling models and superior to the prior arts.
    Branch Ranking for Efficient Mixed-Integer Programming via Offline Ranking-based Policy Learning. (arXiv:2207.13701v1 [cs.LG])
    Deriving a good variable selection strategy in branch-and-bound is essential for the efficiency of modern mixed-integer programming (MIP) solvers. With MIP branching data collected during the previous solution process, learning to branch methods have recently become superior over heuristics. As branch-and-bound is naturally a sequential decision making task, one should learn to optimize the utility of the whole MIP solving process instead of being myopic on each step. In this work, we formulate learning to branch as an offline reinforcement learning (RL) problem, and propose a long-sighted hybrid search scheme to construct the offline MIP dataset, which values the long-term utilities of branching decisions. During the policy training phase, we deploy a ranking-based reward assignment scheme to distinguish the promising samples from the long-term or short-term view, and train the branching model named Branch Ranking via offline policy learning. Experiments on synthetic MIP benchmarks and real-world tasks demonstrate that Branch Ranking is more efficient and robust, and can better generalize to large scales of MIP instances compared to the widely used heuristics and state-of-the-art learning-based branching models.
    PHEMEPlus: Enriching Social Media Rumour Verification with External Evidence. (arXiv:2207.13970v1 [cs.CL])
    Work on social media rumour verification utilises signals from posts, their propagation and users involved. Other lines of work target identifying and fact-checking claims based on information from Wikipedia, or trustworthy news articles without considering social media context. However, works combining the information from social media with external evidence from the wider web are lacking. To facilitate research in this direction, we release a novel dataset, PHEMEPlus, an extension of the PHEME benchmark, which contains social media conversations as well as relevant external evidence for each rumour. We demonstrate the effectiveness of incorporating such evidence in improving rumour verification models. Additionally, as part of the evidence collection, we evaluate various ways of query formulation to identify the most effective method.
    Unsupervised Frequent Pattern Mining for CEP. (arXiv:2207.14017v1 [cs.LG])
    Complex Event Processing (CEP) is a set of methods that allow efficient knowledge extraction from massive data streams using complex and highly descriptive patterns. Numerous applications, such as online finance, healthcare monitoring and fraud detection use CEP technologies to capture critical alerts, potential threats, or vital notifications in real time. As of today, in many fields, patterns are manually defined by human experts. However, desired patterns often contain convoluted relations that are difficult for humans to detect, and human expertise is scarce in many domains. We present REDEEMER (REinforcement baseD cEp pattErn MinER), a novel reinforcement and active learning approach aimed at mining CEP patterns that allow expansion of the knowledge extracted while reducing the human effort required. This approach includes a novel policy gradient method for vast multivariate spaces and a new way to combine reinforcement and active learning for CEP rule learning while minimizing the number of labels needed for training. REDEEMER aims to enable CEP integration in domains that could not utilize it before. To the best of our knowledge, REDEEMER is the first system that suggests new CEP rules that were not observed beforehand, and is the first method aimed for increasing pattern knowledge in fields where experts do not possess sufficient information required for CEP tools. Our experiments on diverse data-sets demonstrate that REDEEMER is able to extend pattern knowledge while outperforming several state-of-the-art reinforcement learning methods for pattern mining.
    Automated Classification of Nanoparticles with Various Ultrastructures and Sizes. (arXiv:2207.14023v1 [cond-mat.mtrl-sci])
    Accurately measuring the size, morphology, and structure of nanoparticles is very important, because their properties in many applications depend strongly on these characteristics. In this paper, we present a deep-learning based method for nanoparticle measurement and classification trained from a small data set of scanning transmission electron microscopy images. Our approach comprises two stages: localization, i.e., detection of nanoparticles, and classification, i.e., categorization of their ultrastructure. For each stage, we optimize the segmentation and classification by analysis of the different state-of-the-art neural networks. We show how the generation of synthetic images, either using image processing or using various image generation neural networks, can be used to improve the results in both stages. Finally, the application of the algorithm to bimetallic nanoparticles demonstrates the automated data collection of size distributions including classification of complex ultrastructures. The developed method can be easily transferred to other material systems and nanoparticle structures.
    Differentially Private Learning of Hawkes Processes. (arXiv:2207.13741v1 [stat.ML])
    Hawkes processes have recently gained increasing attention from the machine learning community for their versatility in modeling event sequence data. While they have a rich history going back decades, some of their properties, such as sample complexity for learning the parameters and releasing differentially private versions, are yet to be thoroughly analyzed. In this work, we study standard Hawkes processes with background intensity $\mu$ and excitation function $\alpha e^{-\beta t}$. We provide both non-private and differentially private estimators of $\mu$ and $\alpha$, and obtain sample complexity results in both settings to quantify the cost of privacy. Our analysis exploits the strong mixing property of Hawkes processes and classical central limit theorem results for weakly dependent random variables. We validate our theoretical findings on both synthetic and real datasets.
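    To make the model concrete, the sketch below evaluates the conditional intensity of a Hawkes process with background intensity $\mu$ and excitation function $\alpha e^{-\beta t}$, i.e. $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$. The function name and argument layout are assumptions for illustration; the paper's estimators are not reproduced here.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity at time `t` given past events.

    Each past event at time s < t adds alpha * exp(-beta * (t - s)) on top
    of the constant background rate mu.
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - s)) for s in event_times if s < t
    )
    return mu + excitation
```

    With no past events the intensity equals `mu`; each event raises it by `alpha` immediately afterwards, and that boost decays at rate `beta`.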
    HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative. (arXiv:2207.13921v1 [q-bio.BM])
    AI-based protein structure prediction pipelines, such as AlphaFold2, have achieved near-experimental accuracy. These advanced pipelines mainly rely on Multiple Sequence Alignments (MSAs) and templates as inputs to learn the co-evolution information from the homologous sequences. Nonetheless, searching MSAs and templates from protein databases is time-consuming, usually taking dozens of minutes. Consequently, we attempt to explore the limits of fast protein structure prediction by using only primary sequences of proteins. HelixFold-Single is proposed to combine a large-scale protein language model with the superior geometric learning capability of AlphaFold2. Our proposed method, HelixFold-Single, first pre-trains a large-scale protein language model (PLM) with thousands of millions of primary sequences utilizing the self-supervised learning paradigm, which will be used as an alternative to MSAs and templates for learning the co-evolution information. Then, by combining the pre-trained PLM and the essential components of AlphaFold2, we obtain an end-to-end differentiable model to predict the 3D coordinates of atoms from only the primary sequence. HelixFold-Single is validated in datasets CASP14 and CAMEO, achieving competitive accuracy with the MSA-based methods on the targets with large homologous families. Furthermore, HelixFold-Single consumes much less time than the mainstream pipelines for protein structure prediction, demonstrating its potential in tasks requiring many predictions. The code of HelixFold-Single is available at https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold-single, and we also provide stable web services on https://paddlehelix.baidu.com/app/drug/protein-single/forecast.
    Safety-Enhanced Autonomous Driving Using Interpretable Sensor Fusion Transformer. (arXiv:2207.14024v1 [cs.CV])
    Large-scale deployment of autonomous vehicles has been continually delayed due to safety concerns. On the one hand, comprehensive scene understanding is indispensable, a lack of which would result in vulnerability to rare but complex traffic situations, such as the sudden emergence of unknown objects. However, reasoning from a global context requires access to sensors of multiple types and adequate fusion of multi-modal sensor signals, which is difficult to achieve. On the other hand, the lack of interpretability in learning models also hampers safety, since failure causes cannot be verified. In this paper, we propose a safety-enhanced autonomous driving framework, named Interpretable Sensor Fusion Transformer (InterFuser), to fully process and fuse information from multi-modal multi-view sensors for achieving comprehensive scene understanding and adversarial event detection. In addition, our framework generates intermediate interpretable features, which provide more semantics and are exploited to better constrain actions to be within the safe sets. We conducted extensive experiments on CARLA benchmarks, where our model outperforms prior methods, ranking first on the public CARLA Leaderboard.
    Multi-Step Deductive Reasoning Over Natural Language: An Empirical Study on Out-of-Distribution Generalisation. (arXiv:2207.14000v1 [cs.CL])
    Combining deep learning with symbolic logic reasoning aims to capitalize on the success of both fields and is drawing increasing attention. Inspired by DeepLogic, an end-to-end model trained to perform inference on logic programs, we introduce IMA-GloVe-GA, an iterative neural inference network for multi-step reasoning expressed in natural language. In our model, reasoning is performed using an iterative memory neural network based on RNN with a gate attention mechanism. We evaluate IMA-GloVe-GA on three datasets: PARARULES, CONCEPTRULES V1 and CONCEPTRULES V2. Experimental results show DeepLogic with gate attention can achieve higher test accuracy than DeepLogic and other RNN baseline models. Our model achieves better out-of-distribution generalisation than RoBERTa-Large when the rules have been shuffled. Furthermore, to address the issue of unbalanced distribution of reasoning depths in the current multi-step reasoning datasets, we develop PARARULE-Plus, a large dataset with more examples that require deeper reasoning steps. Experimental results show that the addition of PARARULE-Plus can increase the model's performance on examples requiring deeper reasoning depths. The source code and data are available at https://github.com/Strong-AI-Lab/Multi-Step-Deductive-Reasoning-Over-Natural-Language.
    Raising Student Completion Rates with Adaptive Curriculum and Contextual Bandits. (arXiv:2207.14003v1 [cs.CL])
    We present an adaptive learning Intelligent Tutoring System, which uses model-based reinforcement learning in the form of contextual bandits to assign learning activities to students. The model is trained on the trajectories of thousands of students in order to maximize their exercise completion rates and continues to learn online, automatically adjusting itself to new activities. A randomized controlled trial with students shows that our model leads to superior completion rates and significantly improved student engagement when compared to other approaches. Our approach is fully automated, unlocking new opportunities for learning experience personalization.
    Physical Systems Modeled Without Physical Laws. (arXiv:2207.13702v1 [cs.LG])
    Physics-based simulations typically operate with a combination of complex differentiable equations and many scientific and geometric inputs. Our work involves gathering data from those simulations and seeing how well tree-based machine learning methods can emulate desired outputs without "knowing" the complex backing involved in the simulations. The selected physics-based simulations included Navier-Stokes, stress analysis, and electromagnetic field lines to benchmark performance as numerical and statistical algorithms. We specifically focus on predicting specific spatial-temporal data between two simulation outputs and increasing spatial resolution to generalize the physics predictions to finer test grids without the computational costs of repeating the numerical calculation.
    Real Image Restoration via Structure-preserving Complementarity Attention. (arXiv:2207.13879v1 [eess.IV])
    Since convolutional neural networks perform well in learning generalizable image priors from large-scale data, these models have been widely used in image denoising tasks. However, the computational complexity also increases dramatically for complex models. In this paper, we propose a novel lightweight Complementary Attention Module, which includes a density module and a sparse module that cooperatively mine dense and sparse features for complementary feature learning, to build an efficient lightweight architecture. Moreover, to reduce the loss of details caused by denoising, this paper constructs a gradient-based structure-preserving branch. We utilize gradient-based branches to obtain additional structural priors for denoising, and make the model pay more attention to image geometric details through gradient loss optimization. Based on the above, we propose an efficient Unet-structured network with dual branches. The visual results show that it can effectively preserve the structural details of the original image. We evaluate on benchmarks including SIDD and DND, where SCANet achieves state-of-the-art performance in PSNR and SSIM while significantly reducing computational cost.
    One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares. (arXiv:2207.13853v1 [cs.LG])
    While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the increasing use of overparameterized models, we develop Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints. By doing so, we bridge two seemingly distinct algorithms in adaptive filtering and machine learning, namely the recursive least-squares (RLS) algorithm and orthogonal gradient descent (OGD). Our algorithm uses the memory efficiently by exploiting the structure of the streaming data via an incremental principal component analysis (IPCA). Further, we show that, for overparameterized linear models, the parameter vector obtained by our algorithm is what stochastic gradient descent (SGD) would converge to in the standard multi-pass setting. Finally, we generalize the results to the nonlinear setting for highly overparameterized models, relevant for deep learning. Our experiments show the effectiveness of the proposed method compared to the baselines.
    Structural Similarity for Improved Transfer in Reinforcement Learning. (arXiv:2207.13813v1 [cs.LG])
    Transfer learning is an increasingly common approach for developing performant RL agents. However, it is not well understood how to define the relationship between the source and target tasks, and how this relationship contributes to successful transfer. We present an algorithm called Structural Similarity for Two MDPS, or SS2, that calculates a state similarity measure for states in two finite MDPs based on previously developed bisimulation metrics, and show that the measure satisfies properties of a distance metric. Then, through empirical results with GridWorld navigation tasks, we provide evidence that the distance measure can be used to improve transfer performance for Q-Learning agents over previous implementations.
    Remote Medication Status Prediction for Individuals with Parkinson's Disease using Time-series Data from Smartphones. (arXiv:2207.13700v1 [cs.LG])
    Medication for neurological diseases such as Parkinson's disease usually happens remotely at home, away from hospitals. Such out-of-lab environments pose challenges in collecting timely and accurate health status data using the limited professional care devices for health condition analysis, medication adherence measurement and future dose or treatment planning. Individual differences in behavioral signals collected from wearable sensors also lead to difficulties in adopting current general machine learning analysis pipelines. To address these challenges, we present a method for predicting medication status of Parkinson's disease patients using the public mPower dataset, which contains 62,182 remote multi-modal test records collected on smartphones from 487 patients. The proposed method shows promising results in objectively predicting three medication statuses: Before Medication (AUC=0.95), After Medication (AUC=0.958), and Another Time (AUC=0.976) by examining patient-wise historical records with the attention weights learned through a Transformer model. We believe our method provides an innovative way for personalized remote health sensing in a timely and objective fashion, which could benefit a broad range of similar applications.
    Predicting the Output Structure of Sparse Matrix Multiplication with Sampled Compression Ratio. (arXiv:2207.13848v1 [cs.DC])
    Sparse general matrix multiplication (SpGEMM) is a fundamental building block in numerous scientific applications. One critical task of SpGEMM is to compute or predict the structure of the output matrix (i.e., the number of nonzero elements per output row) for efficient memory allocation and load balance, which impact the overall performance of SpGEMM. Existing work either precisely calculates the output structure or adopts upper-bound or sampling-based methods to predict the output structure. However, these methods either take much execution time or are not accurate enough. In this paper, we propose a novel sampling-based method with better accuracy and low costs compared to the existing sampling-based method. The proposed method first predicts the compression ratio of SpGEMM by leveraging the number of intermediate products (denoted as FLOP) and the number of nonzero elements (denoted as NNZ) of the same sampled result matrix. And then, the predicted output structure is obtained by dividing the FLOP per output row by the predicted compression ratio. We also propose a reference design of the existing sampling-based method with optimized computing overheads to demonstrate the better accuracy of the proposed method. We construct 625 test cases with various matrix dimensions and sparse structures to evaluate the prediction accuracy. Experimental results show that the absolute relative errors of the proposed method and the reference design are 1.56\% and 8.12\%, respectively, on average, and 25\% and 156\%, respectively, in the worst case.
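    The arithmetic described above can be sketched as follows: compute the FLOP per output row, estimate the compression ratio (FLOP divided by NNZ) on a sample, and divide each row's FLOP by that ratio. This is a simplified illustration, not the paper's implementation: uniform row sampling, the structure-only representation (each row stored as a list of distinct column indices), and all function names are assumptions made here.

```python
import random


def flop_per_row(a_rows, b_rows):
    """Number of intermediate products for each output row of A @ B."""
    return [sum(len(b_rows[k]) for k in cols) for cols in a_rows]


def exact_row_nnz(a_cols, b_rows):
    """Exact number of nonzeros in one output row (union of the touched B rows)."""
    out_cols = set()
    for k in a_cols:
        out_cols.update(b_rows[k])
    return len(out_cols)


def predict_row_nnz(a_rows, b_rows, sample_size, seed=0):
    """Predict NNZ per output row as FLOP per row / sampled compression ratio."""
    flops = flop_per_row(a_rows, b_rows)
    rng = random.Random(seed)
    sampled = rng.sample(range(len(a_rows)), min(sample_size, len(a_rows)))
    sample_flop = sum(flops[i] for i in sampled)
    sample_nnz = sum(exact_row_nnz(a_rows[i], b_rows) for i in sampled)
    # Fall back to a ratio of 1 (the FLOP upper bound) if the sample is empty of nonzeros.
    ratio = sample_flop / sample_nnz if sample_nnz else 1.0
    return [f / ratio for f in flops]


# Hypothetical 3x3 sparsity patterns: row i lists the column indices of its nonzeros.
a_rows = [[0, 1], [1], [0, 1, 2]]
b_rows = [[0, 1], [1, 2], [2]]
predicted = predict_row_nnz(a_rows, b_rows, sample_size=3)
```

    When every row is sampled, as in this toy call, the predictions sum to the exact total NNZ of the output; with a smaller sample the total is only an estimate.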
    Extraction of Vascular Wall in Carotid Ultrasound via a Novel Boundary-Delineation Network. (arXiv:2207.13868v1 [eess.IV])
    Ultrasound imaging plays an important role in the diagnosis of vascular lesions. Accurate segmentation of the vascular wall is important for the prevention, diagnosis and treatment of vascular diseases. However, existing methods have inaccurate localization of the vascular wall boundary. Segmentation errors occur in discontinuous vascular wall boundaries and dark boundaries. To overcome these problems, we propose a new boundary-delineation network (BDNet). We use the boundary refinement module to re-delineate the boundary of the vascular wall to obtain the correct boundary location. We designed the feature extraction module to extract and fuse multi-scale features and different receptive field features to solve the problem of dark boundaries and discontinuous boundaries. We use a new loss function to optimize the model. The interference of class imbalance on model optimization is prevented to obtain finer and smoother boundaries. Finally, to facilitate clinical applications, we design the model to be lightweight. Experimental results show that our model achieves the best segmentation results and significantly reduces memory consumption compared to existing models for the dataset.
    Learning to Assess Danger from Movies for Cooperative Escape Planning in Hazardous Environments. (arXiv:2207.13791v1 [cs.RO])
    There has been a plethora of work towards improving robot perception and navigation, yet their application in hazardous environments, like during a fire or an earthquake, is still at a nascent stage. We hypothesize two key challenges here: first, it is difficult to replicate such scenarios in the real world, which is necessary for training and testing purposes. Second, current systems are not fully able to take advantage of the rich multi-modal data available in such hazardous environments. To address the first challenge, we propose to harness the enormous amount of visual content available in the form of movies and TV shows, and develop a dataset that can represent hazardous environments encountered in the real world. The data is annotated with high-level danger ratings for realistic disaster images, and corresponding keywords are provided that summarize the content of the scene. In response to the second challenge, we propose a multi-modal danger estimation pipeline for collaborative human-robot escape scenarios. Our Bayesian framework improves danger estimation by fusing information from the robot's camera sensor and language inputs from the human. Furthermore, we augment the estimation module with a risk-aware planner that helps in identifying safer paths out of the dangerous environment. Through extensive simulations, we exhibit the advantages of our multi-modal perception framework, which translate into tangible benefits such as a higher success rate in a collaborative human-robot mission.
    Multi-Objective Provisioning of Network Slices using Deep Reinforcement Learning. (arXiv:2207.13821v1 [cs.NI])
    Network Slicing (NS) is crucial for efficiently enabling divergent network applications in next generation networks. Nonetheless, the complex Quality of Service (QoS) requirements and diverse heterogeneity in network services entail high computational time for Network Slice Provisioning (NSP) optimization. Legacy optimization methods struggle to meet the low-latency and high-reliability requirements of network applications. To this end, we model the real-time NSP as an Online Network Slice Provisioning (ONSP) problem. Specifically, we formulate the ONSP problem as an online Multi-Objective Integer Programming Optimization (MOIPO) problem. Then, we approximate the solution of the MOIPO problem by applying the Proximal Policy Optimization (PPO) method to the traffic demand prediction. Our simulation results show the effectiveness of the proposed method compared to the state-of-the-art MOIPO solvers with a lower SLA violation rate and network operation cost.
    Calibrate: Interactive Analysis of Probabilistic Model Output. (arXiv:2207.13770v1 [cs.HC])
    Analyzing classification model performance is a crucial task for machine learning practitioners. While practitioners often use count-based metrics derived from confusion matrices, like accuracy, many applications, such as weather prediction, sports betting, or patient risk prediction, rely on a classifier's predicted probabilities rather than predicted labels. In these instances, practitioners are concerned with producing a calibrated model, that is, one which outputs probabilities that reflect those of the true distribution. Model calibration is often analyzed visually, through static reliability diagrams, however, the traditional calibration visualization may suffer from a variety of drawbacks due to the strong aggregations it necessitates. Furthermore, count-based approaches are unable to sufficiently analyze model calibration. We present Calibrate, an interactive reliability diagram that addresses the aforementioned issues. Calibrate constructs a reliability diagram that is resistant to drawbacks in traditional approaches, and allows for interactive subgroup analysis and instance-level inspection. We demonstrate the utility of Calibrate through use cases on both real-world and synthetic data. We further validate Calibrate by presenting the results of a think-aloud experiment with data scientists who routinely analyze model calibration.  ( 2 min )
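    For context, the traditional static reliability diagram mentioned above is typically built by binning predicted probabilities and comparing each bin's mean predicted probability with its observed fraction of positives. The sketch below shows that baseline aggregation only; it is not Calibrate's method, and the equal-width binning, function name, and toy data are assumptions.

```python
def reliability_bins(probs, labels, n_bins=10):
    """Return (mean predicted probability, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        bins[idx].append((p, y))
    rows = []
    for members in bins:
        if members:
            mean_prob = sum(p for p, _ in members) / len(members)
            positive_rate = sum(y for _, y in members) / len(members)
            rows.append((mean_prob, positive_rate, len(members)))
    return rows


# Hypothetical predictions and binary labels, for illustration only.
rows = reliability_bins([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], n_bins=2)
```

    A calibrated model has bins whose mean predicted probability is close to the observed positive rate. Both the bin count and the bin boundaries affect what the diagram shows, which is the kind of aggregation the abstract refers to.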
    Cross-Attention of Disentangled Modalities for 3D Human Mesh Recovery with Transformers. (arXiv:2207.13820v1 [cs.CV])
    Transformer encoder architectures have recently achieved state-of-the-art results on monocular 3D human mesh reconstruction, but they require a substantial number of parameters and expensive computations. Due to the large memory overhead and slow inference speed, it is difficult to deploy such models for practical use. In this paper, we propose a novel transformer encoder-decoder architecture for 3D human mesh reconstruction from a single image, called FastMETRO. We identify that the performance bottleneck in encoder-based transformers is caused by the token design, which introduces high-complexity interactions among input tokens. We disentangle the interactions via an encoder-decoder architecture, which allows our model to require far fewer parameters and shorter inference time. In addition, we impose prior knowledge of the human body's morphological relationship via attention masking and mesh upsampling operations, which leads to faster convergence with higher accuracy. Our FastMETRO improves the Pareto-front of accuracy and efficiency, and clearly outperforms image-based methods on Human3.6M and 3DPW. Furthermore, we validate its generalizability on FreiHAND.  ( 2 min )
    Deep Learning-Based Acoustic Mosquito Detection in Noisy Conditions Using Trainable Kernels and Augmentations. (arXiv:2207.13843v1 [cs.SD])
    In this paper, we demonstrate a unique recipe to enhance the effectiveness of audio machine learning approaches by fusing pre-processing techniques into a deep learning model. Our solution accelerates training and inference performance by optimizing hyper-parameters through training instead of costly random searches to build a reliable mosquito detector from audio signals. The experiments and the results presented here are part of the MOS C submission of the ACM 2022 challenge. Our results outperform the published baseline by 212% on the unpublished test set. We believe that this is one of the best real-world examples of building a robust bio-acoustic system that provides reliable mosquito detection in noisy conditions.  ( 2 min )
    Label-Only Membership Inference Attack against Node-Level Graph Neural Networks. (arXiv:2207.13766v1 [cs.CR])
    Graph Neural Networks (GNNs), inspired by Convolutional Neural Networks (CNNs), aggregate the message of nodes' neighbors and structure information to acquire expressive representations of nodes for node classification, graph classification, and link prediction. Previous studies have indicated that GNNs are vulnerable to Membership Inference Attacks (MIAs), which infer whether a node is in the training data of GNNs and leak the node's private information, like the patient's disease history. The implementation of previous MIAs takes advantage of the models' probability output, which is infeasible if GNNs only provide the prediction label (label-only) for the input. In this paper, we propose a label-only MIA against GNNs for node classification with the help of GNNs' flexible prediction mechanism, e.g., obtaining the prediction label of one node even when neighbors' information is unavailable. Our attacking method achieves around 60\% accuracy, precision, and Area Under the Curve (AUC) for most datasets and GNN models, some of which are competitive with or even better than state-of-the-art probability-based MIAs implemented under our environment and settings. Additionally, we analyze the influence of the sampling method, model selection approach, and overfitting level on the attack performance of our label-only MIA. All of these factors have an impact on the attack performance. Then, we consider scenarios where assumptions about the adversary's additional dataset (shadow dataset) and extra information about the target model are relaxed. Even in those scenarios, our label-only MIA achieves a better attack performance in most cases. Finally, we explore the effectiveness of possible defenses, including Dropout, Regularization, Normalization, and Jumping knowledge. None of those four defenses prevent our attack completely.  ( 3 min )
    A Novel Data Augmentation Technique for Out-of-Distribution Sample Detection using Compounded Corruptions. (arXiv:2207.13916v1 [cs.CV])
    Modern deep neural network models are known to erroneously classify out-of-distribution (OOD) test data into one of the in-distribution (ID) training classes with high confidence. This can have disastrous consequences for safety-critical applications. A popular mitigation strategy is to train a separate classifier that can detect such OOD samples at test time. In most practical settings OOD examples are not known at training time, and hence a key question is: how to augment the ID data with synthetic OOD samples for training such an OOD detector? In this paper, we propose a novel Compounded Corruption technique for OOD data augmentation, termed CnC. One of the major advantages of CnC is that it does not require any hold-out data apart from the training set. Further, unlike current state-of-the-art (SOTA) techniques, CnC does not require backpropagation or ensembling at test time, making our method much faster at inference. Our extensive comparison with 20 methods from the major conferences in the last 4 years shows that a model trained using CnC-based data augmentation significantly outperforms SOTA, both in terms of OOD detection accuracy and inference time. We include a detailed post-hoc analysis to investigate the reasons for the success of our method and identify higher relative entropy and diversity of CnC samples as probable causes. We also provide theoretical insights via a piece-wise decomposition analysis on a two-dimensional dataset to reveal (visually and quantitatively) that our approach leads to a tighter boundary around ID classes, leading to better detection of OOD samples. Source code link: https://github.com/cnc-ood  ( 3 min )
    Modelling non-reinforced preferences using selective attention. (arXiv:2207.13699v1 [cs.LG])
    How can artificial agents learn non-reinforced preferences to continuously adapt their behaviour to a changing environment? We decompose this question into two challenges: ($i$) encoding diverse memories and ($ii$) selectively attending to these for preference formation. Our proposed \emph{no}n-\emph{re}inforced preference learning mechanism using selective attention, \textsc{Nore}, addresses both by leveraging the agent's world model to collect a diverse set of experiences which are interleaved with imagined roll-outs to encode memories. These memories are selectively attended to, using attention and gating blocks, to update the agent's preferences. We validate \textsc{Nore} in a modified OpenAI Gym FrozenLake environment (without any external signal) with and without volatility under a fixed model of the environment -- and compare its behaviour to \textsc{Pepper}, a Hebbian preference learning mechanism. We demonstrate that \textsc{Nore} provides a straightforward framework to induce exploratory preferences in the absence of external signals.  ( 2 min )
    Physical Pooling Functions in Graph Neural Networks for Molecular Property Prediction. (arXiv:2207.13779v1 [cs.LG])
    Graph neural networks (GNNs) are emerging in chemical engineering for the end-to-end learning of physicochemical properties based on molecular graphs. A key element of GNNs is the pooling function which combines atom feature vectors into molecular fingerprints. Most previous works use a standard pooling function to predict a variety of properties. However, unsuitable pooling functions can lead to unphysical GNNs that poorly generalize. We compare and select meaningful GNN pooling methods based on physical knowledge about the learned properties. The impact of physical pooling functions is demonstrated with molecular properties calculated from quantum mechanical computations. We also compare our results to the recent set2set pooling approach. We recommend using sum pooling for the prediction of properties that depend on molecular size and compare pooling functions for properties that are molecular size-independent. Overall, we show that the use of physical pooling functions significantly enhances generalization.  ( 2 min )
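    The difference between size-dependent and size-independent pooling can be seen in a few lines. The sketch below pools per-atom feature vectors into a molecular fingerprint by sum and by mean. The feature values are made up, and duplicating the atom list is only a stand-in for a larger molecule with the same composition.

```python
def sum_pool(atom_features):
    """Sum atom feature vectors element-wise; the result grows with the number of atoms."""
    return [sum(column) for column in zip(*atom_features)]


def mean_pool(atom_features):
    """Average atom feature vectors element-wise; the result is independent of atom count."""
    n_atoms = len(atom_features)
    return [total / n_atoms for total in sum_pool(atom_features)]


# Hypothetical 2-dimensional atom features for a small molecule.
small = [[1.0, 0.0], [0.0, 2.0]]
large = small * 2  # same composition, twice as many atoms

sum_small, sum_large = sum_pool(small), sum_pool(large)
mean_small, mean_large = mean_pool(small), mean_pool(large)
```

    Doubling the molecule doubles the sum-pooled fingerprint but leaves the mean-pooled one unchanged, which is consistent with the recommendation above to use sum pooling for properties that depend on molecular size.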
    Diversity Boosted Learning for Domain Generalization with Large Number of Domains. (arXiv:2207.13865v1 [cs.LG])
    Machine learning algorithms minimizing the average training loss usually suffer from poor generalization performance due to the greedy exploitation of correlations among the training data, which are not stable under distributional shifts. This has inspired various works on domain generalization (DG), where a series of methods, such as Causal Matching and FISH, work by pairwise domain operations. They would need $O(n^2)$ pairwise domain operations with $n$ domains, where each one is often highly expensive. Moreover, while a common objective in the DG literature is to learn invariant representations against domain-induced spurious correlations, we highlight the importance of mitigating spurious correlations caused by objects. Based on the observation that diversity helps mitigate spurious correlations, we propose a Diversity boosted twO-level saMplIng framework (DOMI) utilizing Determinantal Point Processes (DPPs) to efficiently sample the most informative ones among a large number of domains. We show that DOMI helps train robust models against spurious correlations from both domain-side and object-side, substantially enhancing the performance of the backbone DG algorithms on rotated MNIST, rotated Fashion MNIST, and iwildcam datasets.  ( 2 min )
    SoundChoice: Grapheme-to-Phoneme Models with Semantic Disambiguation. (arXiv:2207.13703v1 [cs.SD])
    End-to-end speech synthesis models directly convert the input characters into an audio representation (e.g., spectrograms). Despite their impressive performance, such models have difficulty disambiguating the pronunciations of identically spelled words. To mitigate this issue, a separate Grapheme-to-Phoneme (G2P) model can be employed to convert the characters into phonemes before synthesizing the audio. This paper proposes SoundChoice, a novel G2P architecture that processes entire sentences rather than operating at the word level. The proposed architecture takes advantage of a weighted homograph loss (that improves disambiguation), exploits curriculum learning (that gradually switches from word-level to sentence-level G2P), and integrates word embeddings from BERT (for further performance improvement). Moreover, the model inherits the best practices in speech recognition, including multi-task learning with Connectionist Temporal Classification (CTC) and beam search with an embedded language model. As a result, SoundChoice achieves a Phoneme Error Rate (PER) of 2.65% on whole-sentence transcription using data from LibriSpeech and Wikipedia. Index Terms: grapheme-to-phoneme, speech synthesis, text-to-speech, phonetics, pronunciation, disambiguation.  ( 2 min )
    Adaptive Second Order Coresets for Data-efficient Machine Learning. (arXiv:2207.13887v1 [cs.LG])
    Training machine learning models on massive datasets incurs substantial computational costs. To alleviate such costs, there has been a sustained effort to develop data-efficient training methods that can carefully select subsets of the training examples that generalize on par with the full training data. However, existing methods are limited in providing theoretical guarantees for the quality of the models trained on the extracted subsets, and may perform poorly in practice. We propose AdaCore, a method that leverages the geometry of the data to extract subsets of the training examples for efficient machine learning. The key idea behind our method is to dynamically approximate the curvature of the loss function via an exponentially-averaged estimate of the Hessian to select weighted subsets (coresets) that provide a close approximation of the full gradient preconditioned with the Hessian. We prove rigorous guarantees for the convergence of various first and second-order methods applied to the subsets chosen by AdaCore. Our extensive experiments show that AdaCore extracts coresets with higher quality compared to baselines and speeds up training of convex and non-convex machine learning models, such as logistic regression and neural networks, by over 2.9x over the full data and 4.5x over random subsets.  ( 2 min )
    Towards Sleep Scoring Generalization Through Self-Supervised Meta-Learning. (arXiv:2207.13801v1 [cs.LG])
    In this work we introduce a novel meta-learning method for sleep scoring based on self-supervised learning. Our approach aims at building models for sleep scoring that can generalize across different patients and recording facilities, but do not require a further adaptation step to the target data. Towards this goal, we build our method on top of the Model Agnostic Meta-Learning (MAML) framework by incorporating a self-supervised learning (SSL) stage, and call it S2MAML. We show that S2MAML can significantly outperform MAML. The gain in performance comes from the SSL stage, which we base on a general purpose pseudo-task that limits the overfitting to the subject-specific patterns present in the training dataset. We show that S2MAML outperforms standard supervised learning and MAML on the SC, ST, ISRUC, UCD and CAP datasets.  ( 2 min )
  • Open

    Shift-Curvature, SGD, and Generalization. (arXiv:2108.09507v3 [stat.ML] UPDATED)
    A longstanding debate surrounds the related hypotheses that low-curvature minima generalize better, and that SGD discourages curvature. We offer a more complete and nuanced view in support of both. First, we show that curvature harms test performance through two new mechanisms, the shift-curvature and bias-curvature, in addition to a known parameter-covariance mechanism. The three curvature-mediated contributions to test performance are reparametrization-invariant although curvature is not. The shift in the shift-curvature is the line connecting train and test local minima, which differ due to dataset sampling or distribution shift. Although the shift is unknown at training time, the shift-curvature can still be mitigated by minimizing overall curvature. Second, we derive a new, explicit SGD steady-state distribution showing that SGD optimizes an effective potential related to but different from train loss, and that SGD noise mediates a trade-off between deep versus low-curvature regions of this effective potential. Third, combining our test performance analysis with the SGD steady state shows that for small SGD noise, the shift-curvature may be the most significant of the three mechanisms. Our experiments confirm the impact of shift-curvature on test loss, and further explore the relationship between SGD noise and curvature.
    Learning with Succinct Common Representation Based on Wyner's Common Information. (arXiv:1905.10945v2 [cs.LG] UPDATED)
    A new bimodal generative model is proposed for generating conditional and joint samples, accompanied with a training method with learning a succinct bottleneck representation. The proposed model, dubbed as the variational Wyner model, is designed based on two classical problems in network information theory -- distributed simulation and channel synthesis -- in which Wyner's common information arises as the fundamental limit on the succinctness of the common representation. The model is trained by minimizing the symmetric Kullback--Leibler divergence between variational and model distributions with regularization terms for common information, reconstruction consistency, and latent space matching terms, which is carried out via an adversarial density ratio estimation technique. The utility of the proposed approach is demonstrated through experiments for joint and conditional generation with synthetic and real-world datasets, as well as a challenging zero-shot image retrieval task.  ( 2 min )
    Pareto-optimal clustering with the primal deterministic information bottleneck. (arXiv:2204.02489v2 [cs.LG] UPDATED)
    At the heart of both lossy compression and clustering is a trade-off between the fidelity and size of the learned representation. Our goal is to map out and study the Pareto frontier that quantifies this trade-off. We focus on the optimization of the Deterministic Information Bottleneck (DIB) objective over the space of hard clusterings. To this end, we introduce the primal DIB problem, which we show results in a much richer frontier than its previously studied Lagrangian relaxation when optimized over discrete search spaces. We present an algorithm for mapping out the Pareto frontier of the primal DIB trade-off that is also applicable to other two-objective clustering problems. We study general properties of the Pareto frontier, and we give both analytic and numerical evidence for logarithmic sparsity of the frontier in general. We provide evidence that our algorithm has polynomial scaling despite the super-exponential search space, and additionally, we propose a modification to the algorithm that can be used where sampling noise is expected to be significant. Finally, we use our algorithm to map the DIB frontier of three different tasks: compressing the English alphabet, extracting informative color classes from natural images, and compressing a group theory-inspired dataset, revealing interesting features of the frontier, and demonstrating how the structure of the frontier can be used for model selection with a focus on points previously hidden by the cloak of the convex hull.  ( 3 min )
    Regret Minimization and Convergence to Equilibria in General-sum Markov Games. (arXiv:2207.14211v1 [cs.LG])
    An abundance of recent impossibility results establish that regret minimization in Markov games with adversarial opponents is both statistically and computationally intractable. Nevertheless, none of these results preclude the possibility of regret minimization under the assumption that all parties adopt the same learning procedure. In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for swap regret, and thus, along the way, imply convergence to a correlated equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. Our key observation is that online learning via policy optimization in Markov games essentially reduces to a form of weighted regret minimization, with unknown weights determined by the path length of the agents' policy sequence. Consequently, controlling the path length leads to weighted regret objectives for which sufficiently adaptive algorithms provide sublinear regret guarantees.  ( 2 min )
    The Familiarity Hypothesis: Explaining the Behavior of Deep Open Set Methods. (arXiv:2203.02486v4 [cs.CV] UPDATED)
    In many object recognition applications, the set of possible categories is an open set, and the deployed recognition system will encounter novel objects belonging to categories unseen during training. Detecting such "novel category" objects is usually formulated as an anomaly detection problem. Anomaly detection algorithms for feature-vector data identify anomalies as outliers, but outlier detection has not worked well in deep learning. Instead, methods based on the computed logits of visual object classifiers give state-of-the-art performance. This paper proposes the Familiarity Hypothesis that these methods succeed because they are detecting the absence of familiar learned features rather than the presence of novelty. This distinction is important, because familiarity-based detection will fail in many situations where novelty is present. For example when an image contains both a novel object and a familiar one, the familiarity score will be high, so the novel object will not be noticed. The paper reviews evidence from the literature and presents additional evidence from our own experiments that provide strong support for this hypothesis. The paper concludes with a discussion of whether familiarity-based detection is an inevitable consequence of representation learning.  ( 3 min )
    What Happens after SGD Reaches Zero Loss? --A Mathematical Framework. (arXiv:2110.06914v4 [cs.LG] UPDATED)
    Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key challenges in deep learning, especially for overparametrized models, where the local minimizers of the loss function $L$ can form a manifold. Intuitively, with a sufficiently small learning rate $\eta$, SGD tracks Gradient Descent (GD) until it gets close to such a manifold, where the gradient noise prevents further convergence. In such a regime, Blanc et al. (2020) proved that SGD with label noise locally decreases a regularizer-like term, the sharpness of loss, $\mathrm{tr}[\nabla^2 L]$. The current paper gives a general framework for such analysis by adapting ideas from Katzenberger (1991). It allows in principle a complete characterization for the regularization effect of SGD around such a manifold -- i.e., the "implicit bias" -- using a stochastic differential equation (SDE) describing the limiting dynamics of the parameters, which is determined jointly by the loss function and the noise covariance. This yields some new results: (1) a global analysis of the implicit bias valid for $\eta^{-2}$ steps, in contrast to the local analysis of Blanc et al. (2020) that is only valid for $\eta^{-1.6}$ steps and (2) allowing arbitrary noise covariance. As an application, we show with arbitrarily large initialization, label noise SGD can always escape the kernel regime and only requires $O(\kappa\ln d)$ samples for learning a $\kappa$-sparse overparametrized linear model in $\mathbb{R}^d$ (Woodworth et al., 2020), while GD initialized in the kernel regime requires $\Omega(d)$ samples. This upper bound is minimax optimal and improves the previous $\tilde{O}(\kappa^2)$ upper bound (HaoChen et al., 2020).  ( 3 min )
    Online Inference for Mixture Model of Streaming Graph Signals with Non-White Excitation. (arXiv:2207.14019v1 [stat.ML])
    This paper considers a joint multi-graph inference and clustering problem for simultaneous inference of node centrality and association of graph signals with their graphs. We study a mixture model of filtered low pass graph signals with possibly non-white and low-rank excitation. While the mixture model is motivated from practical scenarios, it presents significant challenges to prior graph learning methods. As a remedy, we consider an inference problem focusing on the node centrality of graphs. We design an expectation-maximization (EM) algorithm with a unique low-rank plus sparse prior derived from low pass signal property. We propose a novel online EM algorithm for inference from streaming data. As an example, we extend the online algorithm to detect if the signals are generated from an abnormal graph. We show that the proposed algorithms converge to a stationary point of the maximum a posteriori (MAP) problem. Numerical experiments support our analysis.  ( 2 min )
    Fast Online Changepoint Detection via Functional Pruning CUSUM statistics. (arXiv:2110.08205v3 [stat.ME] UPDATED)
    Many modern applications of online changepoint detection require the ability to process high-frequency observations, sometimes with limited available computational resources. Online algorithms for detecting a change in mean often involve using a moving window, or specifying the expected size of change. Such choices affect which changes the algorithms have most power to detect. We introduce an algorithm, Functional Online CuSUM (FOCuS), which is equivalent to running these earlier methods simultaneously for all sizes of window, or all possible values for the size of change. Our theoretical results give tight bounds on the expected computational cost per iteration of FOCuS, with this being logarithmic in the number of observations. We show how FOCuS can be applied to a number of different change in mean scenarios, and demonstrate its practical utility through its state-of-the art performance at detecting anomalous behaviour in computer server data.  ( 2 min )
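The "earlier methods" mentioned in the FOCuS abstract, which require specifying the expected size of change, can be illustrated with the classic one-sided Page CUSUM recursion. The following is a minimal sketch, not FOCuS itself: it assumes unit-variance Gaussian observations, a known pre-change mean, and a hypothetical function name and threshold chosen for illustration.

```python
def page_cusum(xs, pre_mean, change_size, threshold):
    """Return the first index where the one-sided CUSUM statistic exceeds
    `threshold`, or None if no change is flagged.

    Assumes unit-variance Gaussian data and an upward mean shift of
    `change_size` (both are illustrative assumptions).
    """
    stat = 0.0
    for t, x in enumerate(xs):
        # Log-likelihood-ratio increment of N(pre_mean + change_size, 1)
        # against N(pre_mean, 1), reset at zero (Page's recursion).
        stat = max(0.0, stat + change_size * (x - pre_mean - change_size / 2.0))
        if stat > threshold:
            return t
    return None


# A mean shift from 0 to 2 starting at index 20.
signal = [0.0] * 20 + [2.0] * 20
print(page_cusum(signal, pre_mean=0.0, change_size=1.0, threshold=4.0))
```

The detector's power depends on how well `change_size` matches the true shift, which is the limitation the abstract says FOCuS removes by effectively running all change sizes at once.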
    Differentially Private Learning of Hawkes Processes. (arXiv:2207.13741v1 [stat.ML])
    Hawkes processes have recently gained increasing attention from the machine learning community for their versatility in modeling event sequence data. While they have a rich history going back decades, some of their properties, such as sample complexity for learning the parameters and releasing differentially private versions, are yet to be thoroughly analyzed. In this work, we study standard Hawkes processes with background intensity $\mu$ and excitation function $\alpha e^{-\beta t}$. We provide both non-private and differentially private estimators of $\mu$ and $\alpha$, and obtain sample complexity results in both settings to quantify the cost of privacy. Our analysis exploits the strong mixing property of Hawkes processes and classical central limit theorem results for weakly dependent random variables. We validate our theoretical findings on both synthetic and real datasets.  ( 2 min )
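For readers unfamiliar with the model in the Hawkes abstract, the sketch below evaluates the conditional intensity in its usual textbook form, $\lambda(t) = \mu + \sum_{t_i < t} \alpha e^{-\beta (t - t_i)}$. The abstract only names $\mu$ and the excitation function, so this exact form and the function name are assumptions; the paper's estimators and privacy mechanism are not shown.

```python
import math


def hawkes_intensity(t, event_times, mu, alpha, beta):
    """Conditional intensity of a Hawkes process with exponential kernel.

    lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)).
    """
    return mu + sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )


events = [1.0, 2.0]
# Intensity jumps after each event and then decays back toward mu.
print(hawkes_intensity(2.5, events, mu=0.5, alpha=0.8, beta=1.0))
print(hawkes_intensity(6.0, events, mu=0.5, alpha=0.8, beta=1.0))
```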
    One-Pass Learning via Bridging Orthogonal Gradient Descent and Recursive Least-Squares. (arXiv:2207.13853v1 [cs.LG])
    While deep neural networks are capable of achieving state-of-the-art performance in various domains, their training typically requires iterating for many passes over the dataset. However, due to computational and memory constraints and potential privacy concerns, storing and accessing all the data is impractical in many real-world scenarios where the data arrives in a stream. In this paper, we investigate the problem of one-pass learning, in which a model is trained on sequentially arriving data without retraining on previous datapoints. Motivated by the increasing use of overparameterized models, we develop Orthogonal Recursive Fitting (ORFit), an algorithm for one-pass learning which seeks to perfectly fit every new datapoint while changing the parameters in a direction that causes the least change to the predictions on previous datapoints. By doing so, we bridge two seemingly distinct algorithms in adaptive filtering and machine learning, namely the recursive least-squares (RLS) algorithm and orthogonal gradient descent (OGD). Our algorithm uses the memory efficiently by exploiting the structure of the streaming data via an incremental principal component analysis (IPCA). Further, we show that, for overparameterized linear models, the parameter vector obtained by our algorithm is what stochastic gradient descent (SGD) would converge to in the standard multi-pass setting. Finally, we generalize the results to the nonlinear setting for highly overparameterized models, relevant for deep learning. Our experiments show the effectiveness of the proposed method compared to the baselines.  ( 3 min )
    Algorithmic Foundation of Deep X-Risk Optimization. (arXiv:2206.00439v4 [cs.LG] UPDATED)
    X-risk is a term introduced to represent a family of compositional measures or objectives, in which each data point is compared with a large number of items explicitly or implicitly for defining a risk function. It includes many widely used measures or objectives, e.g., AUROC, AUPRC, partial AUROC, NDCG, MAP, top-$K$ NDCG, top-$K$ MAP, listwise losses, p-norm push, top push, precision/recall at top $K$ positions, precision at a certain recall level, contrastive objectives, etc. While these non-decomposable measures/objectives and their optimization algorithms have been studied in the literature of machine learning, computer vision, information retrieval, etc., optimizing these measures/objectives has encountered some unique challenges for deep learning. In this paper, we survey recent rigorous efforts for deep X-risk optimization (DXO) by focusing on its algorithmic foundation. We introduce a class of techniques for optimizing X-risks for deep learning. We formulate DXO into three special families of non-convex optimization problems belonging to non-convex min-max optimization, non-convex compositional optimization, and non-convex bilevel optimization, respectively. For each family of problems, we present some strong baseline algorithms and their complexities, which will motivate further research for improving the existing results. Discussions about the presented results and future studies are given at the end. Efficient algorithms for optimizing a variety of X-risks are implemented in the LibAUC library at www.libauc.org.  ( 3 min )
    Modeling Item Response Theory with Stochastic Variational Inference. (arXiv:2108.11579v2 [cs.LG] UPDATED)
    Item Response Theory (IRT) is a ubiquitous model for understanding human behaviors and attitudes based on their responses to questions. Large modern datasets offer opportunities to capture more nuances in human behavior, potentially improving psychometric modeling leading to improved scientific understanding and public policy. However, while larger datasets allow for more flexible approaches, many contemporary algorithms for fitting IRT models may also have massive computational demands that forbid real-world application. To address this bottleneck, we introduce a variational Bayesian inference algorithm for IRT, and show that it is fast and scalable without sacrificing accuracy. Applying this method to five large-scale item response datasets from cognitive science and education yields higher log likelihoods and higher accuracy in imputing missing data than alternative inference algorithms. Using this new inference approach we then generalize IRT with expressive Bayesian models of responses, leveraging recent advances in deep learning to capture nonlinear item characteristic curves (ICC) with neural networks. Using an eighth-grade mathematics test from TIMSS, we show our nonlinear IRT models can capture interesting asymmetric ICCs. The algorithm implementation is open-source, and easily usable.  ( 3 min )
    Generative Modelling With Inverse Heat Dissipation. (arXiv:2206.13397v2 [cs.CV] UPDATED)
    While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the desirability of coarse-to-fine modelling, we propose a new model that generates images through iteratively inverting the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. In our novel methodology, the solution of the forward heat equation is interpreted as a variational approximation in a directed graphical model. We demonstrate promising image quality and point out emergent qualitative properties not seen in diffusion models, such as disentanglement of overall colour and shape in images and aspects of neural network interpretability. Spectral analysis on natural images positions our model as a type of dual to diffusion models and reveals implicit inductive biases in them.  ( 2 min )
    MarkerMap: nonlinear marker selection for single-cell studies. (arXiv:2207.14106v1 [stat.ML])
    Single-cell RNA-seq data allow the quantification of cell type differences across a growing set of biological contexts. However, pinpointing a small subset of genomic features explaining this variability can be ill-defined and computationally intractable. Here we introduce MarkerMap, a generative model for selecting minimal gene sets which are maximally informative of cell type origin and enable whole transcriptome reconstruction. MarkerMap provides a scalable framework for both supervised marker selection, aimed at identifying specific cell type populations, and unsupervised marker selection, aimed at gene expression imputation and reconstruction. We benchmark MarkerMap's competitive performance against previously published approaches on real single cell gene expression data sets. MarkerMap is available as a pip installable package, as a community resource aimed at developing explainable machine learning techniques for enhancing interpretability in single-cell studies.  ( 2 min )
    Hardness of Agnostically Learning Halfspaces from Worst-Case Lattice Problems. (arXiv:2207.14030v1 [cs.LG])
    We show hardness of improperly learning halfspaces in the agnostic model based on worst-case lattice problems, e.g., approximating shortest vectors within polynomial factors. In particular, we show that under this assumption there is no efficient algorithm that outputs any binary hypothesis, not necessarily a halfspace, achieving misclassification error better than $\frac 1 2 - \epsilon$ even if the optimal misclassification error is as small as $\delta$. Here, $\epsilon$ can be smaller than the inverse of any polynomial in the dimension and $\delta$ as small as $\mathrm{exp}\left(-\Omega\left(\log^{1-c}(d)\right)\right)$, where $0 < c < 1$ is an arbitrary constant and $d$ is the dimension. Previous hardness results [Daniely16] of this problem were based on average-case complexity assumptions, specifically, variants of Feige's random 3SAT hypothesis. Our work gives the first hardness for this problem based on a worst-case complexity assumption. It is inspired by a sequence of recent works showing hardness of learning well-separated Gaussian mixtures based on worst-case lattice problems.  ( 2 min )
    Differentiable Rule Induction with Learned Relational Features. (arXiv:2201.06515v2 [stat.ML] UPDATED)
    Rule-based decision models are attractive due to their interpretability. However, existing rule induction methods often result in long and consequently less interpretable rule models. This problem can often be attributed to the lack of appropriately expressive vocabulary, i.e., relevant predicates used as literals in the decision model. Most existing rule induction algorithms presume pre-defined literals, naturally decoupling the definition of the literals from the rule learning phase. In contrast, we propose the Relational Rule Network (R2N), a neural architecture that learns literals that represent a linear relationship among numerical input features along with the rules that use them. This approach opens the door to increasing the expressiveness of induced decision models by coupling literal learning directly with rule learning in an end-to-end differentiable fashion. On benchmark tasks, we show that these learned literals are simple enough to retain interpretability, yet improve prediction accuracy and provide sets of rules that are more concise compared to state-of-the-art rule induction algorithms.  ( 2 min )
    An iterative clustering algorithm for the Contextual Stochastic Block Model with optimality guarantees. (arXiv:2112.10467v2 [stat.ML] UPDATED)
    Real-world networks often come with side information that can help to improve the performance of network analysis tasks such as clustering. Despite a large number of empirical and theoretical studies conducted on network clustering methods during the past decade, the added value of side information and the methods used to incorporate it optimally in clustering algorithms are relatively less understood. We propose a new iterative algorithm to cluster networks with side information for nodes (in the form of covariates) and show that our algorithm is optimal under the Contextual Symmetric Stochastic Block Model. Our algorithm can be applied to general Contextual Stochastic Block Models and avoids hyperparameter tuning in contrast to previously proposed methods. We confirm our theoretical results on synthetic data experiments where our algorithm significantly outperforms other methods, and show that it can also be applied to signed graphs. Finally we demonstrate the practical interest of our method on real data.  ( 2 min )
    A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting. (arXiv:2207.14219v1 [stat.ML])
    The exponential growth of machine learning (ML) has prompted a great deal of interest in quantifying the uncertainty of each prediction for a user-defined level of confidence. Reliable uncertainty quantification is crucial and is a step towards increased trust in AI results. It becomes especially important in high-stakes decision-making, where the true output must be within the confidence set with high probability. Conformal prediction (CP) is a distribution-free uncertainty quantification framework that works for any black-box model and yields prediction intervals (PIs) that are valid under the mild assumption of exchangeability. CP-type methods are gaining popularity due to being easy to implement and computationally cheap; however, the exchangeability assumption immediately excludes time series forecasting. Although recent papers tackle covariate shift, this is not enough for the general time series forecasting problem of producing H-step ahead valid PIs. To attain such a goal, we propose a new method called AEnbMIMOCQR (Adaptive ensemble batch multi-input multi-output conformalized quantile regression), which produces asymptotically valid PIs and is appropriate for heteroscedastic time series. We compare the proposed method against state-of-the-art competitive methods in the NN5 forecasting competition dataset. All the code and data to reproduce the experiments are made available.  ( 2 min )
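The basic conformal prediction recipe that the abstract builds on can be sketched as plain split conformal prediction with absolute-residual scores. This is not AEnbMIMOCQR: it assumes exchangeable data, which is exactly the assumption the abstract says fails for time series, and the function name and inputs are hypothetical.

```python
import math


def split_conformal_interval(calib_residuals, prediction, alpha):
    """Symmetric prediction interval from held-out calibration residuals.

    `calib_residuals` are (y - y_hat) on a calibration set; `alpha` is the
    target miscoverage level (e.g. 0.1 for 90% intervals).
    """
    n = len(calib_residuals)
    scores = sorted(abs(r) for r in calib_residuals)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to certify this coverage level.
        return (-math.inf, math.inf)
    q = scores[k - 1]
    return (prediction - q, prediction + q)


residuals = [1, -2, 3, -4, 5, -6, 7, -8, 9]
print(split_conformal_interval(residuals, prediction=10, alpha=0.5))  # (5, 15)
```

The interval width is the same for every input here, which is one reason heteroscedastic series call for quantile-regression-based scores as in the proposed method.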
    A Generative Deep Learning Approach to Stochastic Downscaling of Precipitation Forecasts. (arXiv:2204.02028v2 [physics.ao-ph] UPDATED)
    Despite continuous improvements, precipitation forecasts are still not as accurate and reliable as those of other meteorological variables. A major contributing factor to this is that several key processes affecting precipitation distribution and intensity occur below the resolved scale of global weather models. Generative adversarial networks (GANs) have been demonstrated by the computer vision community to be successful at super-resolution problems, i.e., learning to add fine-scale structure to coarse images. Leinonen et al. (2020) previously applied a GAN to produce ensembles of reconstructed high-resolution atmospheric fields, given coarsened input data. In this paper, we demonstrate this approach can be extended to the more challenging problem of increasing the accuracy and resolution of comparatively low-resolution input from a weather forecasting model, using high-resolution radar measurements as a "ground truth". The neural network must learn to add resolution and structure whilst accounting for non-negligible forecast error. We show that GANs and VAE-GANs can match the statistical properties of state-of-the-art pointwise post-processing methods whilst creating high-resolution, spatially coherent precipitation maps. Our model compares favourably to the best existing downscaling methods in both pixel-wise and pooled CRPS scores, power spectrum information and rank histograms (used to assess calibration). We test our models and show that they perform in a range of scenarios, including heavy rainfall.  ( 3 min )
    Individual Privacy Accounting for Differentially Private Stochastic Gradient Descent. (arXiv:2206.02617v3 [cs.LG] UPDATED)
    Differentially private stochastic gradient descent (DP-SGD) is the workhorse algorithm for recent advances in private deep learning. It provides a single privacy guarantee to all datapoints in the dataset. We propose an efficient algorithm to compute privacy guarantees for individual examples when releasing models trained by DP-SGD. We use our algorithm to investigate individual privacy parameters across a number of datasets. We find that most examples enjoy stronger privacy guarantees than the worst-case bound. We further discover that the training loss and the privacy parameter of an example are well-correlated. This implies groups that are underserved in terms of model utility are simultaneously underserved in terms of privacy guarantee. For example, on CIFAR-10, the average $\epsilon$ of the class with the lowest test accuracy is 26.3% higher than that of the class with the highest accuracy. We also run membership inference attacks to show this reflects disparate empirical privacy risks.  ( 2 min )
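As background for the DP-SGD abstract, a single standard DP-SGD update can be sketched as per-example gradient clipping followed by Gaussian noise on the summed gradient. This shows only the generic algorithm, not the paper's individual privacy accounting; the function names, the plain-list gradient format, and the parameter names are assumptions made for illustration.

```python
import math
import random


def clip(grad, max_norm):
    """Scale `grad` down so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng=None):
    """One DP-SGD update: clip each example's gradient, sum, add noise, average."""
    rng = rng or random
    clipped = [clip(g, max_norm) for g in per_example_grads]
    summed = [sum(col) for col in zip(*clipped)]
    # Gaussian noise with standard deviation noise_multiplier * max_norm per coordinate.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    batch = len(per_example_grads)
    return [p - lr * n / batch for p, n in zip(params, noisy)]


grads = [[3.0, 4.0], [0.0, 0.0]]
print(dp_sgd_step([1.0, 1.0], grads, lr=1.0, max_norm=1.0, noise_multiplier=1.0))
```

Because clipping bounds every example's contribution by `max_norm`, examples with small gradients contribute less than the worst case, which is the intuition behind per-example privacy guarantees being tighter than the global bound.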
    On the fast convergence of minibatch heavy ball momentum. (arXiv:2206.07553v2 [cs.LG] UPDATED)
    Simple stochastic momentum methods are widely used in machine learning optimization, but their good practical performance is at odds with an absence of theoretical guarantees of acceleration in the literature. In this work, we aim to close the gap between theory and practice by showing that stochastic heavy ball momentum, which can be interpreted as a randomized Kaczmarz algorithm with momentum, retains the fast linear rate of (deterministic) heavy ball momentum on quadratic optimization problems, at least when minibatching with a sufficiently large batch size is used. The analysis relies on carefully decomposing the momentum transition matrix, and using new spectral norm concentration bounds for products of independent random matrices. We provide numerical experiments to demonstrate that our bounds are reasonably sharp.  ( 2 min )
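The deterministic heavy ball update that the abstract uses as its reference point can be sketched on a small diagonal quadratic. This is the full-gradient version, not the minibatch variant analyzed in the paper; the diagonal problem, the function name, and the classical step and momentum choices for curvatures 1 and 4 are illustrative assumptions.

```python
def heavy_ball_quadratic(diag, b, step, momentum, iters):
    """Minimize 0.5 * sum(d_i * x_i**2) - sum(b_i * x_i) with heavy ball momentum.

    Update: x_{k+1} = x_k - step * grad(x_k) + momentum * (x_k - x_{k-1}).
    """
    x = [0.0] * len(diag)
    x_prev = list(x)
    for _ in range(iters):
        grad = [d * xi - bi for d, xi, bi in zip(diag, x, b)]
        x_next = [
            xi - step * g + momentum * (xi - xp)
            for xi, g, xp in zip(x, grad, x_prev)
        ]
        x_prev, x = x, x_next
    return x


# Curvatures mu = 1 and L = 4; classical tuning uses
# step = 4 / (sqrt(L) + sqrt(mu))**2 and
# momentum = ((sqrt(L) - sqrt(mu)) / (sqrt(L) + sqrt(mu)))**2.
print(heavy_ball_quadratic([1.0, 4.0], [1.0, 4.0], step=4 / 9, momentum=1 / 9, iters=100))
```

The minimizer of this problem is `[1.0, 1.0]`, so the printed iterate should be close to that point.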

  • Open

    [D] Question on Bnns and MC-Dropout
    Hi, i am a student and i was reading about bayesian neural networks and mc-dropout. The book i am reading was published in 2018 ( which means written in 2017), and i know that 5 years are a long time in the deep learning field. i have a doubt that i would like to ask you. In MC-dropout we approximate the variational posterior as a Bernoulli distribution. doesn't this mean that with mc-dropout we partially lose the ability to adapt the variational distribution to the true a posteriori distribution when compared to a fully variational approach with a generalized mean-field approximation ? In general, is there any disadvantage to using mc-dropout as opposed to a fully variational approach? I was using tensorflow probability with DenseLocalReparameterization layers for a regression problem and now I am wondering whether what I was doing makes sense or if I have complicated my life for no reason and no advantage. sorry if it's a stupid question. ah i would also ask if there is a limit in the dimension of the neural network under which mc-dropout is not a good approximation anymore. My nn is fairly small submitted by /u/ilrazziatore [link] [comments]  ( 88 min )
    [P] should I always favor using object tracking when annotating videos for segmentation over not using it?
    I'm currently working on CVAT to annotate videos of an object (the object is constant but the camera is moving) and I have the option of using the Object Tracking feature, which tracks the object in every frame and annotates it, which in turn will give more segmentation masks than annotating every Nth frame manually. However, the downside of using that feature is that in some frames the segmentation mask will not be laid out correctly or will be very broken. So my question is, in that case should I still use Object Tracking despite the downside? On one hand I'll be getting more segmentation masks and therefore more data to train on. But on the other hand some of these masks will be faulty and might corrupt the model. submitted by /u/TypicalAngryRedditor [link] [comments]  ( 88 min )
    [D] Switching from Blockchain development to ML
    My main skills were solidity and web development. Obviously, I don't think my solidity skills will be useful here, but will my web development skills have some form of value in my journey? Or is it all python? Also, any recommendations or thoughts of moving from Blockchain to ML and AI are welcomed. Thanks submitted by /u/PlayboiCult [link] [comments]  ( 88 min )
    [D] Typical compute requirements for the training of a transformer-based recommender systems.
    I recently moved from NLP to recommender systems and I've noticed that most papers seem to not address how many resources it took to train their models. This has proven slightly frustrating as I'm currently trying to scope out what a manageable first proof of concept would look like. From my background in NLP, I know that the self-supervised training from scratch of such models takes a while but I'm not sure to what extent this is true for time series data. Has anyone here used something akin to Bert4rec or anything in the Transformer4rec library? What is your experience with your particular dataset/compute capability/model? submitted by /u/MC_Dropout [link] [comments]  ( 87 min )
    [D] TensorDock Core GPU Cloud — GPU servers from $0.29/hr
    Hello r/MachineLearning! I’m Jonathan from TensorDock. After 7 months in beta, we’re finally launching Core Cloud, our platform to deploy GPU virtual machines in as little as 45 seconds! I think you guys would find this as a nice alternative to other clouds for you to train your ML models. https://www.tensordock.com/product-core 🤔 Why? Training machine learning workloads at large clouds can be extremely expensive. This left us wondering, “how did cloud ever become more expensive than on-prem?” I’ve seen too many ML startups buy their own hardware. Cheaper dedicated servers with NVIDIA GPUs are not too hard to find, but they lack the functionality and scalability of the big clouds. We thought to ourselves, what if we built a platform that combines the functionality of the large clou…  ( 90 min )
    [D] Did anyone get into any AI residency right after the Bachelors/ undergraduate studies ?
    I am an undergraduate student from India. I am planning on applying for all the AI residencies. What is expected from an undergraduate applicant with good technical skills and academic record submitted by /u/Actual_banana_2002 [link] [comments]  ( 87 min )
    ML in Production Environments - problems and painpoints? [Discussion] [D]
    Hi all, I'm looking to learn/hear about problems and painpoints that individuals/teams are experiencing when deploying ML products to production. Any insight would be great as I'm keen to avoid headaches as much as possible. Thanks submitted by /u/stoic-AI [link] [comments]  ( 119 min )
    [D] What are some techniques to disperse load across multiple different hardware?
    I just want to say I am very noob and that I need simple explanations to learn. And start with basic noob AI. And then one day I shall create my own unique kind of AI. I have multiple OpenCL 1.2 devices. Of varying speeds such as RX 6600 XT, RX 460, old HD 5450s, some NPUs, Mali GPUs, etc. I suppose I would like to do something like disperse an uneven number of NEURONS across several devices based on their speed. For devices to analyze different amounts of data and combine them into a shared project. How easy is that? Simple AI should be used to train other AIs, and produce data for other AIs. Ok, I just want to avoid a bottleneck involving using slower devices with faster ones. submitted by /u/Reddit-CEO-DontClick [link] [comments]  ( 88 min )
    How to approach Recommendation System Project [P]
    Hello, during my internship I'll be working on building a recommendation system for an e-commerce website, and this is the first time I'll be working on such a project. I need some advice on how to approach such problems, and any pointers to helpful resources would be much appreciated. Thank you. submitted by /u/AB3NZ [link] [comments]  ( 88 min )
    [D] Training on a bunch of hardware vs one kind of hardware?
    I don't know. I don't know the efficiency of training on multiple different hardware. I have an RX 6600 XT, but not a cluster. I do have a ton of other things lying around: an RX 6600 XT, RX 460, Intel Iris Pro, Malis, NPUs, RK3399, HD 5450s*. What is the efficiency of training on a bunch of low power, high power, specialized, and unrelated hardware? I do know they all run OpenCL 1.2 (often via open source drivers) very well. I wonder if something will bottleneck. I also happen to be bad at programming, but for personal projects I can probably steal other people's code. Ideally, I suppose each part will do its own thing and run at around 93% utilization? submitted by /u/Reddit-CEO-DontClick [link] [comments]  ( 88 min )
    [P] For Hearthstone and recsys fans, there is a kaggle competition for you :)
    Hello Reddit, I just published on kaggle a competition around recommender system applied in the context of hearthstone. https://www.kaggle.com/competitions/what-card-should-i-select-next I hope that you will enjoy it (I just dived again in hearthstone, and I am hooked to their battlegrounds mode) submitted by /u/jeanmidev [link] [comments]  ( 87 min )
    [D] Honest/Pragmatic thoughts of AutoML frameworks when it comes to (at least some ) daily work?
    Hi all. I work with ML and do a lot of data science on a daily basis and it’s a world I love. I’ve worked hard to get to the knowledge base that I have and I’m quite proud of it. I think a lot of us are. But making things happen and get results takes WORK - I need to make sure delivery is happening as well. Recently I’ve been exploring the AutoML frameworks from AWS and Google. And they are pretty much “dump some data, select a few options and ML magic happens in a box”. I came at them pretty negatively - cynically at least. And they are not perfect. If I sit down and work I can beat their outputs usually - but often only by a few points. And that will take me a good half day, or a good day, to make happen. The thing is - what I’m seeing is that while they are by no means perfect they are …. Entirely OK. For a lot of the work that I’m doing it’s not about fighting for every point of accuracy it’s about exploring or getting a gut feel for data or pulling out some key facets for a different group within a client. There are just as many times when accuracy and quality DOES matter - and in those cases I’m going to stay as close as possible to the models and the features. So - I find myself torn on my thoughts about them and was wondering what others thought? Are you staying away from them? Diving in fully? Using them in certain times/use-cases? submitted by /u/CarrotCakeandGin [link] [comments]  ( 118 min )
    [R] Ten Lessons of Implementing Recommendation Systems in Business
    FunCorp data science team has been long working on improving the user experience with machine learning. We've picked out key takeaways of that process. Following this article's advice, you will avoid a lot of mistakes when creating a recommendation system for your product. 1. Define a Goal that Really Contributes to the Business Tasks The global task of the recommendation system is to select a shortlist of content from a large catalog that is most suitable for a particular user. The content itself can be different — from products in the online store and articles to banking services. FunCorp product team works with the most interesting kind of content — we recommend memes. To do this, we rely on the history of the user’s interaction with the service. But “good recommendations” from a use…  ( 102 min )
    Chest X-ray Network :Simplified Transfer Learning for Chest Radiography Model Development [R]
    Researchers have added an additional step of pre-training a generic image deep learning model on 800k chest x-ray images using supervised contrastive learning with noisy labels from radiology reports. Image embeddings generated from this network can then be used for tasks like abnormality detection on a smaller set of chest x-ray images. They have also released a chest foundation tool for generating image embeddings for chest x-rays. I liked the idea behind this paper and I believe it can also be extended to other medical imaging modalities like MR and CT. I have made a video on the same. Do check it out: https://youtu.be/lyhG6hivJqw submitted by /u/Sea-Photo5230 [link] [comments]  ( 120 min )
    [D] Building a paraphrasing tool like Quillbot
    Quillbot is an amazing tool for paraphrasing. I used it multiple times while writing peer-reviewed articles and my dissertation thesis. Unfortunately, there's no similar tool that I'm aware of for my language (Italian). I was wondering what kind of tools or AI models I could leverage if I wanted to build one for my native language. Any suggestions are much appreciated. I'm a web developer with some basic knowledge of AI, ML, and statistics, so you can get as geeky as you like in your explanations :) submitted by /u/Kelith7 [link] [comments]  ( 88 min )
    [D] How important is text preprocessing nowadays with transformer models available?
    Hi everyone! The headline already sums it up pretty much. Do we still really need stemming, cleaning etc. as we used to or are the transformer models good and big enough to handle raw data nowadays? Thanks a lot! submitted by /u/kermitai [link] [comments]  ( 91 min )
    [R] [P] FL_PyTorch: Optimization Research Simulator for Federated Learning is publicly available on GitHub.
    FL_PyTorch: Optimization Research Simulator for Federated Learning is publicly available on GitHub. https://burlachenkok.github.io/FL_PyTorch-Available-As-Open-Source/ Repository: https://github.com/burlachenkok/flpytorch Slack Workspace: https://fl-pytorch.slack.com/ The invitation Link: https://join.slack.com/t/fl-pytorch/shared_invite/zt-1cjkjct9c-1wuFdrbVT4LcrAcjyj_gBw The arXiv link for the paper: https://arxiv.org/abs/2202.03099 FL_PyTorch is a suite of open-source software written in python that builds on top of one of the most popular research Deep Learning (DL) frameworks PyTorch. We built FL_PyTorch as a research simulator for FL to enable fast development, prototyping, and experimenting with new and existing FL optimization algorithms. Our system supports abstractions that provide researchers with sufficient flexibility to experiment with existing and novel approaches to advance the state-of-the-art. The work is in proceedings of the 2nd International Workshop on Distributed Machine Learning DistributedML 2021. submitted by /u/bruziuz [link] [comments]  ( 88 min )
    [D]What are some common sticking points in this field?
    Many people try to improve but either quit or get stuck quickly and are not able to advance to the next level in this field. From your experience and perspective, what are the most common things that practitioners need to learn to get over the hump? submitted by /u/THE_REAL_ODB [link] [comments]  ( 121 min )
    [D] Naming convention: `train!` or `fit!` for the API of a ML library ?
    I am deeply undecided whether to name the step where the parameters of a model are learned from data in the API of my ML library `train!(model,X,[Y])` or `fit!(model,X,[Y])`. I would intuitively prefer the first, as it makes somehow explicit that we are learning something with experience, but `train/fit` seems to be more common... What would you choose? PS: the exclamation mark is due to another convention in Julia, where functions that change their arguments (the model object in my case) end with an exclamation mark submitted by /u/alobianco [link] [comments]  ( 88 min )
    [R] Blog post summarizing undergraduate thesis work
    Hey everyone! I just published a blog post today that summarizes my undergraduate thesis work. The thesis topic is a multi-network approach to minimize overfitting to noisy data. Here is a link to the article. Any feedback or questions would be really appreciated. Thanks! submitted by /u/ryxu [link] [comments]  ( 87 min )
    [D] Influence of cognitive science on ML. Worth learning?
    Often ML algorithms (especially DL) motivate their ideas with notions from cognitive science. These when presented often seem to be reasonable as someone who is not well versed on the subject. A part of me wants to more explicitly learn this and am considering taking a class on it. the opportunity cost being not being able to explore a topic like signal processing that is next on my list of topics to self-explore. (already have a graduate degree in CS). tl;dr is cognitive science a class worth taking? has being informed in this field helped in ML or AI? or life in general? submitted by /u/mathuwthrow [link] [comments]  ( 88 min )
  • Open

    "Man" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 85 min )
    I made a program that is capable to solve logic problems
    First, you need to supply it with the rules of the given environment, then pass them through a function that compares those rules and derives all of their direct implications. Then, given an allegation, you can get the implications of that allegation. Here is an example of the program running. Here is the code of the example shown above: source code: https://github.com/Thiago099/Einstein submitted by /u/Small-Ad-1694 [link] [comments]  ( 86 min )
    Royalty/commercial free Generated backgrounds for Art.
    Hey, As an artist I am currently asking around if there are any AI programs out there that you can use for art backgrounds and whatever and it just occurred to me that do I actually own it or is it royalty/commercial free? Right now I am being cautious and like an answer because I learned that you can learn to use shortcuts to save time and struggles. submitted by /u/Bluefuchs [link] [comments]  ( 86 min )
    Eerie Deepfake Tech Turns Random Guy Into Angelina Jolie
    submitted by /u/Tao_Dragon [link] [comments]  ( 86 min )
    Low barrier entry conversational bot design options?
    Having taken a couple of months to poke around with Replika.ai and checking other similar products like Kuki, I'm interested in crafting my own "robot companion" but I have no real knowledge of how to set up an AI. Are there any good options for someone who wants to make a bot, but doesn't really know the ins-and-outs of the design process? Open source would be my preference. submitted by /u/micah1_8 [link] [comments]  ( 86 min )
    Artificial Intelligence Discovers Alternative Physics
    submitted by /u/sasksean [link] [comments]  ( 86 min )
    A dataset of global AI/ML salaries in the Public Domain
    This is a project to collect as much salary information as possible across the whole AI/ML job space and make it all public for everyone to access and use (researchers, jobseekers, recruiters, etc.). The dataset can be found here: https://salaries.ai-jobs.net/download/ submitted by /u/ai_jobs [link] [comments]  ( 86 min )
    Need career advice to clearly understand and appreciate AI advancements.
    Sorry if this is too long, I just can't pin down what exactly I want. Also, is the flair right? When I was in high school, I fell in love with physics: reading Feynman, watching a lot of science videos. I was obsessed with how each idea peels back a deeper and deeper understanding of how nature works. I expected something of similar grandiosity in AI. Definitely, AI has grown up a lot, and we have uncovered a lot of ideas. But as I entered an undergrad course, I realized all I have to deal with are just the very popular models like classifiers and clustering models. It felt stale. I almost gave up, thinking AI is just a bunch of loose ideas that somehow worked, until I found a book in our library that has exactly what I wanted. I wish I had read that book before anything else. (Deepak Khemani's A First Course in Artificial Intelligence. I fricking love how he strings a lot of the loose concepts in CS into a single Feynman-esque narrative.) What I want is an insightful understanding of the field, its developments and its findings, one that I could talk about for hours. But I can't find a way to pursue it, nor can I find a way to make a good career out of it. I don't think employers would value what I want. submitted by /u/Neuroth [link] [comments]  ( 87 min )
    Disco Diffusion AI Art Tutorial Quickstudies #3 Models
    submitted by /u/prfitofthesngularity [link] [comments]  ( 86 min )
    Can NFT finance an AI ?
    An artificial intelligence project needs to be funded and needs resources. An NFT collection is created with the name TELOS MASK The collection is being presented to a competition currently underway, and needs supporters. To support the project, you can register here to receive 100 free tokens to use for voting. wish me luck and vote if you like it or at least talk to the AI to see if it deserves your attention submitted by /u/metaquid [link] [comments]  ( 86 min )
    A Tale of Two “AI” Companies
    submitted by /u/bendee983 [link] [comments]  ( 86 min )
    SimSimi sus 🤨
    submitted by /u/ChooChooWaah [link] [comments]  ( 92 min )
    Are there any public use AI bots that could potentially become great songwriting collaborators?
    I'm a songwriter looking to collaborate with the world's future artists. submitted by /u/BigOlDumbCunt [link] [comments]  ( 85 min )
    Cohere AI Hackathon
    Join us for Cohere AI hackathon, where you will use one of the world's most powerful NLP engines to build applications based on large language models. We are waiting for you on 19-21st August at lablab.ai, so that you can already start implementing your innovative projects that will radically change the world in the near future! Cohere experts during workshops, keynotes, and mentoring sessions - will do their best to quickly and efficiently onboard you to the advanced NLP model that leads the future! Who can participate? Industry experts with coding and data science experience People with other types of domain knowledge that want to understand & explore AI Register now - it's totally free! Cohere AI Hackathon submitted by /u/zakrzzz [link] [comments]  ( 86 min )
    The Best Machine Learning Courses on Udemy (2022)
    submitted by /u/Jan_Prince [link] [comments]  ( 86 min )
    What are the requirements of laptop for engineering.
    I'm gonna join AI and ML engineering this year and I would like to know what is a good laptop. Do I have to have a GPU for the laptop? submitted by /u/SomewhereBrilliant85 [link] [comments]  ( 87 min )
    Neural Network to predict how long a job will take to repair
    The company I work for offers a large number of aftermarket services for the products that it sells. The biggest one in terms of volume is repairs, where a customer sends goods back to our facility and a skilled operator assesses the job for damage and reports back which parts they need to fix it. Once they complete the work, they book their time to an order in our ERP system. The time taken to repair a job varies each time and can range anywhere from a couple of hours to a whole day. I work in the production planning department, where we are responsible for creating a weekly plan for each of the different areas of the facility. We have set times for each of the jobs; however, these tend to be an average of all the time booked and are therefore more likely to be inaccurate than accurate. I thought this might be a good problem for a neural network, where I could take the historical data (just under 1m rows) and use it to predict how long a future order might take. I followed some tutorials on TensorFlow and managed to create a neural network, and initially had some success getting it to predict around 60% of the orders in the test data correctly. I’ve now hit a brick wall with getting the model to be any more accurate, and I feel like I’m just randomly changing the hyperparameters hoping for better results. This is my first time working with AI and I’m lost on what to do next to improve the accuracy. Does anyone have any advice on what approach I might follow to improve the model further? submitted by /u/-hilcf [link] [comments]  ( 92 min )
    What are the biggest hurdles in annotating data well?
    Hi everyone! I am very keen to know what the biggest hurdles are for you nowadays when annotating data for NLP. There is so much great annotation software out there already that I am wondering if there are any big obstacles left. Do you have any insights from some of your projects or day-to-day work? Thanks a lot! submitted by /u/kermitai [link] [comments]  ( 86 min )
    The Most Beautiful Space Visualization on the Internet | 4K UHD | 24 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 90 min )
  • Open

    Why do we need experience replay if the algorithm is epsilon-greedy ?
    Hey! I am new to deep Q-learning and am confused about something. I understand that experience replay removes the correlation between consecutive states, thus avoiding falling into local optima. But doesn't epsilon already solve this problem? If we start by taking random actions, won't we explore most of the state space and thus avoid falling into local optima? The difference I see is that with experience replay the neural net is not fed several similar states in a row while it is training, but how does that prevent falling into local optima? submitted by /u/youneskamel2 [link] [comments]  ( 93 min )
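    For readers comparing the two mechanisms in the question, a minimal sketch may help. Epsilon-greedy decides which action the agent takes, while a replay buffer decides which stored transitions each training batch is drawn from. The class and function names below are illustrative assumptions, not taken from any particular library, and the transitions are dummy integers.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        # Oldest transitions are dropped automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling mixes transitions from different time steps,
        # so a batch is not a run of consecutive, highly similar states.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Dummy transitions (state, action, reward, next_state, done) for illustration.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.push(t, 0, 1.0, t + 1, False)
batch = buffer.sample(2)
action = epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0)
```

    Even with a high epsilon, the transitions collected within one episode are consecutive and therefore correlated; sampling from the buffer is what breaks up that ordering in the training batches.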
    "Semi-analytical Industrial Cooling System Model for Reinforcement Learning", Chervonyi et al 2022 {DM} (cooling simulated Google datacenters)
    submitted by /u/gwern [link] [comments]  ( 86 min )
    "PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations", Lee et al 2022 {G} (evolving policy on top of contrastive+reward-predictive NN)
    submitted by /u/gwern [link] [comments]  ( 93 min )
    "Multi-Objective Hyperparameter Optimization -- An Overview", Karl et al 2022
    submitted by /u/gwern [link] [comments]  ( 93 min )
    "Learning with Combinatorial Optimization Layers: a Probabilistic Approach", Dalle et al 2022
    submitted by /u/gwern [link] [comments]  ( 93 min )
    Mujoco action space
    Does anyone happen to know what happens when you submit an action outside of the action space in mujoco? I.e. submit a 1.5 when the range is [-1,1]. Couldn't seem to find this anywhere in the docs. submitted by /u/VirtualHat [link] [comments]  ( 86 min )
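    Whatever the simulator does internally with out-of-range values, one way to make the behaviour explicit is to clamp the action yourself before submitting it. The sketch below is plain Python and makes no claim about MuJoCo's own handling; `clip_action` is a hypothetical helper, and the [-1, 1] bounds are the ones mentioned in the question.

```python
def clip_action(action, low=-1.0, high=1.0):
    """Clamp every component of an action vector into [low, high]."""
    return [min(max(a, low), high) for a in action]


# 1.5 is outside the range and is clamped to 1.0; in-range values are unchanged.
clipped = clip_action([1.5, -2.0, 0.3])
print(clipped)
```

    Clipping on the agent side also means the value stored for training is the one that was actually applied, which keeps logged actions and executed actions consistent.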
    made an RL algo for modeling episode reward directly
    I came across a problem where the modeled per-step reward became very disconnected from the actual final episode reward, which is usually what we really care about. This can happen for any number of reasons, and a decrease in error doesn't always translate to an increase in final episode reward in a straightforward manner. Of course, very generally speaking, one can expect episode reward to go up as loss decreases, but in practice we might have a few different models with the same loss that actually perform very differently in some environments. Usually, the more complex the environment, the more this becomes an issue. On top of that, per-sample methods usually require many samples, a time horizon variable, and other hyperparameters that can be hard to set correctly. This is obviously not suited for every problem (e.g. environments that are expensive to sample from or have some sort of time constraint), but for certain problems you might find it useful. I'd be interested for people to try it out and give some feedback. https://github.com/ben-arnao/OnGrad submitted by /u/Yogi_DMT [link] [comments]  ( 87 min )
    MuJoCo: How can i change friction properties of geom with default class?
    Hi, I'm creating a MuJoCo environment to test walking robot software. I want to create many types of ground in one simulation to check how the robot adapts. For starters, I've tried to manipulate the friction attribute in the geom element. I created two hfields, placed them next to each other and created a new default class: One hfield geom in the simulation has the default class and the second one has class="geom_frictionless", but friction in the simulation is the same on both surfaces. I must add that the geom's material in the simulation does change, so part of the custom class attributes works. Does anyone know why I can't override the friction attribute? submitted by /u/Kwach00 [link] [comments]  ( 87 min )
    What is the current SOTA for On-policy RL?
    The on-policy RL community does not seem to have released a widely adopted SOTA method since PPO? submitted by /u/CeyaoZhang [link] [comments]  ( 87 min )
  • Open

    1,650+ Global Interns Gleam With NVIDIA Green
    A record number of interns calls for a record-sized celebration. In our largest contingent ever, over 1,650 interns from 350+ schools started with NVIDIA worldwide over the past year. Amidst busy work days tackling real-world projects across engineering, automation, robotics and more, the group’s also finishing up a three-day celebration, culminating today with National Intern Read article > The post 1,650+ Global Interns Gleam With NVIDIA Green appeared first on NVIDIA Blog.  ( 5 min )
    Pony.ai Express: New Autonomous Trucking Collaboration Powered by NVIDIA DRIVE Orin
    More than 160 years after the legendary Pony Express delivery service completed its first route, a new generation of “Pony”-emblazoned vehicles are taking an AI-powered approach to long-haul delivery. Autonomous driving company Pony.ai announced today a partnership with SANY Heavy Truck (SANY), China’s largest heavy equipment manufacturer, to jointly develop level 4 autonomous trucks. The Read article > The post Pony.ai Express: New Autonomous Trucking Collaboration Powered by NVIDIA DRIVE Orin appeared first on NVIDIA Blog.  ( 5 min )
    Welcome Back, Commander: ‘Command & Conquer Remastered Collection’ Joins GeForce NOW
    Take a trip down memory lane this week with an instantly recognizable classic, Command & Conquer Remastered Collection, joining the nearly 20 Electronic Arts games streaming from the GeForce NOW library. Speaking of remastered, GeForce NOW members can enhance their gameplay further with improved resolution scaling in the 2.0.43 app update. When the feature is Read article > The post Welcome Back, Commander: ‘Command & Conquer Remastered Collection’ Joins GeForce NOW appeared first on NVIDIA Blog.  ( 5 min )
    NVIDIA Studio Laptops Offer Students AI, Creative Capabilities That Are Best in… Class
    Selecting the right laptop is a lot like trying to pick the right major. Both can be challenging tasks where choosing wrongly costs countless hours. But pick the right one, and graduation is just around the corner. The tips below can help the next generation of artists select the ideal NVIDIA Studio laptop to maximize performance for the critical workload demands of their unique creative fields — all within budget. The post NVIDIA Studio Laptops Offer Students AI, Creative Capabilities That Are Best in… Class appeared first on NVIDIA Blog.  ( 10 min )
    How’s That? Startup Ups Game for Cricket, Football and More With Vision AI
    Sports produce a slew of data. In a game of cricket, for example, each play generates millions of video-frame data points for a sports analyst to scrutinize, according to Masoumeh Izadi, managing director of deep-tech startup TVConal. The Singapore-based company uses NVIDIA AI and computer vision to power its sports video analytics platform, which enables Read article > The post How’s That? Startup Ups Game for Cricket, Football and More With Vision AI appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    New hardware offers faster computation for artificial intelligence, with much less energy
    Engineers working on “analog deep learning” have found a way to propel protons through solids at unprecedented speeds.  ( 9 min )
  • Open

    Humanizing Artificial Intelligence: An Approach Towards Future
    Artificial intelligence is a very complex topic that has been studied by many people in different fields. Though it has been thought to be…  ( 11 min )
  • Open

    dont get that neural link bs
    submitted by /u/Ok_Base_2789 [link] [comments]  ( 85 min )
  • Open

    Galois theory without fields
    My previous post described Galois connections, and how they generalize a pattern first recognized in the context of Galois theory. This pattern can be extended far afield of its initial application to fields and their extensions. For example, you could take a random variable X and think of the pair consisting of its distribution function F: […] Galois theory without fields first appeared on John D. Cook.  ( 5 min )
  • Open

    Two New Papers: Learning to Fling and Singulate Fabrics
    The system for our IROS 2022 paper on singulating layers of cloth with tactile sensing. In collaboration with my colleagues at Berkeley and CMU, we recently uploaded two papers to arXiv on robotic fabric manipulation: Efficiently Learning Single-Arm Fling Motions to Smooth Garments, for ISRR 2022. Learning to Singulate Layers of Cloth using Tactile Feedback, for IROS 2022. Robotic fabric (or cloth) manipulation is a recurring theme in my research, and these two papers continue the trend. The first paper, which we started a while back in Spring 2021, is about dynamic fabric manipulation; it can be thought of as an extension of our earlier ICRA papers on “Robots of the Lost Arc” and “Planar Robot Casting” while incorporating ideas from Huy Ha and Shuran Song’s legendary FlingBot paper. Wh…  ( 3 min )
  • Open

    The Value of Real-Time Data Visualization and Interpretation
    Data representation using graphics such as charts, plots, infographics, heat maps, bubble clouds, scatter plots, and mekko charts is referred to as data visualization. Such visual displays and representations of information help communicate complex data relationships and data-driven insights in a way that makes them easy to understand and base decisions on. The post The Value of Real-Time Data Visualization and Interpretation appeared first on Data Science Central.  ( 19 min )
  • Open

    A generalized regionalization framework for geographical modelling and its application in spatial regression. (arXiv:2206.09429v2 [stat.ME] UPDATED)
    Models applied to geographic data face a trade-off between producing general results and capturing local variations due to spatial heterogeneity. Spatial modelling within carefully defined regions offers an intermediate position between global and local models. However, current spatial optimization approaches to delineate homogeneous regions consider the similarity of attribute values, thus unable to identify regions with similar data generation processes described by geographical models. We propose a generalized regionalization framework, which optimizes region delineation corresponding to a model with region-specific parameters. Within this framework, we introduce three regionalization algorithms, namely automatic zoning procedure (AZP), K-Models, and Regional-K-Models. We adopt an objective function that jointly minimizes modelling errors and the complexity of the region scheme. Results from regression experiments indicate that the K-Models algorithm reconstructs the regions better than the baseline, according to Rand index and mutual information measures. Our suggested framework contributes to better capturing processes exhibiting spatial heterogeneity and may be applied to a wide range of modelling scenarios.  ( 2 min )
    Object discovery and representation networks. (arXiv:2203.08777v3 [cs.CV] UPDATED)
    The promise of self-supervised learning (SSL) is to leverage large amounts of unlabeled data to solve complex tasks. While there has been excellent progress with simple, image-level learning, recent methods have shown the advantage of including knowledge of image structure. However, by introducing hand-crafted image segmentations to define regions of interest, or specialized augmentation strategies, these methods sacrifice the simplicity and generality that makes SSL so powerful. Instead, we propose a self-supervised learning paradigm that discovers this image structure by itself. Our method, Odin, couples object discovery and representation networks to discover meaningful image segmentations without any supervision. The resulting learning paradigm is simpler, less brittle, and more general, and achieves state-of-the-art transfer learning results for object detection and instance segmentation on COCO, and semantic segmentation on PASCAL and Cityscapes, while strongly surpassing supervised pre-training for video segmentation on DAVIS.  ( 2 min )
    Fast TreeSHAP: Accelerating SHAP Value Computation for Trees. (arXiv:2109.09847v3 [cs.LG] UPDATED)
    SHAP (SHapley Additive exPlanation) values are one of the leading tools for interpreting machine learning models, with strong theoretical guarantees (consistency, local accuracy) and a wide availability of implementations and use cases. Even though computing SHAP values takes exponential time in general, TreeSHAP takes polynomial time on tree-based models. While the speedup is significant, TreeSHAP can still dominate the computation time of industry-level machine learning solutions on datasets with millions or more entries, causing delays in post-hoc model diagnosis and interpretation service. In this paper we present two new algorithms, Fast TreeSHAP v1 and v2, designed to improve the computational efficiency of TreeSHAP for large datasets. We empirically find that Fast TreeSHAP v1 is 1.5x faster than TreeSHAP while keeping the memory cost unchanged. Similarly, Fast TreeSHAP v2 is 2.5x faster than TreeSHAP, at the cost of a slightly higher memory usage, thanks to the pre-computation of expensive TreeSHAP steps. We also show that Fast TreeSHAP v2 is well-suited for multi-time model interpretations, resulting in as high as 3x faster explanation of newly incoming samples.  ( 2 min )
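    To make the "exponential time in general" remark concrete, here is a brute-force Shapley value computation that enumerates every coalition of the other players. It is a generic textbook sketch, not the TreeSHAP or Fast TreeSHAP algorithm from the paper; the toy additive value function and its weights are made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_players, value):
    """Exact Shapley values by enumerating all coalitions.

    value takes a frozenset of player indices and returns a number.
    Each player requires 2**(n_players - 1) coalition evaluations,
    which is why this approach does not scale.
    """
    players = list(range(n_players))
    phi = [0.0] * n_players
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula.
            weight = (
                factorial(size) * factorial(n_players - size - 1)
                / factorial(n_players)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of player i to this coalition.
                phi[i] += weight * (value(coalition | {i}) - value(coalition))
    return phi


# Toy additive game: a coalition's value is the sum of its members' weights,
# so each player's Shapley value should equal its own weight.
weights = [1.0, 2.0, 3.0]
phi = shapley_values(3, lambda s: sum(weights[j] for j in s))
print(phi)
```

    With 3 players this is 4 coalitions per player; with 30 features it would already be over 500 million per feature, which is the cost that tree-specific algorithms avoid.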
    Evaluation of creating scoring opportunities for teammates in soccer via trajectory prediction. (arXiv:2206.01899v3 [cs.AI] UPDATED)
    Evaluating the individual movements for teammates in soccer players is crucial for assessing teamwork, scouting, and fan engagement. It has been said that players in a 90-min game do not have the ball for about 87 minutes on average. However, it has remained difficult to evaluate an attacking player without receiving the ball, and to reveal how movement contributes to the creation of scoring opportunities for teammates. In this paper, we evaluate players who create off-ball scoring opportunities by comparing actual movements with the reference movements generated via trajectory prediction. First, we predict the trajectories of players using a graph variational recurrent neural network that can accurately model the relationship between players and predict the long-term trajectory. Next, based on the difference in the modified off-ball evaluation index between the actual and the predicted trajectory as a reference, we evaluate how the actual movement contributes to scoring opportunity compared to the predicted movement. For verification, we examined the relationship with the annual salary, the goals, and the rating in the game by experts for all games of a team in a professional soccer league in a year. The results show that the annual salary and the proposed indicator correlated significantly, which could not be explained by the existing indicators and goals. Our results suggest the effectiveness of the proposed method as an indicator for a player without the ball to create a scoring chance for teammates.  ( 3 min )
    On generalization bounds for deep networks based on loss surface implicit regularization. (arXiv:2201.04545v2 [stat.ML] UPDATED)
    The classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem towards explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that under reasonable assumptions, the local geometry forces SGD to stay close to a low dimensional subspace and that this induces another form of implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural networks, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property of the population risk. Under these conditions, lower bounds for SGD to remain in these stagnation sets are derived. If stagnation occurs, we derive a bound on the generalization error of deep neural networks involving the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values in the SGD iterates and local uniform convergence of the empirical loss functions based on the entropy of suitable neighborhoods around local minima.  ( 3 min )
    ShAPO: Implicit Representations for Multi-Object Shape, Appearance, and Pose Optimization. (arXiv:2207.13691v1 [cs.CV])
    Our method studies the complex task of object-centric 3D understanding from a single RGB-D observation. As it is an ill-posed problem, existing methods suffer from low performance for both 3D shape and 6D pose and size estimation in complex multi-object scenarios with occlusions. We present ShAPO, a method for joint multi-object detection, 3D textured reconstruction, 6D object pose and size estimation. Key to ShAPO is a single-shot pipeline to regress shape, appearance and pose latent codes along with the masks of each object instance, which is then further refined in a sparse-to-dense fashion. A novel disentangled shape and appearance database of priors is first learned to embed objects in their respective shape and appearance space. We also propose a novel, octree-based differentiable optimization step, allowing us to further improve object shape, pose and appearance simultaneously under the learned latent space, in an analysis-by-synthesis fashion. Our novel joint implicit textured object representation allows us to accurately identify and reconstruct novel unseen objects without having access to their 3D meshes. Through extensive experiments, we show that our method, trained on simulated indoor scenes, accurately regresses the shape, appearance and pose of novel objects in the real-world with minimal fine-tuning. Our method significantly out-performs all baselines on the NOCS dataset with an 8% absolute improvement in mAP for 6D pose estimation. Project page: https://zubair-irshad.github.io/projects/ShAPO.html  ( 3 min )
    Towards Clear Expectations for Uncertainty Estimation. (arXiv:2207.13341v1 [cs.LG])
    If Uncertainty Quantification (UQ) is crucial to achieve trustworthy Machine Learning (ML), most UQ methods suffer from disparate and inconsistent evaluation protocols. We claim this inconsistency results from the unclear requirements the community expects from UQ. This opinion paper offers a new perspective by specifying those requirements through five downstream tasks where we expect uncertainty scores to have substantial predictive power. We design these downstream tasks carefully to reflect real-life usage of ML models. On an example benchmark of 7 classification datasets, we did not observe statistical superiority of state-of-the-art intrinsic UQ methods against simple baselines. We believe that our findings question the very rationale of why we quantify uncertainty and call for a standardized protocol for UQ evaluation based on metrics proven to be relevant for the ML practitioner.  ( 2 min )
    Learning Multi-Object Dynamics with Compositional Neural Radiance Fields. (arXiv:2202.11855v3 [cs.CV] UPDATED)
    We present a method to learn compositional multi-object dynamics models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks. NeRFs have become a popular choice for representing scenes due to their strong 3D prior. However, most NeRF approaches are trained on a single scene, representing the whole scene with a global model, making generalization to novel scenes, containing different numbers of objects, challenging. Instead, we present a compositional, object-centric auto-encoder framework that maps multiple views of the scene to a set of latent vectors representing each object separately. The latent vectors parameterize individual NeRFs from which the scene can be reconstructed. Based on those latent vectors, we train a graph neural network dynamics model in the latent space to achieve compositionality for dynamics prediction. A key feature of our approach is that the latent vectors are forced to encode 3D information through the NeRF decoder, which enables us to incorporate structural priors in learning the dynamics models, making long-term predictions more stable compared to several baselines. Simulated and real world experiments show that our method can model and learn the dynamics of compositional scenes including rigid and deformable objects. Video: https://dannydriess.github.io/compnerfdyn/  ( 3 min )
    Neural Style Transfer and Unpaired Image-to-Image Translation to deal with the Domain Shift Problem on Spheroid Segmentation. (arXiv:2112.09043v2 [cs.CV] UPDATED)
    Background and objectives. Domain shift is a generalisation problem of machine learning models that occurs when the data distribution of the training set is different to the data distribution encountered by the model when it is deployed. This is common in the context of biomedical image segmentation due to the variance of experimental conditions, equipment, and capturing settings. In this work, we address this challenge by studying both neural style transfer algorithms and unpaired image-to-image translation methods in the context of the segmentation of tumour spheroids. Methods. We have illustrated the domain shift problem in the context of spheroid segmentation with 4 deep learning segmentation models that achieved an IoU over 97% when tested with images following the training distribution, but whose performance decreased up to 84% when applied to images captured under different conditions. In order to deal with this problem, we have explored 3 style transfer algorithms (NST, deep image analogy, and STROTSS), and 6 unpaired image-to-image translation algorithms (CycleGAN, DualGAN, ForkGAN, GANILLA, CUT, and FastCUT). These algorithms have been integrated into a high-level API that facilitates their application to other contexts where the domain-shift problem occurs. Results. We have considerably improved the performance of the 4 segmentation models when applied to images captured under different conditions by using both style transfer and image-to-image translation algorithms. In particular, there are 2 style transfer algorithms (NST and deep image analogy) and 1 unpaired image-to-image translation algorithm (CycleGAN) that improve the IoU of the models in a range from 0.24 to 76.07. This brings performance close to that obtained when the models are applied to images following the training distribution.  ( 3 min )
    Post-Train Adaptive MobileNet for Fast Anti-Spoofing. (arXiv:2207.13410v1 [cs.CV])
    Many applications require high accuracy of neural networks as well as low latency and user data privacy guarantees. Face anti-spoofing is one such task. However, a single model might not give the best results for different device performance categories, while training multiple models is time consuming. In this work we present the Post-Train Adaptive (PTA) block. Such a block is simple in structure and offers a drop-in replacement for the MobileNetV2 Inverted Residual block. The PTA block has multiple branches with different computation costs. The branch to execute can be selected on demand and at runtime, thus offering different inference times and configuration capability for multiple device tiers. Crucially, the model is trained once and can be easily reconfigured after training, even directly on a mobile device. In addition, the proposed approach shows substantially better overall performance in comparison to the original MobileNetV2 as tested on the CelebA-Spoof dataset. Different PTA block configurations are sampled at training time, which also decreases the overall wall-clock time needed to train the model. While we present computational results for the anti-spoofing problem, the MobileNetV2 with PTA blocks is applicable to any problem solvable with convolutional neural networks, which makes the results presented practically significant.  ( 2 min )
    Learned Label Aggregation for Weak Supervision. (arXiv:2207.13545v1 [cs.LG])
    The lack of labeled training data is the bottleneck of machine learning in many applications. To resolve the bottleneck, one promising direction is the data programming approach that aggregates different sources of weak supervision signals to generate labeled data easily. Data programming encodes each weak supervision source with a labeling function (LF), a user-provided program that predicts noisy labels. The quality of the generated labels depends on a label aggregation model that aggregates all noisy labels from all LFs to infer the ground-truth labels. Existing label aggregation methods typically rely on various assumptions and are not robust across datasets, as we will show empirically. For the first time, we provide an analytical label aggregation method that makes minimal assumptions and is optimal in minimizing a certain form of the averaged prediction error. Since the complexity of the analytical form is exponential, we train a model that learns to emulate the analytical method. Once trained, the model can be used on any unseen dataset, and it predicts the ground-truth labels for each dataset in a single forward pass in linear time. We show the model can be trained using synthetically generated data and design an effective architecture for the model. On 14 real-world datasets, our model significantly outperforms the best existing methods in both accuracy (by 3.5 points on average) and efficiency (by six times on average).  ( 3 min )
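    As a point of reference for what a label aggregation model does, the sketch below applies plain majority voting to labeling-function outputs. This is a common simple baseline and not the learned aggregator the abstract describes. The `ABSTAIN` sentinel, the tie-breaking rule, and the example votes are assumptions made for illustration.

```python
from collections import Counter

ABSTAIN = -1  # assumed sentinel for "this labeling function did not vote"


def majority_vote(lf_outputs, abstain=ABSTAIN):
    """Aggregate noisy LF votes, one row per data point, by majority."""
    labels = []
    for votes in lf_outputs:
        counts = Counter(v for v in votes if v != abstain)
        if not counts:
            # Every LF abstained, so no label can be inferred.
            labels.append(abstain)
            continue
        top = max(counts.values())
        # Assumed tie-break: choose the smallest label id among the winners.
        winners = sorted(label for label, c in counts.items() if c == top)
        labels.append(winners[0])
    return labels


# Four data points, three labeling functions each (made-up votes).
lf_outputs = [
    [1, 1, 0],
    [0, ABSTAIN, 0],
    [ABSTAIN, ABSTAIN, ABSTAIN],
    [1, 0, ABSTAIN],
]
aggregated = majority_vote(lf_outputs)
```

    Majority voting treats every LF as equally reliable, which is one example of the kind of assumption that aggregation methods build in.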
    Visualizing Confidence Intervals for Critical Point Probabilities in 2D Scalar Field Ensembles. (arXiv:2207.13661v1 [cs.HC])
    An important task in visualization is the extraction and highlighting of dominant features in data to support users in their analysis process. Topological methods are a well-known means of identifying such features in deterministic fields. However, many real-world phenomena studied today are the result of a chaotic system that cannot be fully described by a single simulation. Instead, the variability of such systems is usually captured with ensemble simulations that produce a variety of possible outcomes of the simulated process. The topological analysis of such ensemble data sets and uncertain data, in general, is less well studied. In this work, we present an approach for the computation and visual representation of confidence intervals for the occurrence probabilities of critical points in ensemble data sets. We demonstrate the added value of our approach over existing methods for critical point prediction in uncertain data on a synthetic data set and show its applicability to a data set from climate research.  ( 2 min )
    Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons. (arXiv:2202.13163v2 [stat.ML] UPDATED)
    We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. Their generalizations to mobile health applications with a pre-collected offline dataset remain unknown. The aim of this paper is to develop a novel advantage learning framework in order to efficiently use pre-collected data for policy optimization. The proposed method takes an optimal Q-estimator computed by any existing state-of-the-art RL algorithm as input, and outputs a new policy whose value is guaranteed to converge at a faster rate than the policy derived based on the initial Q-estimator. Extensive numerical experiments are conducted to back up our theoretical findings. A Python implementation of our proposed method is available at https://github.com/leyuanheart/SEAL.  ( 2 min )
    GCN-WP -- Semi-Supervised Graph Convolutional Networks for Win Prediction in Esports. (arXiv:2207.13191v1 [cs.LG])
    Win prediction is crucial to understanding skill modeling, teamwork and matchmaking in esports. In this paper we propose GCN-WP, a semi-supervised win prediction model for esports based on graph convolutional networks. This model learns the structure of an esports league over the course of a season (1 year) and makes predictions on another similar league. This model integrates over 30 features about the match and players and employs graph convolution to classify games based on their neighborhood. Our model achieves state-of-the-art prediction accuracy when compared to machine learning or skill rating models for LoL. The framework is generalizable so it can easily be extended to other multiplayer online games.  ( 2 min )
    Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities. (arXiv:2203.13883v3 [cs.LG] UPDATED)
    As social media platforms are evolving from text-based forums into multi-modal environments, the nature of misinformation in social media is also changing accordingly. Taking advantage of the fact that visual modalities such as images and videos are more favorable and attractive to the users, and textual contents are sometimes skimmed carelessly, misinformation spreaders have recently targeted contextual correlations between modalities e.g., text and image. Thus, many research efforts have been put into development of automatic techniques for detecting possible cross-modal discordances in web-based media. In this work, we aim to analyze, categorize and identify existing approaches in addition to challenges and shortcomings they face in order to unearth new opportunities in furthering the research in the field of multi-modal misinformation detection.
    DynaMarks: Defending Against Deep Learning Model Extraction Using Dynamic Watermarking. (arXiv:2207.13321v1 [cs.CR])
    The functionality of a deep learning (DL) model can be stolen via model extraction where an attacker obtains a surrogate model by utilizing the responses from a prediction API of the original model. In this work, we propose a novel watermarking technique called DynaMarks to protect the intellectual property (IP) of DL models against such model extraction attacks in a black-box setting. Unlike existing approaches, DynaMarks does not alter the training process of the original model but rather embeds watermark into a surrogate model by dynamically changing the output responses from the original model prediction API based on certain secret parameters at inference runtime. The experimental outcomes on Fashion MNIST, CIFAR-10, and ImageNet datasets demonstrate the efficacy of DynaMarks scheme to watermark surrogate models while preserving the accuracies of the original models deployed in edge devices. In addition, we also perform experiments to evaluate the robustness of DynaMarks against various watermark removal strategies, thus allowing a DL model owner to reliably prove model ownership.
    A Variational AutoEncoder for Transformers with Nonparametric Variational Information Bottleneck. (arXiv:2207.13529v1 [cs.LG])
    We propose a VAE for Transformers by developing a variational information bottleneck regulariser for Transformer embeddings. We formalise the embedding space of Transformer encoders as mixture probability distributions, and use Bayesian nonparametrics to derive a nonparametric variational information bottleneck (NVIB) for such attention-based embeddings. The variable number of mixture components supported by nonparametric methods captures the variable number of vectors supported by attention, and the exchangeability of our nonparametric distributions captures the permutation invariance of attention. This allows NVIB to regularise the number of vectors accessible with attention, as well as the amount of information in individual vectors. By regularising the cross-attention of a Transformer encoder-decoder with NVIB, we propose a nonparametric variational autoencoder (NVAE). Initial experiments on training a NVAE on natural language text show that the induced embedding space has the desired properties of a VAE for Transformers.
    Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization. (arXiv:2207.13676v1 [cs.LG])
    Vizier is the de-facto blackbox and hyperparameter optimization service across Google, having optimized some of Google's largest products and research efforts. To operate at the scale of tuning thousands of users' critical systems, Google Vizier solved key design challenges in providing multiple different features, while remaining fully fault-tolerant. In this paper, we introduce Open Source (OSS) Vizier, a standalone Python-based interface for blackbox optimization and research, based on the Google-internal Vizier infrastructure and framework. OSS Vizier provides an API capable of defining and solving a wide variety of optimization problems, including multi-metric, early stopping, transfer learning, and conditional search. Furthermore, it is designed to be a distributed system that assures reliability, and allows multiple parallel evaluations of the user's objective function. The flexible RPC-based infrastructure allows users to access OSS Vizier from binaries written in any language. OSS Vizier also provides a back-end ("Pythia") API that gives algorithm authors a way to interface new algorithms with the core OSS Vizier system. OSS Vizier is available at https://github.com/google/vizier.
    Bi-SimCut: A Simple Strategy for Boosting Neural Machine Translation. (arXiv:2206.02368v2 [cs.CL] UPDATED)
    We introduce Bi-SimCut: a simple but effective training strategy to boost neural machine translation (NMT) performance. It consists of two procedures: bidirectional pretraining and unidirectional finetuning. Both procedures utilize SimCut, a simple regularization method that enforces consistency between the output distributions of the original and the cutoff sentence pairs. Without leveraging extra data via back-translation or integrating large-scale pretrained models, Bi-SimCut achieves strong translation performance across five translation benchmarks (data sizes range from 160K to 20.2M): BLEU scores of 31.16 for en -> de and 38.37 for de -> en on the IWSLT14 dataset, 30.78 for en -> de and 35.15 for de -> en on the WMT14 dataset, and 27.17 for zh -> en on the WMT17 dataset. SimCut is not a new method, but a version of Cutoff (Shen et al., 2020) simplified and adapted for NMT, and it could be considered as a perturbation-based method. Given the universality and simplicity of SimCut and Bi-SimCut, we believe they can serve as strong baselines for future NMT research.
    Membership Inference Attacks via Adversarial Examples. (arXiv:2207.13572v1 [cs.LG])
    The rise of machine learning and deep learning has led to significant improvements in several domains. This change is supported by both the dramatic rise in computation power and the collection of large datasets. Such massive datasets often include personal data, which can represent a threat to privacy. Membership inference attacks are a novel direction of research which aims at recovering training data used by a learning algorithm. In this paper, we develop a means to measure the leakage of training data, leveraging a quantity appearing as a proxy of the total variation of a trained model near its training samples. We extend our work by providing a novel defense mechanism. Our contributions are supported by empirical evidence through convincing numerical experiments.
    INTERACT: Achieving Low Sample and Communication Complexities in Decentralized Bilevel Learning over Networks. (arXiv:2207.13283v1 [cs.LG])
    In recent years, decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities thanks to their versatility in modeling decentralized learning problems over peer-to-peer networks (e.g., multi-agent meta-learning, multi-agent reinforcement learning, personalized training, and Byzantine-resilient learning). However, for decentralized bilevel optimization over peer-to-peer networks with limited computation and communication capabilities, how to achieve low sample and communication complexities are two fundamental challenges that remain under-explored so far. In this paper, we make the first attempt to investigate the class of decentralized bilevel optimization problems with nonconvex and strongly-convex structure corresponding to the outer and inner subproblems, respectively. Our main contributions in this paper are two-fold: i) We first propose a deterministic algorithm called INTERACT (inner-gradient-descent-outer-tracked-gradient) that requires the sample complexity of $\mathcal{O}(n \epsilon^{-1})$ and communication complexity of $\mathcal{O}(\epsilon^{-1})$ to solve the bilevel optimization problem, where $n$ and $\epsilon > 0$ are the number of samples at each agent and the desired stationarity gap, respectively. ii) To relax the need for full gradient evaluations in each iteration, we propose a stochastic variance-reduced version of INTERACT (SVR-INTERACT), which improves the sample complexity to $\mathcal{O}(\sqrt{n} \epsilon^{-1})$ while achieving the same communication complexity as the deterministic algorithm. To our knowledge, this work is the first that achieves both low sample and communication complexities for solving decentralized bilevel optimization problems over networks. Our numerical experiments also corroborate our theoretical findings.
    A new perspective on the approximation capability of GNNs. (arXiv:2106.08992v4 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) are a broad class of connectionist models for graph processing. Recent studies have shown that GNNs can approximate any function on graphs, modulo the equivalence relation on nodes defined by the Weisfeiler-Lehman test. However, these results suffer from some limitations, both because they were derived using the Stone-Weierstrass theorem (which is existential in nature), and because they assume that the target function to be approximated must be continuous. In this paper, we propose an alternative way to demonstrate the approximation capability of GNNs that overcomes these limitations. In particular, some new results are proved, which allow us to: (1) define GNN architectures capable of obtaining a given approximation; (2) show that the Weisfeiler-Lehman test converges in r+1 steps, where r is the diameter of the graph; (3) derive a formal relationship between the Weisfeiler-Lehman test and the unfolding trees, that is, trees that can be built by visiting the graph starting from a given node. These results provide a more comprehensive understanding of the approximation power of GNNs, definitely showing that the 1-WL test and the unfolding tree concepts can be used interchangeably to study their expressiveness.
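    For readers unfamiliar with the Weisfeiler-Lehman test, the following is a minimal sketch of standard 1-WL colour refinement. It is not taken from the paper and does not verify its convergence bound. The function names, the adjacency-dict input format, and the example graphs are assumptions chosen for illustration.

```python
from collections import Counter


def wl_colors(adj):
    """1-WL colour refinement. adj maps each node to a list of neighbours."""
    colors = {v: 0 for v in adj}  # every node starts with the same colour
    rounds = 0
    while rounds < len(adj):
        # A node's signature is its own colour plus the sorted multiset
        # of its neighbours' colours.
        signatures = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        new_colors = {v: palette[signatures[v]] for v in adj}
        # Each round can only split colour classes, so an unchanged class
        # count means the partition is stable.
        if len(set(new_colors.values())) == len(set(colors.values())):
            break
        colors = new_colors
        rounds += 1
    return colors, rounds


def wl_distinguishes(adj_a, adj_b):
    """True if 1-WL assigns the two graphs different colour histograms."""
    # Refine on the disjoint union so colours are comparable across graphs.
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in adj_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in adj_b.items()})
    colors, _ = wl_colors(union)
    hist_a = Counter(c for (side, _), c in colors.items() if side == "a")
    hist_b = Counter(c for (side, _), c in colors.items() if side == "b")
    return hist_a != hist_b


hexagon = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

    A 6-cycle and two disjoint triangles are both 2-regular, so `wl_distinguishes(hexagon, two_triangles)` is false even though the graphs are not isomorphic. This is the kind of equivalence that the approximation results are stated modulo.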
    Cascade Decoders-Based Autoencoders for Image Reconstruction. (arXiv:2107.00002v2 [cs.LG] UPDATED)
    Autoencoders are composed of coding and decoding units, hence they hold the inherent potential of high-performance data compression and signal compressed sensing. The main disadvantages of current autoencoders comprise the following several aspects: the research objective is not data reconstruction but feature representation; the performance evaluation of data recovery is neglected; it is hard to achieve lossless data reconstruction by pure autoencoders, even by pure deep learning. This paper aims for image reconstruction of autoencoders, employs cascade decoders-based autoencoders, perfects the performance of image reconstruction, approaches gradually lossless image recovery, and provides solid theory and application basis for autoencoders-based image compression and compressed sensing. The proposed serial decoders-based autoencoders include the architectures of multi-level decoders and the related optimization algorithms. The cascade decoders consist of general decoders, residual decoders, adversarial decoders and their combinations. It is evaluated by the experimental results that the proposed autoencoders outperform the classical autoencoders in the performance of image reconstruction.
    Detecting Concept Drift in the Presence of Sparsity -- A Case Study of Automated Change Risk Assessment System. (arXiv:2207.13287v1 [cs.LG])
    Missing values, widely referred to as sparsity in the literature, are a common characteristic of many real-world datasets. Many imputation methods have been proposed to address this problem of data incompleteness or sparsity. However, the accuracy of a data imputation method for a given feature or a set of features in a dataset is highly dependent on the distribution of the feature values and its correlation with other features. Another problem that plagues industry deployments of machine learning (ML) solutions is concept drift detection, which becomes more challenging in the presence of missing values. Although data imputation and concept drift detection have been studied extensively, little work has attempted a combined study of the two phenomena, i.e., concept drift detection in the presence of sparsity. In this work, we carry out a systematic study of the following: (i) different patterns of missing values, (ii) various statistical and ML based data imputation methods for different kinds of sparsity, (iii) several concept drift detection methods, (iv) practical analysis of the various drift detection metrics, (v) selecting the best concept drift detector given a dataset with missing values based on the different metrics. We first analyze it on synthetic data and publicly available datasets, and finally extend the findings to our deployed solution of an automated change risk assessment system. One of the major findings from our empirical study is that no single concept drift detection method is superior across all the relevant metrics. Therefore, we adopt a majority voting based ensemble of concept drift detectors for abrupt and gradual concept drifts. Our experiments show that optimal or near optimal performance can be achieved for this ensemble method across all the metrics.
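    The majority-voting idea can be sketched as follows. The three detectors here are deliberately simple, hypothetical stand-ins (the abstract does not name the detectors in the ensemble), and the thresholds and sample values are made up. Only the voting logic is meant to carry over.

```python
from statistics import mean, pstdev


def mean_shift_detector(threshold):
    """Hypothetical detector: flags drift when the window mean moves."""
    def detect(reference, window):
        return abs(mean(window) - mean(reference)) > threshold
    return detect


def spread_ratio_detector(ratio):
    """Hypothetical detector: flags drift when the spread grows a lot."""
    def detect(reference, window):
        ref_sd = pstdev(reference)
        return ref_sd > 0 and pstdev(window) / ref_sd > ratio
    return detect


def range_detector():
    """Hypothetical detector: flags values outside the reference range."""
    def detect(reference, window):
        return min(window) < min(reference) or max(window) > max(reference)
    return detect


def ensemble_drift(detectors, reference, window):
    """Report drift only if a strict majority of detectors agree."""
    votes = [bool(d(reference, window)) for d in detectors]
    return sum(votes) > len(votes) / 2


detectors = [mean_shift_detector(0.5), spread_ratio_detector(2.0), range_detector()]
reference = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
window_same = [1.0, 0.95, 1.05, 1.0]
window_shift = [2.0, 2.1, 1.9, 2.05]

drift_same = ensemble_drift(detectors, reference, window_same)
drift_shift = ensemble_drift(detectors, reference, window_shift)
```

    In the shifted window, the mean and range detectors vote for drift while the spread detector does not, so the ensemble still reports drift. Missing values would need to be imputed or dropped before these detectors run, and that step is omitted here.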
    Time Series Forecasting Models Copy the Past: How to Mitigate. (arXiv:2207.13441v1 [cs.LG])
    Time series forecasting is at the core of important application domains posing significant challenges to machine learning algorithms. Recently neural network architectures have been widely applied to the problem of time series forecasting. Most of these models are trained by minimizing a loss function that measures predictions' deviation from the real values. Typical loss functions include mean squared error (MSE) and mean absolute error (MAE). In the presence of noise and uncertainty, neural network models tend to replicate the last observed value of the time series, thus limiting their applicability to real-world data. In this paper, we provide a formal definition of the above problem and we also give some examples of forecasts where the problem is observed. We also propose a regularization term penalizing the replication of previously seen values. We evaluate the proposed regularization term both on synthetic and real-world datasets. Our results indicate that the regularization term mitigates to some extent the aforementioned problem and gives rise to more robust models.
    On Missing Labels, Long-tails and Propensities in Extreme Multi-label Classification. (arXiv:2207.13186v1 [cs.LG])
    The propensity model introduced by Jain et al. 2016 has become a standard approach for dealing with missing and long-tail labels in extreme multi-label classification (XMLC). In this paper, we critically revise this approach showing that despite its theoretical soundness, its application in contemporary XMLC works is debatable. We exhaustively discuss the flaws of the propensity-based approach, and present several recipes, some of them related to solutions used in search engines and recommender systems, that we believe constitute promising alternatives to be followed in XMLC.
    Online Continual Learning with Contrastive Vision Transformer. (arXiv:2207.13516v1 [cs.LG])
    Online continual learning (online CL) studies the problem of learning sequential tasks from an online data stream without task boundaries, aiming to adapt to new data while alleviating catastrophic forgetting on the past tasks. This paper proposes a framework Contrastive Vision Transformer (CVT), which designs a focal contrastive learning strategy based on a transformer architecture, to achieve a better stability-plasticity trade-off for online CL. Specifically, we design a new external attention mechanism for online CL that implicitly captures previous tasks' information. Besides, CVT contains learnable focuses for each class, which could accumulate the knowledge of previous classes to alleviate forgetting. Based on the learnable focuses, we design a focal contrastive loss to rebalance contrastive learning between new and past classes and consolidate previously learned representations. Moreover, CVT contains a dual-classifier structure for decoupling learning current classes and balancing all observed classes. The extensive experimental results show that our approach achieves state-of-the-art performance with even fewer parameters on online CL benchmarks and effectively alleviates the catastrophic forgetting.
    Unsupervised Training for Neural TSP Solver. (arXiv:2207.13667v1 [cs.LG])
    There has been a growing number of machine learning methods for approximately solving the travelling salesman problem. However, these methods often require solved instances for training or use complex reinforcement learning approaches that need a large amount of tuning. To avoid these problems, we introduce a novel unsupervised learning approach. We use a relaxation of an integer linear program for TSP to construct a loss function that does not require correct instance labels. With variable discretization, its minimum coincides with the optimal or near-optimal solution. Furthermore, this loss function is differentiable and thus can be used to train neural networks directly. We use our loss function with a Graph Neural Network and design controlled experiments on both Euclidean and asymmetric TSP. Our approach has the advantage over supervised learning of not requiring large labelled datasets. In addition, the performance of our approach surpasses reinforcement learning for asymmetric TSP and is comparable to reinforcement learning for Euclidean instances. Our approach is also more stable and easier to train than reinforcement learning.
    Information We Can Extract About a User From 'One Minute Mobile Application Usage'. (arXiv:2207.13222v1 [cs.LG])
    Understanding human behavior is an important task with applications in many domains, such as targeted advertisement, health analytics, security, and entertainment. For this purpose, designing a system for activity recognition (AR) is important. However, since every human can behave differently, understanding and analyzing common patterns becomes a challenging task. Since smartphones are widely available, using them to track human activities becomes possible. In this paper, we extracted different human activities using the accelerometer, magnetometer, and gyroscope sensors of Android smartphones by building an Android mobile application. Using different social media applications, such as Facebook, Instagram, WhatsApp, and Twitter, we extracted the raw sensor values of $29$ subjects along with their attributes (class labels) such as age, gender, and left/right/both hands application usage. We extract features from the raw signals and use them to perform classification using different machine learning (ML) algorithms. Using statistical analysis, we show the importance of different features towards the prediction of class labels. In the end, we use the ML model trained on our data to extract unknown features from a well-known activity recognition dataset from the UCI repository, which highlights the potential for privacy breaches using ML models. This security analysis could help researchers in the future to take appropriate steps to preserve the privacy of human subjects.
    Do Quantum Circuit Born Machines Generalize?. (arXiv:2207.13645v1 [quant-ph])
    In recent proposals of quantum circuit models for generative tasks, the discussion about their performance has been limited to their ability to reproduce a known target distribution. For example, expressive model families such as Quantum Circuit Born Machines (QCBMs) have been almost entirely evaluated on their capability to learn a given target distribution with high accuracy. While this aspect may be ideal for some tasks, it limits the scope of a generative model's assessment to its ability to memorize data rather than generalize. As a result, there has been little understanding of a model's generalization performance and the relation between such capability and the resource requirements, e.g., the circuit depth and the amount of training data. In this work, we leverage upon a recently proposed generalization evaluation framework to begin addressing this knowledge gap. We first investigate the QCBM's learning process of a cardinality-constrained distribution and see an increase in generalization performance while increasing the circuit depth. In the 12-qubit example presented here, we observe that with as few as 30% of the valid patterns as the training set, the QCBM exhibits the best generalization performance toward generating unseen and valid patterns. Lastly, we assess the QCBM's ability to generalize not only to valid features, but to high-quality bitstrings distributed according to an adequately biased distribution. We see that the QCBM is able to effectively learn the bias and generate unseen samples with higher quality than those in the training set. To the best of our knowledge, this is the first work in the literature that presents the QCBM's generalization performance as an integral evaluation metric for quantum generative models, and demonstrates the QCBM's ability to generalize to high-quality, desired novel samples.
    Handling Hard Affine SDP Shape Constraints in RKHSs. (arXiv:2101.01519v2 [stat.ML] UPDATED)
    Shape constraints, such as non-negativity, monotonicity, convexity or supermodularity, play a key role in various applications of machine learning and statistics. However, incorporating this side information into predictive models in a hard way (for example at all points of an interval) for rich function classes is a notoriously challenging problem. We propose a unified and modular convex optimization framework, relying on second-order cone (SOC) tightening, to encode hard affine SDP constraints on function derivatives, for models belonging to vector-valued reproducing kernel Hilbert spaces (vRKHSs). The modular nature of the proposed approach makes it possible to handle multiple shape constraints simultaneously, and to tighten an infinite number of constraints into finitely many. We prove the convergence of the proposed scheme and that of its adaptive variant, leveraging geometric properties of vRKHSs. Due to the covering-based construction of the tightening, the method is particularly well-suited to tasks with small to moderate input dimensions. The efficiency of the approach is illustrated in the context of shape optimization, robotics and econometrics.
    Fault Detection and Classification of Aerospace Sensors using a VGG16-based Deep Neural Network. (arXiv:2207.13267v1 [cs.CV])
    Compared with traditional model-based fault detection and classification (FDC) methods, deep neural networks (DNN) prove to be effective for aerospace sensor FDC problems. However, the time consumed in training the DNN is excessive, and explainability analysis for the FDC neural network is still underwhelming. A concept known as imagefication-based intelligent FDC has been studied in recent years. This concept advocates stacking the sensor measurement data into an image format; the sensor FDC issue is then transformed into an abnormal-region detection problem on the stacked image, which may well borrow the recent advances in the machine vision realm. Although promising results have been claimed in imagefication-based intelligent FDC research, due to the small size of the stacked image, small convolutional kernels and shallow DNN layers were used, which hinders the FDC performance. In this paper, we first propose a data augmentation method which inflates the stacked image to a larger size (corresponding to the VGG16 net developed in the machine vision realm). The FDC neural network is then trained by fine-tuning the VGG16 directly. To truncate and compress the FDC net size (hence its running time), we perform model pruning on the fine-tuned net. The class activation mapping (CAM) method is also adopted for explainability analysis of the FDC net to verify its internal operations. Via data augmentation, fine-tuning from VGG16, and model pruning, the FDC net developed in this paper achieves an FDC accuracy of 98.90% across 4 aircraft at 5 flight conditions (running time 26 ms). The CAM results also verify the FDC net w.r.t. its internal operations.
    Learning the Evolution of Correlated Stochastic Power System Dynamics. (arXiv:2207.13310v1 [cs.LG])
    A machine learning technique is proposed for quantifying uncertainty in power system dynamics with spatiotemporally correlated stochastic forcing. We learn one-dimensional linear partial differential equations for the probability density functions of real-valued quantities of interest. The method is suitable for high-dimensional systems and helps to alleviate the curse of dimensionality.
    Safe and Robust Experience Sharing for Deterministic Policy Gradient Algorithms. (arXiv:2207.13453v1 [cs.LG])
    Learning in high dimensional continuous tasks is challenging, mainly when the experience replay memory is very limited. We introduce a simple yet effective experience sharing mechanism for deterministic policies in continuous action domains for the future off-policy deep reinforcement learning applications in which the allocated memory for the experience replay buffer is limited. To overcome the extrapolation error induced by learning from other agents' experiences, we facilitate our algorithm with a novel off-policy correction technique without any action probability estimates. We test the effectiveness of our method in challenging OpenAI Gym continuous control tasks and conclude that it can achieve a safe experience sharing across multiple agents and exhibits a robust performance when the replay memory is strictly limited.
    Correlations Between COVID-19 and Dengue. (arXiv:2207.13561v1 [q-bio.PE])
    A dramatic increase in the number of outbreaks of Dengue has recently been reported, and climate change is likely to extend the geographical spread of the disease. In this context, this paper shows how a neural network approach can incorporate Dengue and COVID-19 data as well as external factors (such as social behaviour or climate variables), to develop predictive models that could improve our knowledge and provide useful tools for health policy makers. Through the use of neural networks with different social and natural parameters, in this paper we define a Correlation Model through which we show that the number of cases of COVID-19 and Dengue have very similar trends. We then illustrate the relevance of our model by extending it to a Long short-term memory model (LSTM) that incorporates both diseases, and using this to estimate Dengue infections via COVID-19 data in countries that lack sufficient Dengue data.
    VDL-Surrogate: A View-Dependent Latent-based Model for Parameter Space Exploration of Ensemble Simulations. (arXiv:2207.13091v1 [cs.GR])
    We propose VDL-Surrogate, a view-dependent neural-network-latent-based surrogate model for parameter space exploration of ensemble simulations that allows high-resolution visualizations and user-specified visual mappings. Surrogate-enabled parameter space exploration allows domain scientists to preview simulation results without having to run a large number of computationally costly simulations. Limited by computational resources, however, existing surrogate models may not produce previews with sufficient resolution for visualization and analysis. To improve the efficient use of computational resources and support high-resolution exploration, we perform ray casting from different viewpoints to collect samples and produce compact latent representations. This latent encoding process reduces the cost of surrogate model training while maintaining the output quality. In the model training stage, we select viewpoints to cover the whole viewing sphere and train corresponding VDL-Surrogate models for the selected viewpoints. In the model inference stage, we predict the latent representations at previously selected viewpoints and decode the latent representations to data space. For any given viewpoint, we make interpolations over decoded data at selected viewpoints and generate visualizations with user-specified visual mappings. We show the effectiveness and efficiency of VDL-Surrogate in cosmological and ocean simulations with quantitative and qualitative evaluations. Source code is publicly available at https://github.com/trainsn/VDL-Surrogate.
    Concurrent Subsidiary Supervision for Unsupervised Source-Free Domain Adaptation. (arXiv:2207.13247v1 [cs.CV])
    The prime challenge in unsupervised domain adaptation (DA) is to mitigate the domain shift between the source and target domains. Prior DA works show that pretext tasks could be used to mitigate this domain shift by learning domain invariant representations. However, in practice, we find that most existing pretext tasks are ineffective against other established techniques. Thus, we theoretically analyze how and when a subsidiary pretext task could be leveraged to assist the goal task of a given DA problem and develop objective subsidiary task suitability criteria. Based on these criteria, we devise a novel process of sticker intervention and cast sticker classification as a supervised subsidiary DA problem concurrent to the goal task unsupervised DA. Our approach not only improves goal task adaptation performance, but also facilitates privacy-oriented source-free DA, i.e., without concurrent source-target access. Experiments on the standard Office-31, Office-Home, DomainNet, and VisDA benchmarks demonstrate our superiority for both single-source and multi-source source-free DA. Our approach also complements existing non-source-free works, achieving leading performance.
    A Proper Orthogonal Decomposition approach for parameters reduction of Single Shot Detector networks. (arXiv:2207.13551v1 [cs.CV])
    As a major breakthrough in artificial intelligence and deep learning, Convolutional Neural Networks have achieved an impressive success in solving many problems in several fields including computer vision and image processing. Real-time performance, robustness of algorithms and fast training processes remain open problems in these contexts. In addition object recognition and detection are challenging tasks for resource-constrained embedded systems, commonly used in the industrial sector. To overcome these issues, we propose a dimensionality reduction framework based on Proper Orthogonal Decomposition, a classical model order reduction technique, in order to gain a reduction in the number of hyperparameters of the net. We have applied such framework to SSD300 architecture using PASCAL VOC dataset, demonstrating a reduction of the network dimension and a remarkable speedup in the fine-tuning of the network in a transfer learning context.
    Unsupervised Learning under Latent Label Shift. (arXiv:2207.13179v1 [cs.LG])
    What sorts of structure might enable a learner to discover classes from unlabeled data? Traditional approaches rely on feature-space similarity and heroic assumptions on the data. In this paper, we introduce unsupervised learning under Latent Label Shift (LLS), where we have access to unlabeled data from multiple domains such that the label marginals $p_d(y)$ can shift across domains but the class conditionals $p(\mathbf{x}|y)$ do not. This work instantiates a new principle for identifying classes: elements that shift together group together. For finite input spaces, we establish an isomorphism between LLS and topic modeling: inputs correspond to words, domains to documents, and labels to topics. Addressing continuous data, we prove that when each label's support contains a separable region, analogous to an anchor word, oracle access to $p(d|\mathbf{x})$ suffices to identify $p_d(y)$ and $p_d(y|\mathbf{x})$ up to permutation. Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine the recovered $p(y|d)$ with the discriminator outputs $p(d|\mathbf{x})$ to compute $p_d(y|x) \; \forall d$. With semi-synthetic experiments, we show that our algorithm can leverage domain information to improve state of the art unsupervised classification methods. We reveal a failure mode of standard unsupervised classification methods when feature-space similarity does not indicate true groupings, and show empirically that our method better handles this case. Our results establish a deep connection between distribution shift and topic modeling, opening promising lines for future work.
    XADLiME: eXplainable Alzheimer's Disease Likelihood Map Estimation via Clinically-guided Prototype Learning. (arXiv:2207.13223v1 [cs.LG])
    Diagnosing Alzheimer's disease (AD) involves a deliberate diagnostic process owing to its innate traits of irreversibility with subtle and gradual progression. These characteristics make AD biomarker identification from structural brain imaging (e.g., structural MRI) scans quite challenging. Furthermore, there is a high possibility of getting entangled with normal aging. We propose a novel deep-learning approach through eXplainable AD Likelihood Map Estimation (XADLiME) for AD progression modeling over 3D sMRIs using clinically-guided prototype learning. Specifically, we establish a set of topologically-aware prototypes onto the clusters of latent clinical features, uncovering an AD spectrum manifold. We then measure the similarities between latent clinical features and well-established prototypes, estimating a "pseudo" likelihood map. By considering this pseudo map as an enriched reference, we employ an estimating network to estimate the AD likelihood map over a 3D sMRI scan. Additionally, we promote the explainability of such a likelihood map by revealing a comprehensible overview from two perspectives: clinical and morphological. During the inference, this estimated likelihood map served as a substitute over unseen sMRI scans for effectively conducting the downstream task while providing thorough explainable states.
    PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations. (arXiv:2207.13224v1 [cs.RO])
    Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large capacity models, including modern neural network architectures. In this work, we develop Predictive Information Augmented Random Search (PI-ARS) to mitigate this limitation by leveraging recent advancements in representation learning to reduce the parameter search space for ES. Namely, PI-ARS combines a gradient-based representation learning technique, Predictive Information (PI), with a gradient-free ES algorithm, Augmented Random Search (ARS), to train policies that can process complex robot sensory inputs and handle highly nonlinear robot dynamics. We evaluate PI-ARS on a set of challenging visual-locomotion tasks where a quadruped robot needs to walk on uneven stepping stones, quincuncial piles, and moving platforms, as well as to complete an indoor navigation task. Across all tasks, PI-ARS demonstrates significantly better learning efficiency and performance compared to the ARS baseline. We further validate our algorithm by demonstrating that the learned policies can successfully transfer to a real quadruped robot, for example, achieving a 100% success rate on the real-world stepping stone environment, dramatically improving prior results achieving 40% success.
    Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition. (arXiv:2207.13259v1 [cs.CV])
    Transformer-based methods have recently achieved great advancement on 2D image-based vision tasks. For 3D video-based tasks such as action recognition, however, directly applying spatiotemporal transformers on video data will bring heavy computation and memory burdens due to the largely increased number of patches and the quadratic complexity of self-attention computation. How to efficiently and effectively model the 3D self-attention of video data has been a great challenge for transformers. In this paper, we propose a Temporal Patch Shift (TPS) method for efficient 3D self-attention modeling in transformers for video-based action recognition. TPS shifts part of patches with a specific mosaic pattern in the temporal dimension, thus converting a vanilla spatial self-attention operation to a spatiotemporal one with little additional cost. As a result, we can compute 3D self-attention using nearly the same computation and memory cost as 2D self-attention. TPS is a plug-and-play module and can be inserted into existing 2D transformer models to enhance spatiotemporal feature learning. The proposed method achieves competitive performance with state-of-the-arts on Something-something V1 & V2, Diving-48, and Kinetics400 while being much more efficient on computation and memory cost. The source code of TPS can be found at https://github.com/MartinXM/TPS.
    Statistical Keystroke Synthesis for Improved Bot Detection. (arXiv:2207.13394v1 [cs.LG])
    This work proposes two statistical approaches for the synthesis of keystroke biometric data based on Universal and User-dependent Models. Both approaches are validated on the bot detection task, using the keystroke synthetic data to better train the systems. Our experiments include a dataset with 136 million keystroke events from 168,000 subjects. We have analyzed the performance of the two synthesis approaches through qualitative and quantitative experiments. Different bot detectors are considered based on two supervised classifiers (Support Vector Machine and Long Short-Term Memory network) and a learning framework including human and generated samples. Our results show that the proposed statistical approaches are able to generate realistic human-like synthetic keystroke samples. Also, the classification results suggest that in scenarios with large labeled data, these synthetic samples can be detected with high accuracy. However, in few-shot learning scenarios, detecting them remains an important challenge.
    Initial Orbit Determination for the CR3BP using Particle Swarm Optimization. (arXiv:2207.13175v1 [physics.comp-ph])
    This work utilizes a particle swarm optimizer (PSO) for initial orbit determination for a chief and deputy scenario in the circular restricted three-body problem (CR3BP). The PSO is used to minimize the difference between actual and estimated observations and knowledge of the chief's position with known CR3BP dynamics to determine the deputy's initial state. Convergence is achieved through limiting particle starting positions to feasible positions based on the known chief position, and sensor constraints. Parallel and GPU processing methods are used to improve computation time and provide an accurate initial state estimate for a variety of cislunar orbit geometries.
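    As a rough illustration of the optimizer named in the abstract above, the following is a minimal, generic particle swarm sketch in NumPy. It is not the authors' implementation: the function name `pso_minimize`, the inertia and acceleration coefficients, and the toy quadratic objective are all assumptions made for illustration. The box bounds stand in for the idea of restricting particle starting positions to a feasible region; the CR3BP dynamics and observation model are not reproduced here.

```python
import numpy as np

def pso_minimize(f, lower, upper, n_particles=30, n_iters=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Generic particle swarm minimizer (hypothetical helper, illustrative only)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size

    # Start every particle inside the feasible box, with zero velocity.
    pos = rng.uniform(lower, upper, size=(n_particles, dim))
    vel = np.zeros_like(pos)

    # Personal bests and the global best seen so far.
    pbest = pos.copy()
    pbest_val = np.array([f(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia term plus pulls toward the personal and global bests.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lower, upper)

        vals = np.array([f(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]

        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]

    return gbest, float(gbest_val)

# Toy objective standing in for an observation-residual cost (assumption).
best, best_val = pso_minimize(lambda x: float(np.sum((x - 1.0) ** 2)),
                              lower=[-5.0, -5.0], upper=[5.0, 5.0])
```

    In an orbit-determination setting the objective would instead compare actual and estimated observations, which is where the bulk of the computation would lie.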
    Sliced Wasserstein Variational Inference. (arXiv:2207.13177v1 [stat.ML])
    Variational Inference approximates an unnormalized distribution via the minimization of Kullback-Leibler (KL) divergence. Although this divergence is efficient for computation and has been widely used in applications, it suffers from some unreasonable properties. For example, it is not a proper metric, i.e., it is non-symmetric and does not preserve the triangle inequality. On the other hand, optimal transport distances recently have shown some advantages over KL divergence. With the help of these advantages, we propose a new variational inference method by minimizing sliced Wasserstein distance, a valid metric arising from optimal transport. This sliced Wasserstein distance can be approximated simply by running MCMC but without solving any optimization problem. Our approximation also does not require a tractable density function of variational distributions so that approximating families can be amortized by generators like neural networks. Furthermore, we provide an analysis of the theoretical properties of our method. Experiments on synthetic and real data are illustrated to show the performance of the proposed method.
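    For readers unfamiliar with the distance mentioned above, here is a small NumPy sketch of the standard Monte Carlo estimator of the sliced Wasserstein distance between two equally sized sample sets. It is a generic textbook estimator and not the paper's variational inference procedure (which the abstract says relies on MCMC); the function name `sliced_wasserstein` and its parameters are assumptions for illustration.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n, d) holding the same number n of samples.
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]

    # Draw random directions uniformly on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)

    # Project both sample sets onto each direction and sort: in one
    # dimension, optimal transport matches sorted samples to each other.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)

    # Average the 1-D transport cost over samples and projections.
    return float(np.mean(np.abs(px - py) ** p) ** (1.0 / p))
```

    Because each one-dimensional problem is solved by sorting, no inner optimization problem is needed, and the estimator is symmetric in its two arguments.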
    One Simple Trick to Fix Your Bayesian Neural Network. (arXiv:2207.13167v1 [stat.ML])
    One of the most popular estimation methods in Bayesian neural networks (BNN) is mean-field variational inference (MFVI). In this work, we show that neural networks with the ReLU activation function induce posteriors that are hard to fit with MFVI. We provide a theoretical justification for this phenomenon, study it empirically, and report the results of a series of experiments to investigate the effect of the activation function on the calibration of BNNs. We find that using Leaky ReLU activations leads to more Gaussian-like weight posteriors and achieves a lower expected calibration error (ECE) than its ReLU-based counterpart.
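    The expected calibration error mentioned above is commonly computed by binning predictions by confidence and averaging the gap between accuracy and confidence in each bin. The sketch below shows that common equal-width binned estimator in NumPy; the function name, the choice of 10 bins, and the binning convention are assumptions, and the paper may use a different variant.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width binned ECE.

    confidences: predicted confidence of the chosen class, in (0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)

    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Half-open bins (lo, hi]; a confidence of exactly 0 is not counted.
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            # Weight each bin's gap by the fraction of samples it holds.
            ece += in_bin.mean() * gap
    return float(ece)
```

    A model that reports 80% confidence and is right 80% of the time scores zero under this estimator, while one that is always fully confident but right only half the time scores 0.5.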
    Faster online calibration without randomization: interval forecasts and the power of two choices. (arXiv:2204.13087v2 [cs.LG] UPDATED)
    We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an $\epsilon$-calibration error rate of $O(1/\sqrt{T})$, which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be $\Omega(1)$. Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster $\epsilon$-calibration rate of $O(1/T)$ can be achieved even without deploying any randomization.  ( 3 min )
    Time to augment contrastive learning. (arXiv:2207.13492v1 [cs.LG])
    Biological vision systems are unparalleled in their ability to learn visual representations without supervision. In machine learning, contrastive learning (CL) has led to major advances in forming object representations in an unsupervised fashion. These systems learn representations invariant to augmentation operations over images, like cropping or flipping. In contrast, biological vision systems exploit the temporal structure of the visual experience. This gives access to augmentations not commonly used in CL, like watching the same object from multiple viewpoints or against different backgrounds. Here, we systematically investigate and compare the potential benefits of such time-based augmentations for learning object categories. Our results show that time-based augmentations achieve large performance gains over state-of-the-art image augmentations. Specifically, our analyses reveal that: 1) 3-D object rotations drastically improve the learning of object categories; 2) viewing objects against changing backgrounds is vital for learning to discard background-related information. Overall, we conclude that time-based augmentations can greatly improve contrastive learning, narrowing the gap between artificial and biological vision systems.  ( 2 min )
    Fairness and Randomness in Machine Learning: Statistical Independence and Relativization. (arXiv:2207.13596v1 [cs.LG])
    Fair Machine Learning endeavors to prevent unfairness arising in the context of machine learning applications embedded in society. Despite the variety of definitions of fairness and proposed "fair algorithms", there remain unresolved conceptual problems regarding fairness. In this paper, we argue that randomness and fairness can be considered equivalent concepts in machine learning. We obtain a relativized notion of randomness expressed as statistical independence by appealing to Von Mises' century-old foundations for probability. Via fairness notions in machine learning, which are expressed as statistical independence as well, we then link the ante randomness assumptions about the data to the ex post requirements for fair predictions. This connection proves fruitful: we use it to argue that randomness and fairness are essentially relative and that randomness should reflect its nature as a modeling assumption in machine learning.  ( 2 min )
    Perception-Aware Attack: Creating Adversarial Music via Reverse-Engineering Human Perception. (arXiv:2207.13192v1 [cs.SD])
    Recently, adversarial machine learning attacks have posed serious security threats against practical audio signal classification systems, including speech recognition, speaker recognition, and music copyright detection. Previous studies have mainly focused on ensuring the effectiveness of attacking an audio signal classifier via creating a small noise-like perturbation on the original signal. It is still unclear if an attacker is able to create audio signal perturbations that can be well perceived by human beings in addition to its attack effectiveness. This is particularly important for music signals as they are carefully crafted with human-enjoyable audio characteristics. In this work, we formulate the adversarial attack against music signals as a new perception-aware attack framework, which integrates human study into adversarial attack design. Specifically, we conduct a human study to quantify the human perception with respect to a change of a music signal. We invite human participants to rate their perceived deviation based on pairs of original and perturbed music signals, and reverse-engineer the human perception process by regression analysis to predict the human-perceived deviation given a perturbed signal. The perception-aware attack is then formulated as an optimization problem that finds an optimal perturbation signal to minimize the prediction of perceived deviation from the regressed human perception model. We use the perception-aware framework to design a realistic adversarial music attack against YouTube's copyright detector. Experiments show that the perception-aware attack produces adversarial music with significantly better perceptual quality than prior work.  ( 3 min )
    TINYCD: A (Not So) Deep Learning Model For Change Detection. (arXiv:2207.13159v1 [cs.CV])
    The aim of change detection (CD) is to detect changes that occurred in the same area by comparing two images of that place taken at different times. The challenging part of CD is to keep track of the changes the user wants to highlight, such as new buildings, and to ignore changes due to external factors such as environmental conditions, lighting, fog or seasonal changes. Recent developments in the field of deep learning have enabled researchers to achieve outstanding performance in this area. In particular, different mechanisms of space-time attention have made it possible to exploit the spatial features extracted by the models and to correlate them temporally by using both available images. The downside is that the models have become increasingly complex and large, often infeasible for edge applications. These are limitations when the models must be applied in the industrial field or in applications requiring real-time performance. In this work we propose a novel model, called TinyCD, which proves to be both lightweight and effective, achieving performance comparable or even superior to the current state of the art with 13-150X fewer parameters. In our approach we exploit the importance of low-level features for comparing images. To do this, we use only a few backbone blocks. This strategy allows us to keep the number of network parameters low. To compose the features extracted from the two images, we introduce a novel, parameter-efficient mixing block capable of cross-correlating features in both space and time domains. Finally, to fully exploit the information contained in the computed features, we define the PW-MLP block, which performs a pixel-wise classification. Source code, models and results are available here: https://github.com/AndreaCodegoni/Tiny_model_4_CD
    LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity. (arXiv:2207.13129v1 [cs.LG])
    We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, models that belong to a wider weight optimum are better surrogates. Second, we identify a subspace able to generate an effective surrogate ensemble among this wider optimum. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established test-time transformations by 1.8 to 59.9 percentage points. Our findings shed new light on the importance of the geometry of the weight space to explain the transferability of adversarial examples.
    Atomic structure generation from reconstructing structural fingerprints. (arXiv:2207.13227v1 [cond-mat.mtrl-sci])
    Data-driven machine learning methods have the potential to dramatically accelerate the rate of materials design over conventional human-guided approaches. These methods would help identify or, in the case of generative models, even create novel crystal structures of materials with a set of specified functional properties to then be synthesized or isolated in the laboratory. For crystal structure generation, a key bottleneck lies in developing suitable atomic structure fingerprints or representations for the machine learning model, analogous to the graph-based or SMILES representations used in molecular generation. However, finding data-efficient representations that are invariant to translations, rotations, and permutations, while remaining invertible to the Cartesian atomic coordinates remains an ongoing challenge. Here, we propose an alternative approach to this problem by taking existing non-invertible representations with the desired invariances and developing an algorithm to reconstruct the atomic coordinates through gradient-based optimization using automatic differentiation. This can then be coupled to a generative machine learning model which generates new materials within the representation space, rather than in the data-inefficient Cartesian space. In this work, we implement this end-to-end structure generation approach using atom-centered symmetry functions as the representation and conditional variational autoencoders as the generative model. We are able to successfully generate novel and valid atomic structures of sub-nanometer Pt nanoparticles as a proof of concept. Furthermore, this method can be readily extended to any suitable structural representation, thereby providing a powerful, generalizable framework towards structure-based generation.
    Should Bank Stress Tests Be Fair?. (arXiv:2207.13319v1 [stat.ML])
    Regulatory stress tests have become the primary tool for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios in shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask, what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but is subject to two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects as preferable to simply ignoring differences across banks. We present evidence that the overall impact can be material. We also discuss extensions to nonlinear models.
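    The idea of estimating and then discarding centered bank fixed effects can be illustrated with a minimal sketch for a single-feature linear model. Everything named here (the function `fit_and_discard_fixed_effects`, the toy data, and the choice to center the effects with equal weight per bank) is an assumption made for illustration and is not taken from the paper.

```python
from statistics import mean

def fit_and_discard_fixed_effects(data):
    """data maps bank name -> list of (x, y) pairs.

    Returns (slope, common_intercept) for a one-feature linear model.
    The slope is estimated within banks, so bank-level offsets cannot
    distort it. The per-bank intercepts are then centered and their
    deviations dropped, leaving only their average as the intercept.
    """
    num = den = 0.0
    means = {}
    for bank, rows in data.items():
        mx = mean(x for x, _ in rows)
        my = mean(y for _, y in rows)
        means[bank] = (mx, my)
        # Within-bank (demeaned) cross products and squared deviations.
        num += sum((x - mx) * (y - my) for x, y in rows)
        den += sum((x - mx) ** 2 for x, _ in rows)
    slope = num / den
    # Bank fixed effects implied by the common slope.
    effects = {b: my - slope * mx for b, (mx, my) in means.items()}
    # Centering with equal weight per bank is an assumption of this sketch.
    common_intercept = mean(effects.values())
    return slope, common_intercept

# Hypothetical toy data: same slope, different bank-specific offsets.
toy = {
    "bank_a": [(1, 3), (2, 5), (3, 7)],   # y = 2x + 1
    "bank_b": [(1, 5), (2, 7), (3, 9)],   # y = 2x + 3
}
slope, intercept = fit_and_discard_fixed_effects(toy)
```

    In this toy case the common model keeps the shared slope and replaces the two bank-specific offsets with their average, so no bank's identity enters the final forecast.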
    Gaia: Graph Neural Network with Temporal Shift aware Attention for Gross Merchandise Value Forecast in E-commerce. (arXiv:2207.13329v1 [cs.LG])
    E-commerce has come a long way in empowering merchants through the internet. To store goods efficiently and allocate marketing resources properly, it is important for merchants to predict gross merchandise value (GMV) accurately. However, accurate prediction is nontrivial when digitized data is scarce. In this article, we present a solution to better forecast GMV inside the Alipay app. Because graph neural networks (GNNs) are well suited to correlating different entities to enrich information, we propose Gaia, a GNN model with temporal shift aware attention. Gaia leverages the sales information of relevant e-sellers and learns neighbor correlations based on temporal dependencies. In tests on Alipay's real dataset, Gaia shows the best performance compared with other baselines. Gaia is also deployed in a simulated online environment, where it likewise achieves great improvement over the baselines.  ( 2 min )
    The Randomness of Input Data Spaces is an A Priori Predictor for Generalization. (arXiv:2106.04181v2 [cs.LG] UPDATED)
    Over-parameterized models can perfectly learn various types of data distributions; however, generalization error is usually lower for real data than for artificial data. This suggests that the properties of data distributions have an impact on generalization capability. This work focuses on the search space defined by the input data and assumes that the correlation between labels of neighboring input values influences generalization. If correlation is low, the randomness of the input data space is high, leading to high generalization error. We suggest measuring the randomness of an input data space using Maurer's universal. Results for synthetic classification tasks and common image classification benchmarks (MNIST, CIFAR10, and Microsoft's cats vs. dogs data set) find a high correlation between the randomness of input data spaces and the generalization error of deep neural networks for binary classification problems.  ( 2 min )
    Efficient Personalized Speech Enhancement through Self-Supervised Learning. (arXiv:2104.02017v2 [eess.AS] UPDATED)
    This work presents self-supervised learning methods for developing monaural speaker-specific (i.e., personalized) speech enhancement models. While generalist models must broadly address many speakers, specialist models can adapt their enhancement function towards a particular speaker's voice, expecting to solve a narrower problem. Hence, specialists are capable of achieving more optimal performance in addition to reducing computational complexity. However, naive personalization methods can require clean speech from the target user, which is inconvenient to acquire, e.g., due to subpar recording conditions. To this end, we pose personalization as either a zero-shot task, in which no additional clean speech of the target speaker is used for training, or a few-shot learning task, in which the goal is to minimize the duration of the clean speech used for transfer learning. With this paper, we propose self-supervised learning methods as a solution to both zero- and few-shot personalization tasks. The proposed methods are designed to learn the personalized speech features from unlabeled data (i.e., in-the-wild noisy recordings from the target user) without knowing the corresponding clean sources. Our experiments investigate three different self-supervised learning mechanisms. The results show that self-supervised models achieve zero-shot and few-shot personalization using fewer model parameters and less clean data from the target user, achieving the data efficiency and model compression goals.  ( 3 min )
    Uncertainty-based Visual Question Answering: Estimating Semantic Inconsistency between Image and Knowledge Base. (arXiv:2207.13242v1 [cs.CV])
    The knowledge-based visual question answering (KVQA) task aims to answer questions that require additional external knowledge as well as an understanding of images and questions. Recent studies on KVQA inject external knowledge in a multi-modal form, and as more knowledge is used, irrelevant information may be added and can confuse the question answering. In order to properly use the knowledge, this study proposes the following: 1) we introduce a novel semantic inconsistency measure computed from caption uncertainty and semantic similarity; 2) we suggest a new external knowledge assimilation method based on the semantic inconsistency measure and apply it to integrate explicit knowledge and implicit knowledge for KVQA; 3) the proposed method is evaluated with the OK-VQA dataset and achieves state-of-the-art performance.
    Time Series Anomaly Detection via Reinforcement Learning-Based Model Selection. (arXiv:2205.09884v4 [cs.LG] UPDATED)
    Time series anomaly detection has been recognized as of critical importance for the reliable and efficient operation of real-world systems. Many anomaly detection methods have been developed based on various assumptions on anomaly characteristics. However, due to the complex nature of real-world data, different anomalies within a time series usually have diverse profiles supporting different anomaly assumptions. This makes it difficult to find a single anomaly detector that can consistently outperform other models. In this work, to harness the benefits of different base models, we propose a reinforcement learning-based model selection framework. Specifically, we first learn a pool of different anomaly detection models, and then utilize reinforcement learning to dynamically select a candidate model from these base models. Experiments on real-world data have demonstrated that the proposed strategy can indeed outplay all baseline models in terms of overall performance.  ( 2 min )
    Deep Clustering with Features from Self-Supervised Pretraining. (arXiv:2207.13364v1 [cs.CV])
    A deep clustering model conceptually consists of a feature extractor that maps data points to a latent space, and a clustering head that groups data points into clusters in the latent space. Although the two components used to be trained jointly in an end-to-end fashion, recent works have proved it beneficial to train them separately in two stages. In the first stage, the feature extractor is trained via self-supervised learning, which enables the preservation of the cluster structures among the data points. To preserve the cluster structures even better, we propose to replace the first stage with another model that is pretrained on a much larger dataset via self-supervised learning. The method is simple and might suffer from domain shift. Nonetheless, we have empirically shown that it can achieve superior clustering performance. When a vision transformer (ViT) architecture is used for feature extraction, our method has achieved clustering accuracy 94.0%, 55.6% and 97.9% on CIFAR-10, CIFAR-100 and STL-10 respectively. The corresponding previous state-of-the-art results are 84.3%, 47.7% and 80.8%. Our code will be available online with the publication of the paper.
    Dynamic Shielding for Reinforcement Learning in Black-Box Environments. (arXiv:2207.13446v1 [cs.LG])
    It is challenging to use reinforcement learning (RL) in cyber-physical systems due to the lack of safety guarantees during learning. Although there have been various proposals to reduce undesired behaviors during learning, most of these techniques require prior system knowledge, and their applicability is limited. This paper aims to reduce undesired behaviors during learning without requiring any prior system knowledge. We propose dynamic shielding: an extension of a model-based safe RL technique called shielding using automata learning. The dynamic shielding technique constructs an approximate system model in parallel with RL using a variant of the RPNI algorithm and suppresses undesired explorations due to the shield constructed from the learned model. Through this combination, potentially unsafe actions can be foreseen before the agent experiences them. Experiments show that our dynamic shield significantly decreases the number of undesired events during training.  ( 2 min )
    Semi-analytical Industrial Cooling System Model for Reinforcement Learning. (arXiv:2207.13131v1 [cs.AI])
    We present a hybrid industrial cooling system model that embeds analytical solutions within a multi-physics simulation. This model is designed for reinforcement learning (RL) applications and balances simplicity with simulation fidelity and interpretability. The model's fidelity is evaluated against real world data from a large scale cooling system. This is followed by a case study illustrating how the model can be used for RL research. For this, we develop an industrial task suite that allows specifying different problem settings and levels of complexity, and use it to evaluate the performance of different RL algorithms.
    Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks. (arXiv:2207.13243v1 [cs.LG])
    The last decade of machine learning has seen drastic increases in scale and capabilities, and deep neural networks (DNNs) are increasingly being deployed across a wide range of domains. However, the inner workings of DNNs are generally difficult to understand, raising concerns about the safety of using these systems without a rigorous understanding of how they function. In this survey, we review literature on techniques for interpreting the inner components of DNNs, which we call "inner" interpretability methods. Specifically, we review methods for interpreting weights, neurons, subnetworks, and latent representations with a focus on how these techniques relate to the goal of designing safer, more trustworthy AI systems. We also highlight connections between interpretability and work in modularity, adversarial robustness, continual learning, network compression, and studying the human visual system. Finally, we discuss key challenges and argue for future work in interpretability for AI safety that focuses on diagnostics, benchmarking, and robustness.
    PointFix: Learning to Fix Domain Bias for Robust Online Stereo Adaptation. (arXiv:2207.13340v1 [cs.CV])
    Online stereo adaptation tackles the domain shift problem, caused by different environments between synthetic (training) and real (test) datasets, to promptly adapt stereo models in dynamic real-world applications such as autonomous driving. However, previous methods often fail to counteract particular regions related to dynamic objects with more severe environmental changes. To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation. In a nutshell, our auxiliary network learns to fix local variants intensively by effectively back-propagating local information through the meta-gradient for the robust initialization of the baseline model. This network is model-agnostic, so can be used in any kind of architectures in a plug-and-play manner. We conduct extensive experiments to verify the effectiveness of our method under three adaptation settings such as short-, mid-, and long-term sequences. Experimental results show that the proper initialization of the base stereo model by the auxiliary network enables our learning paradigm to achieve state-of-the-art performance at inference.  ( 2 min )
    Deep Model-Based Architectures for Inverse Problems under Mismatched Priors. (arXiv:2207.13200v1 [eess.IV])
    There is a growing interest in deep model-based architectures (DMBAs) for solving imaging inverse problems by combining physical measurement models and learned image priors specified using convolutional neural nets (CNNs). For example, well-known frameworks for systematically designing DMBAs include plug-and-play priors (PnP), deep unfolding (DU), and deep equilibrium models (DEQ). While the empirical performance and theoretical properties of DMBAs have been widely investigated, the existing work in the area has primarily focused on their performance when the desired image prior is known exactly. This work addresses the gap in the prior work by providing new theoretical and numerical insights into DMBAs under mismatched CNN priors. Mismatched priors arise naturally when there is a distribution shift between training and testing data, for example, due to test images being from a different distribution than images used for training the CNN prior. They also arise when the CNN prior used for inference is an approximation of some desired statistical estimator (MAP or MMSE). Our theoretical analysis provides explicit error bounds on the solution due to the mismatched CNN priors under a set of clearly specified assumptions. Our numerical results compare the empirical performance of DMBAs under realistic distribution shifts and approximate statistical estimators.  ( 3 min )
    Transporters with Visual Foresight for Solving Unseen Rearrangement Tasks. (arXiv:2202.10765v3 [cs.RO] UPDATED)
    Rearrangement tasks have been identified as a crucial challenge for intelligent robotic manipulation, but few methods allow for precise construction of unseen structures. We propose a visual foresight model for pick-and-place rearrangement manipulation which is able to learn efficiently. In addition, we develop a multi-modal action proposal module which builds on the Goal-Conditioned Transporter Network, a state-of-the-art imitation learning method. Our image-based task planning method, Transporters with Visual Foresight, is able to learn from only a handful of data and generalize to multiple unseen tasks in a zero-shot manner. TVF is able to improve the performance of a state-of-the-art imitation learning method on unseen tasks in simulation and real robot experiments. In particular, the average success rate on unseen tasks improves from 55.4% to 78.5% in simulation experiments and from 30% to 63.3% in real robot experiments when given only tens of expert demonstrations. Video and code are available on our project website: https://chirikjianlab.github.io/tvf/  ( 2 min )
    Towards Soft Fairness in Restless Multi-Armed Bandits. (arXiv:2207.13343v1 [cs.LG])
    Restless multi-armed bandits (RMAB) is a framework for allocating limited resources under uncertainty. It is an extremely useful model for monitoring beneficiaries and executing timely interventions to ensure maximum benefit in public health settings (e.g., ensuring patients take medicines in tuberculosis settings, ensuring pregnant mothers listen to automated calls about good pregnancy practices). Due to the limited resources, typically certain communities or regions are starved of interventions that can have follow-on effects. To avoid starvation in the executed interventions across individuals/regions/communities, we first provide a soft fairness constraint and then provide an approach to enforce the soft fairness constraint in RMABs. The soft fairness constraint requires that an algorithm never probabilistically favor one arm over another if the long-term cumulative reward of choosing the latter arm is higher. Our approach incorporates softmax based value iteration method in the RMAB setting to design selection algorithms that manage to satisfy the proposed fairness constraint. Our method, referred to as SoftFair, also provides theoretical performance guarantees and is asymptotically optimal. Finally, we demonstrate the utility of our approaches on simulated benchmarks and show that the soft fairness constraint can be handled without a significant sacrifice on value.  ( 2 min )
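    The soft fairness constraint described above can be illustrated with a small sketch: a softmax over per-arm value estimates never gives a lower-valued arm a higher selection probability, and every arm keeps a nonzero chance of being chosen. This is not the SoftFair algorithm itself; the function name `softmax_policy`, the `temperature` parameter, and the arm values are hypothetical.

```python
import math

def softmax_policy(values, temperature=1.0):
    """Turn per-arm value estimates into selection probabilities.

    Subtracting the maximum before exponentiating keeps the
    computation numerically stable without changing the result.
    """
    top = max(values)
    exps = [math.exp((v - top) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical long-term cumulative reward estimates for three arms.
arm_values = [1.0, 2.0, 0.5]
probs = softmax_policy(arm_values)

# Soft fairness check: a higher-valued arm is never less likely to be picked.
order_respected = all(
    probs[i] >= probs[j]
    for i in range(len(arm_values))
    for j in range(len(arm_values))
    if arm_values[i] >= arm_values[j]
)
```

    Lowering `temperature` concentrates probability on the best arm, while raising it spreads interventions more evenly, which is one way to picture the trade-off between value and avoiding starvation.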
    Efficient Resource Allocation with Fairness Constraints in Restless Multi-Armed Bandits. (arXiv:2206.03883v2 [cs.LG] UPDATED)
    Restless Multi-Armed Bandits (RMAB) is an apt model to represent decision-making problems in public health interventions (e.g., tuberculosis, maternal, and child care), anti-poaching planning, sensor monitoring, personalized recommendations and many more. Existing research in RMAB has contributed mechanisms and theoretical results to a wide variety of settings, where the focus is on maximizing expected value. In this paper, we are interested in ensuring that RMAB decision making is also fair to different arms while maximizing expected value. In the context of public health settings, this would ensure that different people and/or communities are fairly represented while making public health intervention decisions. To achieve this goal, we formally define the fairness constraints in RMAB and provide planning and learning methods to solve RMAB in a fair manner. We demonstrate key theoretical properties of fair RMAB and experimentally demonstrate that our proposed methods handle fairness constraints without sacrificing significantly on solution quality.  ( 2 min )
    FedVLN: Privacy-preserving Federated Vision-and-Language Navigation. (arXiv:2203.14936v2 [cs.AI] UPDATED)
    Data privacy is a central problem for embodied agents that can perceive the environment, communicate with humans, and act in the real world. While helping humans complete tasks, the agent may observe and process sensitive information of users, such as house environments, human activities, etc. In this work, we introduce privacy-preserving embodied agent learning for the task of Vision-and-Language Navigation (VLN), where an embodied agent navigates house environments by following natural language instructions. We view each house environment as a local client, which shares nothing other than local updates with the cloud server and other clients, and propose a novel federated vision-and-language navigation (FedVLN) framework to protect data privacy during both training and pre-exploration. Particularly, we propose a decentralized training strategy to limit the data of each client to its local model training and a federated pre-exploration method to do partial model aggregation to improve model generalizability to unseen environments. Extensive results on R2R and RxR datasets show that under our FedVLN framework, decentralized VLN models achieve comparable results with centralized training while protecting seen environment privacy, and federated pre-exploration significantly outperforms centralized pre-exploration while preserving unseen environment privacy.  ( 2 min )
    Encoding Concepts in Graph Neural Networks. (arXiv:2207.13586v1 [cs.LG])
    The opaque reasoning of Graph Neural Networks induces a lack of human trust. Existing graph network explainers attempt to address this issue by providing post-hoc explanations; however, they fail to make the model itself more interpretable. To fill this gap, we introduce the Concept Encoder Module, the first differentiable concept-discovery approach for graph networks. The proposed approach makes graph networks explainable by design by first discovering graph concepts and then using these to solve the task. Our results demonstrate that this approach allows graph networks to: (i) attain model accuracy comparable with their equivalent vanilla versions, (ii) discover meaningful concepts that achieve high concept completeness and purity scores, (iii) provide high-quality concept-based logic explanations for their predictions, and (iv) support effective interventions at test time: these can increase human trust as well as significantly improve model performance.  ( 2 min )
    Analysis and Design of Quadratic Neural Networks for Regression, Classification, and Lyapunov Control of Dynamical Systems. (arXiv:2207.13120v1 [cs.LG])
    This paper addresses the analysis and design of quadratic neural networks, which have been recently introduced in the literature, and their applications to regression, classification, system identification and control of dynamical systems. These networks offer several advantages, the most important of which are the fact that the architecture is a by-product of the design and is not determined a-priori, their training can be done by solving a convex optimization problem so that the global optimum of the weights is achieved, and the input-output mapping can be expressed analytically by a quadratic form. It also appears from several examples that these networks work extremely well using only a small fraction of the training data. The results in the paper cast regression, classification, system identification, stability and control design as convex optimization problems, which can be solved efficiently with polynomial-time algorithms to a global optimum. Several examples will show the effectiveness of quadratic neural networks in applications.
    The Sample Complexity of Forecast Aggregation. (arXiv:2207.13126v1 [cs.LG])
    We consider a Bayesian forecast aggregation model where $n$ experts, after observing private signals about an unknown binary event, report their posterior beliefs about the event to a principal, who then aggregates the reports into a single prediction for the event. The signals of the experts and the outcome of the event follow a joint distribution that is unknown to the principal, but the principal has access to i.i.d. "samples" from the distribution, where each sample is a tuple of experts' reports (not signals) and the realization of the event. Using these samples, the principal aims to find an $\varepsilon$-approximately optimal (Bayesian) aggregator. We study the sample complexity of this problem. We show that, for arbitrary discrete distributions, the number of samples must be at least $\tilde \Omega(m^{n-2} / \varepsilon)$, where $m$ is the size of each expert's signal space. This sample complexity grows exponentially in the number of experts $n$. But if experts' signals are independent conditioned on the realization of the event, then the sample complexity is significantly reduced, to $\tilde O(1 / \varepsilon^2)$, which does not depend on $n$.  ( 2 min )
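    To see why conditional independence simplifies the problem, consider the standard Bayesian identity for that case: when experts share a common prior and their signals are independent given the event, the optimal aggregate depends on the reports only through their odds. The sketch below assumes the prior is known and every report lies strictly between 0 and 1; the function name is hypothetical, and the paper's setting, where the distribution must be learned from samples, is not reproduced here.

```python
def aggregate_conditionally_independent(prior, reports):
    """Combine posterior reports from conditionally independent experts.

    Each expert's posterior odds equal prior_odds * likelihood_ratio_i,
    so dividing by prior_odds recovers that expert's likelihood ratio.
    Multiplying all ratios into the prior odds gives the joint posterior.
    """
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for q in reports:
        odds *= (q / (1.0 - q)) / prior_odds
    return odds / (1.0 + odds)

# Two experts who each report 0.8 under a uniform prior.
combined = aggregate_conditionally_independent(0.5, [0.8, 0.8])

# A single expert's report is returned unchanged.
single = aggregate_conditionally_independent(0.3, [0.6])
```

    With a uniform prior, two reports of 0.8 combine to odds of 16 to 1, so the aggregate is more confident than either expert alone.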
    Intelligent Zero Trust Architecture for 5G/6G Networks: Principles, Challenges, and the Role of Machine Learning in the context of O-RAN. (arXiv:2105.01478v3 [cs.NI] UPDATED)
    In this position paper, we discuss the critical need for integrating zero trust (ZT) principles into next-generation communication networks (5G/6G). We highlight the challenges and introduce the concept of an intelligent zero trust architecture (i-ZTA) as a security framework in 5G/6G networks with untrusted components. While network virtualization, software-defined networking (SDN), and service-based architectures (SBA) are key enablers of 5G networks, operating in an untrusted environment has also become a key feature of the networks. Further, seamless connectivity to a high volume of devices has broadened the attack surface on information infrastructure. Network assurance in a dynamic untrusted environment calls for revolutionary architectures beyond existing static security frameworks. To the best of our knowledge, this is the first position paper that presents the architectural concept design of an i-ZTA upon which modern artificial intelligence (AI) algorithms can be developed to provide information security in untrusted networks. We introduce key ZT principles as real-time Monitoring of the security state of network assets, Evaluating the risk of individual access requests, and Deciding on access authorization using a dynamic trust algorithm, called MED components. To ensure ease of integration, the envisioned architecture adopts an SBA-based design, similar to the 3GPP specification of 5G networks, by leveraging the open radio access network (O-RAN) architecture with appropriate real-time engines and network interfaces for collecting necessary machine learning data. Therefore, this work provides novel research directions to design machine learning based components that contribute towards i-ZTA for the future 5G/6G networks.  ( 3 min )
    Fast expansion into harmonics on the disk: a steerable basis with fast radial convolutions. (arXiv:2207.13674v1 [math.NA])
    We present a fast and numerically accurate method for expanding digitized $L \times L$ images representing functions on $[-1,1]^2$ supported on the disk $\{x \in \mathbb{R}^2 : |x|<1\}$ in the harmonics (Dirichlet Laplacian eigenfunctions) on the disk. Our method runs in $\mathcal{O}(L^2 \log L)$ operations. This basis is also known as the Fourier-Bessel basis and it has several computational advantages: it is orthogonal, ordered by frequency, and steerable in the sense that images expanded in the basis can be rotated by applying a diagonal transform to the coefficients. Moreover, we show that convolution with radial functions can also be efficiently computed by applying a diagonal transform to the coefficients.  ( 2 min )
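    The steerability property can be checked numerically with a minimal sketch. It keeps only the angular part of the basis, $e^{in\phi}$, at a fixed radius, so each radial factor is assumed to be folded into its coefficient; the coefficient values and function names are hypothetical, and the fast expansion algorithm itself is not shown.

```python
import cmath

def evaluate(coeffs, phi):
    """Evaluate sum_n c_n * exp(i*n*phi) at angle phi.

    coeffs maps angular frequency n -> complex coefficient
    (radial factor at a fixed radius assumed absorbed into c_n).
    """
    return sum(c * cmath.exp(1j * n * phi) for n, c in coeffs.items())

def rotate(coeffs, theta):
    """Rotate by theta via a diagonal transform: c_n -> c_n * exp(-i*n*theta)."""
    return {n: c * cmath.exp(-1j * n * theta) for n, c in coeffs.items()}

# Hypothetical expansion coefficients for a few angular frequencies.
coeffs = {-2: 0.5 - 0.1j, 0: 1.0 + 0j, 1: 0.3 + 0.7j, 3: -0.2j}
theta, phi = 0.4, 1.1

# Rotating the coefficients matches evaluating the original at phi - theta.
rotated_value = evaluate(rotate(coeffs, theta), phi)
direct_value = evaluate(coeffs, phi - theta)
```

    Because each coefficient is multiplied by its own phase and nothing is mixed across frequencies, the rotation costs one multiplication per coefficient.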
    Deep Partial Updating: Towards Communication Efficient Updating for On-device Inference. (arXiv:2007.03071v3 [cs.LG] UPDATED)
    Emerging edge intelligence applications require the server to retrain and update deep neural networks deployed on remote edge nodes to leverage newly collected data samples. Unfortunately, it may be impossible in practice to continuously send fully updated weights to these edge nodes due to the highly constrained communication resource. In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects a small subset of weights to update in each server-to-edge communication round, while achieving a similar performance compared to full updating. Our method is established through analytically upper-bounding the loss difference between partial updating and full updating, and only updates the weights which make the largest contributions to the upper bound. Extensive experimental results demonstrate the efficacy of our partial updating methodology which achieves a high inference accuracy while updating a rather small number of weights.  ( 2 min )
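    The communication pattern of partial updating can be sketched as follows. The paper selects weights by their contribution to an upper bound on the loss difference; that score is not reproduced here. As a stand-in, this sketch ranks weights by the absolute change between the deployed and retrained values, which is an assumption made only for illustration, as are the function name and the toy weights.

```python
def partial_update(old, new, k):
    """Send only the k weights with the largest stand-in score.

    The score |new - old| is a placeholder for the paper's
    bound-based contribution measure.
    Returns the weights held on the edge node after the update and
    the sorted indices that were transmitted.
    """
    scores = [abs(n - o) for o, n in zip(old, new)]
    chosen = sorted(range(len(old)), key=lambda i: scores[i], reverse=True)[:k]
    updated = list(old)
    for i in chosen:
        updated[i] = new[i]
    return updated, sorted(chosen)

# Hypothetical deployed weights and fully retrained weights.
old_weights = [0.10, -0.50, 0.30, 0.00]
new_weights = [0.12, 0.40, 0.29, -0.60]
updated, sent = partial_update(old_weights, new_weights, k=2)
```

    Only the two selected indices and their new values would need to be transmitted in this round; the remaining weights stay at their deployed values.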
    Accelerating the Learning of TAMER with Counterfactual Explanations. (arXiv:2108.01358v2 [cs.AI] UPDATED)
    The capability to interactively learn from human feedback would enable agents in new settings. For example, even novice users could train service robots in new tasks naturally and interactively. Human-in-the-loop Reinforcement Learning (HRL) combines human feedback and Reinforcement Learning (RL) techniques. State-of-the-art interactive learning techniques suffer from slow learning speed, thus leading to a frustrating experience for the human. We approach this problem by extending the HRL framework TAMER for evaluative feedback with the possibility to enhance human feedback with two different types of counterfactual explanations (action and state based). We experimentally show that our extensions improve the speed of learning.  ( 2 min )
    Accurate detection of sepsis at ED triage using machine learning with clinical natural language processing. (arXiv:2204.07657v3 [cs.LG] UPDATED)
    Sepsis is a life-threatening condition with organ dysfunction and is a leading cause of death and critical illness worldwide. Accurate detection of sepsis during emergency department triage would allow early initiation of lab analysis, antibiotic administration, and other sepsis treatment protocols. The purpose of this study was to determine whether EHR data can be extracted and synthesized with the latest machine learning algorithms (KATE Sepsis) and clinical natural language processing to produce accurate sepsis models, and compare KATE Sepsis performance with existing sepsis screening protocols, such as SIRS and qSOFA. A machine learning model (KATE Sepsis) was developed using patient encounters with triage data from 16 participating hospitals. KATE Sepsis, SIRS, standard screening (SIRS with source of infection) and qSOFA were tested in three settings. Cohort-A was a retrospective analysis on medical records from a single Site 1. Cohort-B was a prospective analysis of Site 1. Cohort-C was a retrospective analysis on Site 1 with 15 additional sites. Across all cohorts, KATE Sepsis demonstrates an AUC of 0.94-0.963 with 73-74.87% TPR and 3.76-7.17% FPR. Standard screening demonstrates an AUC of 0.682-0.726 with 39.39-51.19% TPR and 2.9-6.02% FPR. The qSOFA protocol demonstrates an AUC of 0.544-0.56, with 10.52-13.18% TPR and 1.22-1.68% FPR. For severe sepsis, across all cohorts, KATE Sepsis demonstrates an AUC of 0.935-0.972 with 70-82.26% TPR and 4.64-8.62% FPR. For septic shock, across all cohorts, KATE Sepsis demonstrates an AUC of 0.96-0.981 with 85.71-89.66% TPR and 4.85-8.8% FPR. SIRS, standard screening, and qSOFA demonstrate low AUC and TPR for severe sepsis and septic shock detection. KATE Sepsis provided substantially better sepsis detection performance in triage than commonly used screening protocols.  ( 3 min )
    Understanding Convolutional Neural Networks from Volterra Convolution Perspective. (arXiv:2110.09902v2 [cs.LG] UPDATED)
    We attempt to understand convolutional neural networks by exploring the relationship between (deep) convolutional neural networks and Volterra convolutions. We propose a novel approach to explain and study the overall characteristics of neural networks without being disturbed by their highly complex architectures. Specifically, we convert the basic structures and their combinations to the form of Volterra convolutions. The results show that most convolutional neural networks can be converted to the form of a Volterra convolution, where the converted proxy kernels preserve the characteristics of the original network. Analyzing these proxy kernels may give valuable insight into the original network. Based on this setup, we present methods for approximating the order-zero and order-one proxy kernels, and verify the correctness and effectiveness of our results.  ( 2 min )
    Using Deep Learning to Detecting Deepfakes. (arXiv:2207.13644v1 [cs.CV])
    In recent years, social media has grown to become a major source of information for many online users. This has given rise to the spread of misinformation through deepfakes. Deepfakes are videos or images that replace one person's face with another computer-generated face, often that of a more recognizable person in society. With recent advances in technology, a person with little technological experience can generate these videos. This enables them to mimic a powerful figure in society, such as a president or celebrity, creating the potential danger of spreading misinformation and other nefarious uses of deepfakes. To combat this online threat, researchers have developed models that are designed to detect deepfakes. This study looks at various deepfake detection models that use deep learning algorithms to combat this looming threat. This survey focuses on providing a comprehensive overview of the current state of deepfake detection models and the unique approaches many researchers take to solving this problem. The benefits, limitations, and suggestions for future work are discussed throughout this paper.  ( 2 min )
    Emergence of Novelty in Evolutionary Algorithms. (arXiv:2207.04857v2 [cs.NE] UPDATED)
    One of the main problems of evolutionary algorithms is the convergence of the population to local minima. In this paper, we explore techniques that can avoid this problem by encouraging diverse agent behavior through a shared reward system. The rewards are randomly distributed in the environment, and the agents are only rewarded for collecting them first. This leads to the emergence of novel behavior in the agents. We apply our approach to the maze problem and compare it to the previously proposed solution, denoted as Novelty Search (Lehman and Stanley, 2011a). We find that our solution leads to improved performance while being significantly simpler. Building on that, we generalize the problem and apply our approach to a more advanced set of tasks, Atari Games, where we observe similar performance quality with much less computational power needed.  ( 2 min )
    Graph Neural Networks for Communication Networks: Context, Use Cases and Opportunities. (arXiv:2112.14792v2 [cs.NI] UPDATED)
    Graph neural networks (GNNs) have shown outstanding applications in many fields where data is fundamentally represented as graphs (e.g., chemistry, biology, recommendation systems). In this vein, communication networks comprise many fundamental components that are naturally represented in a graph-structured manner (e.g., topology, configurations, traffic flows). This position article presents GNNs as a fundamental tool for modeling, control and management of communication networks. GNNs represent a new generation of data-driven models that can accurately learn and reproduce the complex behaviors behind real networks. As a result, such models can be applied to a wide variety of networking use cases, such as planning, online optimization, or troubleshooting. The main advantage of GNNs over traditional neural networks lies in their unprecedented generalization capabilities when applied to other networks and configurations unseen during training, which is a critical feature for achieving practical data-driven solutions for networking. This article comprises a brief tutorial on GNNs and their possible applications to communication networks. To showcase the potential of this technology, we present two use cases with state-of-the-art GNN models applied to wired and wireless networks, respectively. Lastly, we delve into the key open challenges and opportunities yet to be explored in this novel research area.  ( 3 min )
    Multi-layer Representation Learning for Robust OOD Image Classification. (arXiv:2207.13678v1 [cs.CV])
    Convolutional Neural Networks have become the norm in image classification. Nevertheless, their difficulty in maintaining high accuracy across datasets has become apparent in the past few years. In order to utilize such models in real-world scenarios and applications, they must be able to provide trustworthy predictions on unseen data. In this paper, we argue that extracting features from a CNN's intermediate layers can assist in the model's final prediction. Specifically, we adapt the Hypercolumns method to a ResNet-18 and find a significant increase in the model's accuracy when evaluating on the NICO dataset.  ( 2 min )
    BPFISH: Blockchain and Privacy-preserving FL Inspired Smart Healthcare. (arXiv:2207.11654v2 [cs.NI] UPDATED)
    This paper proposes a Federated Learning (FL) based smart healthcare system in which Medical Centers (MCs) train the local model using the data collected from patients and send the model weights to the miners in a blockchain-based robust framework without sharing raw data, taking privacy preservation into consideration. We formulate an optimization problem that maximizes the utility and minimizes the loss function, considering the energy consumption and FL process delay of MCs, for learning effective models on distributed healthcare data under a blockchain-based framework. We propose a solution in two stages: first, we offer a stable matching-based association algorithm to maximize the utility of both miners and MCs, and then we solve the loss minimization using the Stochastic Gradient Descent (SGD) algorithm, employing FL under Differential Privacy (DP) and blockchain technology. Moreover, we incorporate blockchain technology to provide tamper-resistant and decentralized model weight sharing in the proposed FL-based framework. The effectiveness of the proposed model is shown through simulation on real-world healthcare data, in comparison with other state-of-the-art techniques.  ( 2 min )
    Adversarial Imitation Learning from Video using a State Observer. (arXiv:2202.00243v2 [cs.RO] UPDATED)
    The imitation learning research community has recently made significant progress towards the goal of enabling artificial agents to imitate behaviors from video demonstrations alone. However, current state-of-the-art approaches developed for this problem exhibit high sample complexity due, in part, to the high-dimensional nature of video observations. Towards addressing this issue, we introduce here a new algorithm called Visual Generative Adversarial Imitation from Observation using a State Observer (VGAIfO-SO). At its core, VGAIfO-SO seeks to address sample inefficiency using a novel, self-supervised state observer, which provides estimates of lower-dimensional proprioceptive state representations from high-dimensional images. We show experimentally in several continuous control environments that VGAIfO-SO is more sample efficient than other IfO algorithms at learning from video-only demonstrations and can sometimes even achieve performance close to the Generative Adversarial Imitation from Observation (GAIfO) algorithm that has privileged access to the demonstrator's proprioceptive state information.  ( 2 min )
    Latent Space Smoothing for Individually Fair Representations. (arXiv:2111.13650v3 [cs.LG] UPDATED)
    Fair representation learning transforms user data into a representation that ensures fairness and utility regardless of the downstream application. However, learning individually fair representations, i.e., guaranteeing that similar individuals are treated similarly, remains challenging in high-dimensional settings such as computer vision. In this work, we introduce LASSI, the first representation learning method for certifying individual fairness of high-dimensional data. Our key insight is to leverage recent advances in generative modeling to capture the set of similar individuals in the generative latent space. This enables us to learn individually fair representations that map similar individuals close together by using adversarial training to minimize the distance between their representations. Finally, we employ randomized smoothing to provably map similar individuals close together, in turn ensuring that local robustness verification of the downstream application results in end-to-end fairness certification. Our experimental evaluation on challenging real-world image data demonstrates that our method increases certified individual fairness by up to 90% without significantly affecting task utility.  ( 2 min )
    JDRec: Practical Actor-Critic Framework for Online Combinatorial Recommender System. (arXiv:2207.13311v1 [cs.IR])
    A combinatorial recommender (CR) system feeds a list of items to a user at a time in the result page, in which the user behavior is affected by both contextual information and items. The CR is formulated as a combinatorial optimization problem with the objective of maximizing the recommendation reward of the whole list. Despite its importance, it is still a challenge to build a practical CR system, due to the efficiency, dynamics, and personalization requirements of the online environment. In particular, we split the problem into two sub-problems, list generation and list evaluation. Novel and practical model architectures are designed for these sub-problems, aiming at jointly optimizing effectiveness and efficiency. To adapt to the online case, a bootstrap algorithm forming an actor-critic reinforcement framework is given to explore better recommendation modes in long-term user interaction. Offline and online experiment results demonstrate the efficacy of the proposed JDRec framework. JDRec has been applied in online JD recommendation, improving click-through rate by 2.6% and synthetical value for the platform by 5.03%. We will publish the large-scale dataset used in this study to contribute to the research community.  ( 3 min )
    Multi-Objective Hyperparameter Optimization -- An Overview. (arXiv:2206.07438v2 [cs.LG] UPDATED)
    Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization problem. This is often neglected in practice, due to a lack of knowledge and readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies, both from the domain of evolutionary algorithms and Bayesian optimization. We illustrate the utility of MOO in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability and robustness.  ( 2 min )
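In a multi-objective setting there is usually no single best configuration, only a set of non-dominated trade-offs (the Pareto front). The sketch below filters a list of evaluated configurations down to that front. It assumes every objective is minimized, and the (error, prediction time) pairs are made-up values for illustration.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every objective is minimized."""

    def dominates(a, b):
        # a dominates b if it is no worse everywhere and strictly better somewhere.
        no_worse = all(x <= y for x, y in zip(a, b))
        strictly_better = any(x < y for x, y in zip(a, b))
        return no_worse and strictly_better

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (validation error, prediction time in ms) per configuration.
configs = [(0.10, 30.0), (0.12, 10.0), (0.11, 35.0), (0.20, 5.0), (0.25, 8.0)]
front = pareto_front(configs)
print(front)  # [(0.1, 30.0), (0.12, 10.0), (0.2, 5.0)]
```

The quadratic scan is fine for a few hundred evaluated configurations. The evolutionary and Bayesian strategies surveyed in the paper are concerned with choosing which configurations to evaluate next, which this sketch does not address.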
    Dynamical simulation via quantum machine learning with provable generalization. (arXiv:2204.10269v2 [quant-ph] UPDATED)
    Much attention has been paid to dynamical simulation and quantum machine learning (QML) independently as applications for quantum advantage, while the possibility of using QML to enhance dynamical simulations has not been thoroughly investigated. Here we develop a framework for using QML methods to simulate quantum dynamics on near-term quantum hardware. We use generalization bounds, which bound the error a machine learning model makes on unseen data, to rigorously analyze the training data requirements of an algorithm within this framework. This provides a guarantee that our algorithm is resource-efficient, both in terms of qubit and data requirements. Our numerics exhibit efficient scaling with problem size, and we simulate 20 times longer than Trotterization on IBMQ-Bogota.  ( 2 min )
    Scalable Certified Segmentation via Randomized Smoothing. (arXiv:2107.00228v2 [cs.LG] UPDATED)
    We present a new certification method for image and point cloud segmentation based on randomized smoothing. The method leverages a novel scalable algorithm for prediction and certification that correctly accounts for multiple testing, necessary for ensuring statistical guarantees. The key to our approach is reliance on established multiple-testing correction mechanisms as well as the ability to abstain from classifying single pixels or points while still robustly segmenting the overall input. Our experimental evaluation on synthetic data and challenging datasets, such as Pascal Context, Cityscapes, and ShapeNet, shows that our algorithm can achieve, for the first time, competitive accuracy and certification guarantees on real-world segmentation tasks. We provide an implementation at https://github.com/eth-sri/segmentation-smoothing.  ( 2 min )
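The two ingredients named in this abstract, multiple-testing correction and per-pixel abstention, can be illustrated with a simplified sketch. This is not the authors' algorithm: it assumes a Bonferroni correction, a one-sided binomial test against a threshold `tau`, and vote counts that have already been collected from noisy forward passes. All names and numbers here are hypothetical.

```python
from math import comb

ABSTAIN = -1  # marker for pixels that are not certified


def binom_tail(k, n, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def certify_pixels(votes, n, tau=0.75, alpha=0.001):
    """votes: one {class: count} dict per pixel, counts summing to n noisy samples."""
    level = alpha / len(votes)  # Bonferroni: split the error budget over all pixels
    labels = []
    for counts in votes:
        cls, k = max(counts.items(), key=lambda kv: kv[1])
        # p-value of H0 "the top class is predicted with probability <= tau"
        p_value = binom_tail(k, n, tau)
        labels.append(cls if p_value <= level else ABSTAIN)
    return labels


# Hypothetical vote counts for three pixels, 100 noisy samples each.
votes = [{0: 100, 1: 0}, {0: 55, 1: 45}, {1: 98, 0: 2}]
labels = certify_pixels(votes, n=100)
print(labels)  # [0, -1, 1]
```

The second pixel is ambiguous under noise, so it is left unclassified rather than allowed to invalidate the guarantee for the whole image. A faithful implementation would also pick the candidate class from a separate batch of samples rather than from the samples used for the test.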
    Thermal half-lives of azobenzene derivatives: virtual screening based on intersystem crossing using a machine learning potential. (arXiv:2207.11592v2 [physics.chem-ph] UPDATED)
    Molecular photoswitches are the foundation of light-activated drugs. A key photoswitch is azobenzene, which exhibits trans-cis isomerism in response to light. The thermal half-life of the cis isomer is of crucial importance, since it controls the duration of the light-induced biological effect. Here we introduce a computational tool for predicting the thermal half-lives of azobenzene derivatives. Our automated approach uses a fast and accurate machine learning potential trained on quantum chemistry data. Building on well-established earlier evidence, we argue that thermal isomerization proceeds through rotation mediated by intersystem crossing, and incorporate this mechanism into our automated workflow. We use our approach to predict the thermal half-lives of 19,000 azobenzene derivatives. We explore trends and tradeoffs between barriers and absorption wavelengths, and open-source our data and software to accelerate research in photopharmacology.  ( 2 min )
    Fixed-Time Convergence for a Class of Nonconvex-Nonconcave Min-Max Problems. (arXiv:2207.12845v1 [math.OC] CROSS LISTED)
    This study develops a fixed-time convergent saddle point dynamical system for solving min-max problems under a relaxation of the standard convexity-concavity assumption. In particular, it is shown that by leveraging the dynamical systems viewpoint of an optimization algorithm, accelerated convergence to a saddle point can be obtained. Instead of requiring the objective function to be strongly-convex-strongly-concave (as necessitated for accelerated convergence of several saddle-point algorithms), uniform fixed-time convergence is guaranteed for functions satisfying only the two-sided Polyak-Łojasiewicz (PL) inequality. A large number of practical problems, including robust least squares estimation, are known to satisfy the two-sided PL inequality. The proposed method achieves arbitrarily fast convergence compared to any other state-of-the-art method with linear or even super-linear convergence, as also corroborated in numerical case studies.  ( 2 min )
    The Implications of the No-Free-Lunch Theorems for Meta-induction. (arXiv:2103.11956v3 [cs.LG] UPDATED)
    The important recent book by G. Schurz appreciates that the no-free-lunch theorems (NFL) have major implications for the problem of (meta) induction. Here I review the NFL theorems, emphasizing that they do not only concern the case where there is a uniform prior -- they prove that there are "as many priors" (loosely speaking) for which any induction algorithm $A$ out-generalizes some induction algorithm $B$ as vice-versa. Importantly though, in addition to the NFL theorems, there are many free lunch theorems. In particular, the NFL theorems can only be used to compare the marginal expected performance of an induction algorithm $A$ with the marginal expected performance of an induction algorithm $B$. There is a rich set of free lunches which instead concern the statistical correlations among the generalization errors of induction algorithms. As I describe, the meta-induction algorithms that Schurz advocates as a "solution to Hume's problem" are just an example of such a free lunch based on correlations among the generalization errors of induction algorithms. I end by pointing out that the prior that Schurz advocates, which is uniform over bit frequencies rather than bit patterns, is contradicted by thousands of experiments in statistical physics and by the great success of the maximum entropy procedure in inductive inference.  ( 3 min )
    Bioinspired random projections for robust, sparse classification. (arXiv:2206.09222v2 [stat.ML] UPDATED)
    Inspired by the use of random projections in biological sensing systems, we present a new algorithm for processing data in classification problems. This is based on observations of the human brain and the fruit fly's olfactory system and involves randomly projecting data into a space of greatly increased dimension before applying a cap operation to truncate the smaller entries. This leads to a simple algorithm that is very computationally efficient and can be used to either give a sparse representation with minimal loss in classification accuracy or give improved robustness, in the sense that classification accuracy is improved when noise is added to the data. This is demonstrated with numerical experiments, which supplement theoretical results demonstrating that the resulting signal transform is continuous and invertible, in an appropriate sense.  ( 2 min )
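The transform described here (expand with a random projection, then keep only the largest entries) can be sketched in a few lines. The Gaussian projection matrix, the choice of keeping exactly `k` entries, and the function name are assumptions made for illustration; the paper's exact construction may differ.

```python
import random


def random_projection_cap(x, out_dim, k, seed=0):
    """Project x into out_dim dimensions, then zero all but the k largest entries."""
    rng = random.Random(seed)
    # Random projection matrix with i.i.d. standard normal entries (assumption).
    proj = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(out_dim)]
    y = [sum(w * xi for w, xi in zip(row, x)) for row in proj]
    # Cap operation: keep the k largest projected values, truncate the rest to zero.
    keep = set(sorted(range(out_dim), key=lambda i: y[i], reverse=True)[:k])
    return [y[i] if i in keep else 0.0 for i in range(out_dim)]


# A 3-dimensional input expanded to 40 dimensions with 4 active entries.
z = random_projection_cap([0.2, -1.0, 0.5], out_dim=40, k=4)
print(sum(v != 0.0 for v in z), "of", len(z), "entries are non-zero")
```

The output is sparse by construction, and a fixed seed makes the projection reproducible, so the same matrix can be reused across training and test data.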
    Exploring Representation of Horn Clauses using GNNs (Extended Technical Report). (arXiv:2206.06986v4 [cs.AI] UPDATED)
    Learning program semantics from raw source code is challenging due to the complexity of real-world programming language syntax and due to the difficulty of reconstructing long-distance relational information implicitly represented in programs using identifiers. Addressing the first point, we consider Constrained Horn Clauses (CHCs) as a standard representation of program verification problems, providing a simple and programming language-independent syntax. For the second challenge, we explore graph representations of CHCs, and propose a new Relational Hypergraph Neural Network (R-HyGNN) architecture to learn program features. We introduce two different graph representations of CHCs. One is called constraint graph (CG), and emphasizes syntactic information of CHCs by translating the symbols and their relations in CHCs as typed nodes and binary edges, respectively, and constructing the constraints as abstract syntax trees. The second one is called control- and data-flow hypergraph (CDHG), and emphasizes semantic information of CHCs by representing the control and data flow through ternary hyperedges. We then propose a new GNN architecture, R-HyGNN, extending Relational Graph Convolutional Networks, to handle hypergraphs. To evaluate the ability of R-HyGNN to extract semantic information from programs, we use R-HyGNNs to train models on the two graph representations, and on five proxy tasks with increasing difficulty, using benchmarks from CHC-COMP 2021 as training data. The most difficult proxy task requires the model to predict the occurrence of clauses in counter-examples, which subsumes satisfiability of CHCs. CDHG achieves 90.59% accuracy in this task. Furthermore, R-HyGNN has perfect predictions on one of the graphs consisting of more than 290 clauses. Overall, our experiments indicate that R-HyGNN can capture intricate program features for guiding verification problems.  ( 3 min )
    D3C2-Net: Dual-Domain Deep Convolutional Coding Network for Compressive Sensing. (arXiv:2207.13560v1 [cs.CV])
    By mapping optimization algorithms into neural networks, deep unfolding networks (DUNs) have achieved impressive success in compressive sensing (CS). From the perspective of optimization, DUNs inherit a well-defined and interpretable structure from iterative steps. However, from the viewpoint of neural network design, most existing DUNs are inherently established based on traditional image-domain unfolding, which takes one-channel images as inputs and outputs between adjacent stages, resulting in insufficient information transmission capability and inevitable loss of image details. In this paper, to break the above bottleneck, we first propose a generalized dual-domain optimization framework, which is general for inverse imaging and integrates the merits of both (1) image-domain and (2) convolutional-coding-domain priors to constrain the feasible region in the solution space. By unfolding the proposed framework into deep neural networks, we further design a novel Dual-Domain Deep Convolutional Coding Network (D3C2-Net) for CS imaging with the capability of transmitting high-throughput feature-level image representation through all the unfolded stages. Experiments on natural and MR images demonstrate that our D3C2-Net achieves higher performance and better accuracy-complexity trade-offs than other state-of-the-art methods.  ( 2 min )
    Towards noise robust trigger-word detection with contrastive learning pre-task for fast on-boarding of new trigger-words. (arXiv:2111.03971v3 [cs.SD] UPDATED)
    Trigger-word detection plays an important role as the entry point of a user's communication with voice assistants. But supporting a particular word as a trigger-word involves a huge amount of data collection, augmentation and labelling for that word. This makes supporting new trigger-words a tedious and time-consuming process. To combat this, we explore the use of contrastive learning as a pre-training task that helps the detection model to generalize to different words and noise conditions. We explore supervised contrastive techniques and also propose a novel self-supervised training technique using chunked words from long sentence audio. We show that both the supervised and the new self-supervised contrastive pre-training techniques achieve results comparable to traditional classification pre-training on new trigger words when less data is available.  ( 2 min )
    Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes. (arXiv:2207.13649v1 [cs.LG])
    In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. This requires prohibitively large computing memory if a sequence consists of thousands or even millions of elements, and as a result, makes learning of very long-term dependencies infeasible. However, the majority of sequence elements can usually be predicted by taking into account only temporally local information. On the other hand, predictions affected by long-term dependencies are sparse and characterized by high uncertainty given only local information. We propose MemUP, a new training method that allows learning long-term dependencies without backpropagating gradients through the whole sequence at a time. This method can potentially be applied to any gradient-based sequence learning. An implementation of MemUP for recurrent architectures performs better than or comparably to baselines while requiring significantly less computing memory.  ( 2 min )
    TracInAD: Measuring Influence for Anomaly Detection. (arXiv:2205.01362v3 [cs.LG] UPDATED)
    As with many other tasks, neural networks prove very effective for anomaly detection purposes. However, very few deep-learning models are suited for detecting anomalies on tabular datasets. This paper proposes a novel methodology to flag anomalies based on TracIn, an influence measure initially introduced for explicability purposes. The proposed methods can serve to augment any unsupervised deep anomaly detection method. We test our approach using Variational Autoencoders and show that the average influence of a subsample of training points on a test point can serve as a proxy for abnormality. Our model proves to be competitive in comparison with state-of-the-art approaches: it achieves comparable or better performance in terms of detection accuracy on medical and cyber-security tabular benchmark data.  ( 2 min )
    Understanding Non-linearity in Graph Neural Networks from the Bayesian-Inference Perspective. (arXiv:2207.11311v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have shown superiority in many prediction tasks over graphs due to their impressive capability of capturing nonlinear relations in graph-structured data. However, for node classification tasks, often only a marginal improvement of GNNs over their linear counterparts has been observed. Previous work provides little understanding of this phenomenon. In this work, we resort to Bayesian learning to deeply investigate the functions of non-linearity in GNNs for node classification tasks. Given a graph generated from the statistical model CSBM, we observe that the maximum a posteriori estimation of a node label given its own and neighbors' attributes consists of two types of non-linearity, a possibly non-linear transformation of node attributes and a ReLU-activated feature aggregation from neighbors. The latter surprisingly matches the type of non-linearity used in many GNN models. By further imposing a Gaussian assumption on node attributes, we prove that the superiority of those ReLU activations is only significant when the node attributes are far more informative than the graph structure, which nicely matches many previous empirical observations. A similar argument can be made when there is a distribution shift of node attributes between the training and testing datasets. Finally, we verify our theory on both synthetic and real-world networks.  ( 3 min )
    Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning. (arXiv:2111.10603v2 [cs.LG] UPDATED)
    Multi-Task Learning (MTL) has achieved success in various fields. However, how to balance different tasks to achieve good performance is a key problem. To balance tasks, many works carefully design dynamic loss/gradient weighting strategies, but basic random-weighting experiments that would test their effectiveness have been ignored. In this paper, we propose the Random Weighting (RW) methods, including Random Loss Weighting (RLW) and Random Gradient Weighting (RGW), where an MTL model is trained with random loss/gradient weights sampled from a distribution. To show the effectiveness and necessity of RW methods, theoretically we analyze the convergence of RW and reveal that RW has a higher probability of escaping local minima, resulting in better generalization ability. Empirically, we extensively evaluate the proposed RW methods against twelve state-of-the-art methods on five image datasets and two multilingual problems from the XTREME benchmark to show that RW methods can achieve performance comparable to state-of-the-art baselines. Therefore, we think that the RW methods are important baselines for MTL and deserve more attention.  ( 2 min )
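Random Loss Weighting, as described above, draws fresh task weights at every training step instead of learning or scheduling them. The sketch below assumes the weights come from standard normal samples passed through a softmax; the abstract only says they are "sampled from a distribution", so that choice and the helper names are illustrative.

```python
import math
import random


def random_loss_weights(num_tasks, rng):
    """Sample task weights that are positive and sum to one."""
    logits = [rng.gauss(0.0, 1.0) for _ in range(num_tasks)]
    peak = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss(task_losses, rng):
    """Combine per-task losses with weights resampled on every call."""
    weights = random_loss_weights(len(task_losses), rng)
    return sum(w * loss for w, loss in zip(weights, task_losses))


rng = random.Random(0)
weights = random_loss_weights(3, rng)
# Hypothetical per-task losses for one training step.
loss = weighted_loss([0.9, 0.4, 1.3], rng)
print(weights, loss)
```

Because the weights form a convex combination, the combined loss always lies between the smallest and largest task loss. In a real training loop the scalar losses would be framework tensors, so that gradients flow through the weighted sum.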
    Learning with Combinatorial Optimization Layers: a Probabilistic Approach. (arXiv:2207.13513v1 [stat.ML])
    Combinatorial optimization (CO) layers in machine learning (ML) pipelines are a powerful tool to tackle data-driven decision tasks, but they come with two main challenges. First, the solution of a CO problem often behaves as a piecewise constant function of its objective parameters. Given that ML pipelines are typically trained using stochastic gradient descent, the absence of slope information is very detrimental. Second, standard ML losses do not work well in combinatorial settings. A growing body of research addresses these challenges through diverse methods. Unfortunately, the lack of well-maintained implementations slows down the adoption of CO layers. In this paper, building upon previous works, we introduce a probabilistic perspective on CO layers, which lends itself naturally to approximate differentiation and the construction of structured losses. We recover many approaches from the literature as special cases, and we also derive new ones. Based on this unifying perspective, we present InferOpt.jl, an open-source Julia package that 1) allows turning any CO oracle with a linear objective into a differentiable layer, and 2) defines adequate losses to train pipelines containing such layers. Our library works with arbitrary optimization algorithms, and it is fully compatible with Julia's ML ecosystem. We demonstrate its abilities using a pathfinding problem on video game maps.  ( 2 min )
    Learning from Positive and Unlabeled Data with Augmented Classes. (arXiv:2207.13274v1 [cs.LG])
    Positive Unlabeled (PU) learning aims to learn a binary classifier from only positive and unlabeled data, a setting that arises in many real-world scenarios. However, existing PU learning algorithms cannot deal with the real-world challenge of an open and changing scenario, where examples from unobserved augmented classes may emerge in the testing phase. In this paper, we propose an unbiased risk estimator for PU learning with Augmented Classes (PUAC) by utilizing unlabeled data from the augmented classes distribution, which can be easily collected in many real-world scenarios. In addition, we derive the estimation error bound for the proposed estimator, which provides a theoretical guarantee for its convergence to the optimal solution. Experiments on multiple realistic datasets demonstrate the effectiveness of the proposed approach.  ( 2 min )
    Representation Learning for Dynamic Hyperedges. (arXiv:2112.10154v2 [cs.LG] UPDATED)
    The explosion of digital information and the growing involvement of people in social networks have led to enormous research activity to develop methods that can extract meaningful information from interaction data. Commonly, interactions are represented by edges in a network or a graph, which implicitly assumes that the interactions are pairwise and static. However, real-world interactions deviate from these assumptions: (i) interactions can be multi-way, involving more than two nodes or individuals (e.g., family relationships, protein interactions), and (ii) interactions can change over a period of time (e.g., change of opinions and friendship status). While pairwise interactions have been studied in a dynamic network setting and multi-way interactions have been studied using hypergraphs in static networks, there exists no method that can predict multi-way interactions or hyperedges in dynamic settings. Existing related methods cannot answer temporal queries such as what type of interaction will occur next and when it will occur. This paper proposes a temporal point process model for hyperedge prediction to address these problems. Our proposed model uses dynamic representation techniques for nodes in a neural point process framework to forecast hyperedges. We present several experimental results and set benchmark results. To the best of our knowledge, this is the first work that uses the temporal point process to forecast hyperedges in dynamic networks.  ( 3 min )
    The Computational Limits of Deep Learning. (arXiv:2007.05558v2 [cs.LG] UPDATED)
    Deep learning's recent history has been one of achievement: from triumphing over humans in the game of Go to world-leading performance in image classification, voice recognition, translation, and other tasks. But this progress has come with a voracious appetite for computing power. This article catalogs the extent of this dependency, showing that progress across a wide variety of applications is strongly reliant on increases in computing power. Extrapolating forward this reliance reveals that progress along current lines is rapidly becoming economically, technically, and environmentally unsustainable. Thus, continued progress in these applications will require dramatically more computationally-efficient methods, which will either have to come from changes to deep learning or from moving to other machine learning methods.  ( 2 min )
    Optimizing transformations for contrastive learning in a differentiable framework. (arXiv:2207.13367v1 [cs.LG])
    Current contrastive learning methods use random transformations sampled from a large list of transformations, with fixed hyperparameters, to learn invariance from an unannotated database. Following previous works that introduce a small amount of supervision, we propose a framework to find optimal transformations for contrastive learning using a differentiable transformation network. Our method improves performance in the low annotated data regime, both in accuracy on the supervised task and in convergence speed. In contrast to previous work, no generative model is needed for transformation optimization. Transformed images keep relevant information to solve the supervised task, here classification. Experiments were performed on 34000 2D slices of brain Magnetic Resonance Images and 11200 chest X-ray images. On both datasets, with 10% of labeled data, our model achieves better performance than a fully supervised model with 100% labels.  ( 2 min )
    A hybrid ensemble method with negative correlation learning for regression. (arXiv:2104.02317v3 [cs.LG] UPDATED)
    Hybrid ensemble, an essential branch of ensembles, has flourished in numerous machine learning problems, especially regression. Several studies have confirmed the importance of diversity; however, previous ensembles only consider diversity in the sub-model training stage, with limited improvement compared to single models. In contrast, this study selects and weights sub-models from a heterogeneous model pool automatically. It solves an optimization problem using an interior-point filtering linear-search algorithm. This optimization problem incorporates negative correlation learning (NCL) as a penalty term, with which a diverse model subset can be selected. The experimental results yield several findings. Model pool construction requires different classes of models, with all possible parameter sets for each class as sub-models. The best sub-models from each class are selected to construct an NCL-based ensemble, which is far better than the average of the sub-models. Furthermore, compared with classical constant and non-constant weighting methods, the NCL-based ensemble has a significant advantage in several prediction metrics. In practice, it is difficult to determine the optimal sub-model for a dataset in advance due to model uncertainty. However, our method achieves accuracy comparable to that of the potential optimal sub-models on the RMSE metric. In conclusion, the value of this study lies in its ease of use and effectiveness, allowing the hybrid ensemble to embrace both diversity and accuracy.  ( 3 min )
    ShiftAddNAS: Hardware-Inspired Search for More Accurate and Efficient Neural Networks. (arXiv:2205.08119v2 [cs.LG] UPDATED)
    Neural networks (NNs) with intensive multiplications (e.g., convolutions and transformers) are capable yet power hungry, impeding their more extensive deployment into resource-constrained devices. As such, multiplication-free networks, which follow a common practice in energy-efficient hardware implementation to parameterize NNs with more efficient operators (e.g., bitwise shifts and additions), have gained growing attention. However, multiplication-free networks usually under-perform their vanilla counterparts in terms of the achieved accuracy. To this end, this work advocates hybrid NNs that consist of both powerful yet costly multiplications and efficient yet less powerful operators for marrying the best of both worlds, and proposes ShiftAddNAS, which can automatically search for more accurate and more efficient NNs. Our ShiftAddNAS highlights two enablers. Specifically, it integrates (1) the first hybrid search space that incorporates both multiplication-based and multiplication-free operators for facilitating the development of both accurate and efficient hybrid NNs; and (2) a novel weight sharing strategy that enables effective weight sharing among different operators that follow heterogeneous distributions (e.g., Gaussian for convolutions vs. Laplacian for add operators) and simultaneously leads to a largely reduced supernet size and much better searched networks. Extensive experiments and ablation studies on various models, datasets, and tasks consistently validate the efficacy of ShiftAddNAS, e.g., achieving up to a +7.7% higher accuracy or a +4.9 better BLEU score compared to state-of-the-art NN, while leading to up to 93% or 69% energy and latency savings, respectively. Codes and pretrained models are available at https://github.com/RICE-EIC/ShiftAddNAS.  ( 3 min )
  • Open

    Conformal Prediction Bands for Two-Dimensional Functional Time Series. (arXiv:2207.13656v1 [stat.ME])
    Conformal Prediction (CP) is a versatile nonparametric framework used to quantify uncertainty in prediction problems. In this work, we provide an extension of this method to the case of time series of functions defined on a bivariate domain, by proposing for the first time a distribution-free technique which can be applied to time-evolving surfaces. In order to obtain meaningful and efficient prediction regions, CP must be coupled with an accurate forecasting algorithm. For this reason, we extend the theory of autoregressive processes in Hilbert space in order to allow for functions with a bivariate domain. Given the novelty of the subject, we present estimation techniques for the Functional Autoregressive model (FAR). A simulation study is implemented in order to investigate how different point predictors affect the resulting prediction bands. Finally, we explore benefits and limits of the proposed approach on a real dataset, collecting daily observations of Sea Level Anomalies of the Black Sea in the last twenty years.
    The Cellwise Minimum Covariance Determinant Estimator. (arXiv:2207.13493v1 [stat.ME])
    The usual Minimum Covariance Determinant (MCD) estimator of a covariance matrix is robust against casewise outliers. These are cases (that is, rows of the data matrix) that behave differently from the majority of cases, raising suspicion that they might belong to a different population. On the other hand, cellwise outliers are individual cells in the data matrix. When a row contains one or more outlying cells, the other cells in the same row still contain useful information that we wish to preserve. We propose a cellwise robust version of the MCD method, called cellMCD. Its main building blocks are observed likelihood and a sparsity penalty on the number of flagged cellwise outliers. It possesses good breakdown properties. We construct a fast algorithm for cellMCD based on concentration steps (C-steps) that always lower the objective. The method performs well in simulations with cellwise outliers, and has high finite-sample efficiency on clean data. It is illustrated on real data with visualizations of the results.
    Membership Inference Attacks via Adversarial Examples. (arXiv:2207.13572v1 [cs.LG])
    The rise of machine learning and deep learning has led to significant improvement in several domains. This change is supported by both the dramatic rise in computation power and the collection of large datasets. Such massive datasets often include personal data, which can represent a threat to privacy. Membership inference attacks are a novel direction of research which aims at recovering training data used by a learning algorithm. In this paper, we develop a means to measure the leakage of training data, leveraging a quantity appearing as a proxy of the total variation of a trained model near its training samples. We extend our work by providing a novel defense mechanism. Our contributions are supported by empirical evidence through convincing numerical experiments.
    Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons. (arXiv:2202.13163v2 [stat.ML] UPDATED)
    We consider reinforcement learning (RL) methods in offline domains without additional online data collection, such as mobile health applications. Most existing policy optimization algorithms in the computer science literature are developed in online settings where data are easy to collect or simulate. Their generalizations to mobile health applications with a pre-collected offline dataset remain unknown. The aim of this paper is to develop a novel advantage learning framework in order to efficiently use pre-collected data for policy optimization. The proposed method takes an optimal Q-estimator computed by any existing state-of-the-art RL algorithm as input, and outputs a new policy whose value is guaranteed to converge at a faster rate than the policy derived based on the initial Q-estimator. Extensive numerical experiments are conducted to back up our theoretical findings. A Python implementation of our proposed method is available at https://github.com/leyuanheart/SEAL.
    Robust Prediction Error Estimation with Monte-Carlo Methodology. (arXiv:2207.13612v1 [stat.ME])
    In this paper, we aim to estimate the prediction error of machine learning models under the true distribution of the data at hand. We consider the prediction model as a data-driven black-box function and quantify its statistical properties using non-parametric methods. We propose a novel sampling technique that takes advantage of the underlying probability distribution information embedded in the data. The proposed method combines two existing frameworks for estimating the prediction error: $m$ out of $n$ bootstrapping and iterative bootstrapping. $m$ out of $n$ bootstrapping maintains consistency, and iterative bootstrapping is often used for bias correction of the prediction error estimate. Using Monte-Carlo uncertainty quantification techniques, we decompose the total variance of the estimator so the user can make informed decisions regarding measures to overcome the preventable errors. In addition, via the same Monte-Carlo framework, we provide a way to estimate the bias due to using the empirical distribution. This bias captures the sensitivity of the estimator to the input data at hand and helps with understanding the robustness of the estimator. The application of the proposed uncertainty quantification is tested in a model selection case study using simulated and real datasets. We evaluate the performance of the proposed estimator in two frameworks: first, directly applying it as an optimization model to find the best model; second, fixing an optimization engine and using the proposed estimator as a fitness function within the optimizer. Furthermore, we compare the asymptotic statistical properties and the numerical results on a finite dataset of the proposed estimator with those of existing state-of-the-art methods.
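    As a rough illustration of the $m$ out of $n$ bootstrap named in the abstract above, the sketch below draws $m \le n$ points with replacement and recomputes a statistic on each resample. The function name `m_out_of_n_bootstrap`, the use of the mean as the statistic, and the values of `m` and `reps` are assumptions made for illustration; this is not the authors' estimator, and it omits the iterative bias-correction step.

```python
import random
import statistics

def m_out_of_n_bootstrap(data, m, reps=200, stat=statistics.mean, seed=0):
    """Return `reps` values of `stat`, each computed on m points drawn with replacement."""
    if not 0 < m <= len(data):
        raise ValueError("m must satisfy 0 < m <= n")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [stat(rng.choices(data, k=m)) for _ in range(reps)]

# Hypothetical sample of n = 10 observations.
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7, 3.9, 2.5]
estimates = m_out_of_n_bootstrap(data, m=5)
# Spread of the statistic across resamples of size m.
spread = statistics.stdev(estimates)
```

    Resampling fewer than $n$ points is the defining choice here; how $m$ should scale with $n$ depends on the estimator and is left open in this sketch.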
    Fast TreeSHAP: Accelerating SHAP Value Computation for Trees. (arXiv:2109.09847v3 [cs.LG] UPDATED)
    SHAP (SHapley Additive exPlanation) values are one of the leading tools for interpreting machine learning models, with strong theoretical guarantees (consistency, local accuracy) and a wide availability of implementations and use cases. Even though computing SHAP values takes exponential time in general, TreeSHAP takes polynomial time on tree-based models. While the speedup is significant, TreeSHAP can still dominate the computation time of industry-level machine learning solutions on datasets with millions or more entries, causing delays in post-hoc model diagnosis and interpretation service. In this paper we present two new algorithms, Fast TreeSHAP v1 and v2, designed to improve the computational efficiency of TreeSHAP for large datasets. We empirically find that Fast TreeSHAP v1 is 1.5x faster than TreeSHAP while keeping the memory cost unchanged. Similarly, Fast TreeSHAP v2 is 2.5x faster than TreeSHAP, at the cost of a slightly higher memory usage, thanks to the pre-computation of expensive TreeSHAP steps. We also show that Fast TreeSHAP v2 is well-suited for multi-time model interpretations, resulting in as high as 3x faster explanation of newly incoming samples.
    Open Source Vizier: Distributed Infrastructure and API for Reliable and Flexible Blackbox Optimization. (arXiv:2207.13676v1 [cs.LG])
    Vizier is the de-facto blackbox and hyperparameter optimization service across Google, having optimized some of Google's largest products and research efforts. To operate at the scale of tuning thousands of users' critical systems, Google Vizier solved key design challenges in providing multiple different features, while remaining fully fault-tolerant. In this paper, we introduce Open Source (OSS) Vizier, a standalone Python-based interface for blackbox optimization and research, based on the Google-internal Vizier infrastructure and framework. OSS Vizier provides an API capable of defining and solving a wide variety of optimization problems, including multi-metric, early stopping, transfer learning, and conditional search. Furthermore, it is designed to be a distributed system that assures reliability, and allows multiple parallel evaluations of the user's objective function. The flexible RPC-based infrastructure allows users to access OSS Vizier from binaries written in any language. OSS Vizier also provides a back-end ("Pythia") API that gives algorithm authors a way to interface new algorithms with the core OSS Vizier system. OSS Vizier is available at https://github.com/google/vizier.
    Improving Generalization of Batch Whitening by Convolutional Unit Optimization. (arXiv:2108.10629v2 [cs.CV] CROSS LISTED)
    Batch Whitening is a technique that accelerates and stabilizes training by transforming input features to have a zero mean (Centering) and a unit variance (Scaling), and by removing linear correlation between channels (Decorrelation). In commonly used structures, which are empirically optimized with Batch Normalization, the normalization layer appears between the convolution and the activation function. Subsequent Batch Whitening studies have employed the same structure without further analysis, even though Batch Whitening was analyzed on the premise that the input of a linear layer is whitened. To bridge the gap, we propose a new Convolutional Unit that is in line with the theory, and our method generally improves the performance of Batch Whitening. Moreover, we show the inefficacy of the original Convolutional Unit by investigating rank and correlation of features. As our method is employable with off-the-shelf whitening modules, we use Iterative Normalization (IterNorm), the state-of-the-art whitening module, and obtain significantly improved performance on five image classification datasets: CIFAR-10, CIFAR-100, CUB-200-2011, Stanford Dogs, and ImageNet. Notably, we verify that our method improves stability and performance of whitening when using large learning rate, group size, and iteration number.
    Bioinspired random projections for robust, sparse classification. (arXiv:2206.09222v2 [stat.ML] UPDATED)
    Inspired by the use of random projections in biological sensing systems, we present a new algorithm for processing data in classification problems. This is based on observations of the human brain and the fruit fly's olfactory system and involves randomly projecting data into a space of greatly increased dimension before applying a cap operation to truncate the smaller entries. This leads to a simple algorithm that is very computationally efficient and can be used to either give a sparse representation with minimal loss in classification accuracy or give improved robustness, in the sense that classification accuracy is improved when noise is added to the data. This is demonstrated with numerical experiments, which supplement theoretical results demonstrating that the resulting signal transform is continuous and invertible, in an appropriate sense.
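    The two steps described in the abstract above (random projection into a much larger space, then a cap that truncates the smaller entries) can be sketched as follows. The Gaussian projection matrix, the function name `project_and_cap`, and the choice to keep the `k` largest entries by value are assumptions for illustration, not details taken from the paper.

```python
import random

def project_and_cap(x, out_dim, k, seed=0):
    """Randomly project x to out_dim dimensions, then zero all but the k largest entries."""
    rng = random.Random(seed)
    # Assumed Gaussian random projection: one row of weights per output coordinate.
    proj = [[rng.gauss(0.0, 1.0) for _ in x] for _ in range(out_dim)]
    y = [sum(w * v for w, v in zip(row, x)) for row in proj]
    # Cap operation: keep the indices of the k largest projected values.
    keep = set(sorted(range(out_dim), key=lambda i: y[i], reverse=True)[:k])
    return [y[i] if i in keep else 0.0 for i in range(out_dim)]

x = [0.5, -1.2, 3.0]                      # hypothetical 3-dimensional input
z = project_and_cap(x, out_dim=20, k=4)   # sparse 20-dimensional code
```

    The output has at most `k` nonzero coordinates, which is what makes the representation sparse; a downstream classifier would be trained on `z` rather than `x`.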
    Unsupervised Learning under Latent Label Shift. (arXiv:2207.13179v1 [cs.LG])
    What sorts of structure might enable a learner to discover classes from unlabeled data? Traditional approaches rely on feature-space similarity and heroic assumptions on the data. In this paper, we introduce unsupervised learning under Latent Label Shift (LLS), where we have access to unlabeled data from multiple domains such that the label marginals $p_d(y)$ can shift across domains but the class conditionals $p(\mathbf{x}|y)$ do not. This work instantiates a new principle for identifying classes: elements that shift together group together. For finite input spaces, we establish an isomorphism between LLS and topic modeling: inputs correspond to words, domains to documents, and labels to topics. Addressing continuous data, we prove that when each label's support contains a separable region, analogous to an anchor word, oracle access to $p(d|\mathbf{x})$ suffices to identify $p_d(y)$ and $p_d(y|\mathbf{x})$ up to permutation. Thus motivated, we introduce a practical algorithm that leverages domain-discriminative models as follows: (i) push examples through domain discriminator $p(d|\mathbf{x})$; (ii) discretize the data by clustering examples in $p(d|\mathbf{x})$ space; (iii) perform non-negative matrix factorization on the discrete data; (iv) combine the recovered $p(y|d)$ with the discriminator outputs $p(d|\mathbf{x})$ to compute $p_d(y|x) \; \forall d$. With semi-synthetic experiments, we show that our algorithm can leverage domain information to improve state of the art unsupervised classification methods. We reveal a failure mode of standard unsupervised classification methods when feature-space similarity does not indicate true groupings, and show empirically that our method better handles this case. Our results establish a deep connection between distribution shift and topic modeling, opening promising lines for future work.
    Faster online calibration without randomization: interval forecasts and the power of two choices. (arXiv:2204.13087v2 [cs.LG] UPDATED)
    We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an $\epsilon$-calibration error rate of $O(1/\sqrt{T})$, which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be $\Omega(1)$. Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster with significant power -- we show that a faster $\epsilon$-calibration rate of $O(1/T)$ can be achieved even without deploying any randomization.
    Learning with Combinatorial Optimization Layers: a Probabilistic Approach. (arXiv:2207.13513v1 [stat.ML])
    Combinatorial optimization (CO) layers in machine learning (ML) pipelines are a powerful tool to tackle data-driven decision tasks, but they come with two main challenges. First, the solution of a CO problem often behaves as a piecewise constant function of its objective parameters. Given that ML pipelines are typically trained using stochastic gradient descent, the absence of slope information is very detrimental. Second, standard ML losses do not work well in combinatorial settings. A growing body of research addresses these challenges through diverse methods. Unfortunately, the lack of well-maintained implementations slows down the adoption of CO layers. In this paper, building upon previous works, we introduce a probabilistic perspective on CO layers, which lends itself naturally to approximate differentiation and the construction of structured losses. We recover many approaches from the literature as special cases, and we also derive new ones. Based on this unifying perspective, we present InferOpt.jl, an open-source Julia package that 1) allows turning any CO oracle with a linear objective into a differentiable layer, and 2) defines adequate losses to train pipelines containing such layers. Our library works with arbitrary optimization algorithms, and it is fully compatible with Julia's ML ecosystem. We demonstrate its abilities using a pathfinding problem on video game maps.
    Data-Driven Sample Average Approximation with Covariate Information. (arXiv:2207.13554v1 [math.OC])
    We study optimization for data-driven decision-making when we have observations of the uncertain parameters within the optimization model together with concurrent observations of covariates. Given a new covariate observation, the goal is to choose a decision that minimizes the expected cost conditioned on this observation. We investigate three data-driven frameworks that integrate a machine learning prediction model within a stochastic programming sample average approximation (SAA) for approximating the solution to this problem. Two of the SAA frameworks are new and use out-of-sample residuals of leave-one-out prediction models for scenario generation. The frameworks we investigate are flexible and accommodate parametric, nonparametric, and semiparametric regression techniques. We derive conditions on the data generation process, the prediction model, and the stochastic program under which solutions of these data-driven SAAs are consistent and asymptotically optimal, and also derive convergence rates and finite sample guarantees. Computational experiments validate our theoretical results, demonstrate the potential advantages of our data-driven formulations over existing approaches (even when the prediction model is misspecified), and illustrate the benefits of our new data-driven formulations in the limited data regime.
    Should Bank Stress Tests Be Fair? (arXiv:2207.13319v1 [stat.ML])
    Regulatory stress tests have become the primary tool for setting capital requirements at the largest U.S. banks. The Federal Reserve uses confidential models to evaluate bank-specific outcomes for bank-specific portfolios in shared stress scenarios. As a matter of policy, the same models are used for all banks, despite considerable heterogeneity across institutions; individual banks have contended that some models are not suited to their businesses. Motivated by this debate, we ask, what is a fair aggregation of individually tailored models into a common model? We argue that simply pooling data across banks treats banks equally but is subject to two deficiencies: it may distort the impact of legitimate portfolio features, and it is vulnerable to implicit misdirection of legitimate information to infer bank identity. We compare various notions of regression fairness to address these deficiencies, considering both forecast accuracy and equal treatment. In the setting of linear models, we argue for estimating and then discarding centered bank fixed effects as preferable to simply ignoring differences across banks. We present evidence that the overall impact can be material. We also discuss extensions to nonlinear models.
    INTERACT: Achieving Low Sample and Communication Complexities in Decentralized Bilevel Learning over Networks. (arXiv:2207.13283v1 [cs.LG])
    In recent years, decentralized bilevel optimization problems have received increasing attention in the networking and machine learning communities thanks to their versatility in modeling decentralized learning problems over peer-to-peer networks (e.g., multi-agent meta-learning, multi-agent reinforcement learning, personalized training, and Byzantine-resilient learning). However, for decentralized bilevel optimization over peer-to-peer networks with limited computation and communication capabilities, how to achieve low sample and communication complexities are two fundamental challenges that remain under-explored so far. In this paper, we make the first attempt to investigate the class of decentralized bilevel optimization problems with nonconvex and strongly-convex structure corresponding to the outer and inner subproblems, respectively. Our main contributions in this paper are two-fold: i) We first propose a deterministic algorithm called INTERACT (inner-gradient-descent-outer-tracked-gradient) that requires the sample complexity of $\mathcal{O}(n \epsilon^{-1})$ and communication complexity of $\mathcal{O}(\epsilon^{-1})$ to solve the bilevel optimization problem, where $n$ and $\epsilon > 0$ are the number of samples at each agent and the desired stationarity gap, respectively. ii) To relax the need for full gradient evaluations in each iteration, we propose a stochastic variance-reduced version of INTERACT (SVR-INTERACT), which improves the sample complexity to $\mathcal{O}(\sqrt{n} \epsilon^{-1})$ while achieving the same communication complexity as the deterministic algorithm. To our knowledge, this work is the first that achieves both low sample and communication complexities for solving decentralized bilevel optimization problems over networks. Our numerical experiments also corroborate our theoretical findings.
    Multi-Objective Hyperparameter Optimization -- An Overview. (arXiv:2206.07438v2 [cs.LG] UPDATED)
    Hyperparameter optimization constitutes a large part of typical modern machine learning workflows. This arises from the fact that machine learning methods and corresponding preprocessing steps often only yield optimal performance when hyperparameters are properly tuned. But in many applications, we are not interested in optimizing ML pipelines solely for predictive accuracy; additional metrics or constraints must be considered when determining an optimal configuration, resulting in a multi-objective optimization (MOO) problem. This is often neglected in practice, due to a lack of knowledge and readily available software implementations for multi-objective hyperparameter optimization. In this work, we introduce the reader to the basics of multi-objective hyperparameter optimization and motivate its usefulness in applied ML. Furthermore, we provide an extensive survey of existing optimization strategies, both from the domain of evolutionary algorithms and Bayesian optimization. We illustrate the utility of MOO in several specific ML applications, considering objectives such as operating conditions, prediction time, sparseness, fairness, interpretability and robustness.
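    To make the multi-objective setting concrete, the sketch below filters a set of evaluated configurations down to its Pareto front, i.e. the configurations that no other configuration beats on every objective. Both objectives are assumed to be minimized, and the (error, prediction time) pairs are made-up values for illustration; the survey itself covers far richer strategies.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Return the points that are not dominated by any other point (minimization)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (validation error, prediction time in ms) for five configurations.
configs = [(0.10, 50.0), (0.12, 20.0), (0.11, 60.0), (0.20, 10.0), (0.12, 25.0)]
front = pareto_front(configs)
# front == [(0.10, 50.0), (0.12, 20.0), (0.20, 10.0)]
```

    The quadratic scan is fine for a handful of configurations; choosing one point from the front still requires a preference between the objectives.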
    On generalization bounds for deep networks based on loss surface implicit regularization. (arXiv:2201.04545v2 [stat.ML] UPDATED)
    The classical statistical learning theory implies that fitting too many parameters leads to overfitting and poor performance. That modern deep neural networks generalize well despite a large number of parameters contradicts this finding and constitutes a major unsolved problem towards explaining the success of deep learning. While previous work focuses on the implicit regularization induced by stochastic gradient descent (SGD), we study here how the local geometry of the energy landscape around local minima affects the statistical properties of SGD with Gaussian gradient noise. We argue that under reasonable assumptions, the local geometry forces SGD to stay close to a low dimensional subspace and that this induces another form of implicit regularization and results in tighter bounds on the generalization error for deep neural networks. To derive generalization error bounds for neural networks, we first introduce a notion of stagnation sets around the local minima and impose a local essential convexity property of the population risk. Under these conditions, lower bounds for SGD to remain in these stagnation sets are derived. If stagnation occurs, we derive a bound on the generalization error of deep neural networks involving the spectral norms of the weight matrices but not the number of network parameters. Technically, our proofs are based on controlling the change of parameter values in the SGD iterates and local uniform convergence of the empirical loss functions based on the entropy of suitable neighborhoods around local minima.
    Handling Hard Affine SDP Shape Constraints in RKHSs. (arXiv:2101.01519v2 [stat.ML] UPDATED)
    Shape constraints, such as non-negativity, monotonicity, convexity or supermodularity, play a key role in various applications of machine learning and statistics. However, incorporating this side information into predictive models in a hard way (for example at all points of an interval) for rich function classes is a notoriously challenging problem. We propose a unified and modular convex optimization framework, relying on second-order cone (SOC) tightening, to encode hard affine SDP constraints on function derivatives, for models belonging to vector-valued reproducing kernel Hilbert spaces (vRKHSs). The modular nature of the proposed approach makes it possible to handle multiple shape constraints simultaneously, and to tighten an infinite number of constraints into finitely many. We prove the convergence of the proposed scheme and that of its adaptive variant, leveraging geometric properties of vRKHSs. Due to the covering-based construction of the tightening, the method is particularly well-suited to tasks with small to moderate input dimensions. The efficiency of the approach is illustrated in the context of shape optimization, robotics and econometrics.
    Sliced Wasserstein Variational Inference. (arXiv:2207.13177v1 [stat.ML])
    Variational Inference approximates an unnormalized distribution via the minimization of Kullback-Leibler (KL) divergence. Although this divergence is efficient to compute and has been widely used in applications, it has some undesirable properties. For example, it is not a proper metric, i.e., it is non-symmetric and does not satisfy the triangle inequality. On the other hand, optimal transport distances have recently shown some advantages over KL divergence. Building on these advantages, we propose a new variational inference method that minimizes the sliced Wasserstein distance, a valid metric arising from optimal transport. This sliced Wasserstein distance can be approximated simply by running MCMC, without solving any optimization problem. Our approximation also does not require a tractable density function of variational distributions, so that approximating families can be amortized by generators like neural networks. Furthermore, we provide an analysis of the theoretical properties of our method. Experiments on synthetic and real data are presented to show the performance of the proposed method.
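    For readers unfamiliar with the sliced Wasserstein distance mentioned above, the sketch below gives the standard Monte Carlo estimator between two equal-size point sets: project both sets onto random unit directions, sort the one-dimensional projections, and average the $p$-th power of the gaps. It assumes samples from both distributions are already available (for example from an MCMC run); the function name, `n_proj`, and `p` are illustrative choices, and this is not the paper's inference procedure.

```python
import math
import random

def sliced_wasserstein(xs, ys, n_proj=50, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between two equal-size samples."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    dim = len(xs[0])
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_proj):
        # Random direction on the unit sphere: normalize a Gaussian vector.
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
        # In one dimension, optimal transport matches sorted samples.
        px = sorted(sum(a * b for a, b in zip(x, v)) for x in xs)
        py = sorted(sum(a * b for a, b in zip(y, v)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_proj) ** (1.0 / p)

xs = [(0.0, 0.0), (1.0, 0.5), (0.2, 1.5), (2.0, 1.0)]   # hypothetical samples
ys = [(a + 1.0, b) for a, b in xs]                        # same samples shifted by one unit
d_same = sliced_wasserstein(xs, xs)
d_shift = sliced_wasserstein(xs, ys)
```

    Because each projection reduces the problem to sorting, no optimization problem has to be solved, which is the property the abstract highlights.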
    Deep Partial Updating: Towards Communication Efficient Updating for On-device Inference. (arXiv:2007.03071v3 [cs.LG] UPDATED)
    Emerging edge intelligence applications require the server to retrain and update deep neural networks deployed on remote edge nodes to leverage newly collected data samples. Unfortunately, it may be impossible in practice to continuously send fully updated weights to these edge nodes due to the highly constrained communication resource. In this paper, we propose the weight-wise deep partial updating paradigm, which smartly selects a small subset of weights to update in each server-to-edge communication round, while achieving a similar performance compared to full updating. Our method is established through analytically upper-bounding the loss difference between partial updating and full updating, and only updates the weights which make the largest contributions to the upper bound. Extensive experimental results demonstrate the efficacy of our partial updating methodology which achieves a high inference accuracy while updating a rather small number of weights.  ( 2 min )
    Rethinking Efficacy of Softmax for Lightweight Non-Local Neural Networks. (arXiv:2207.13423v1 [cs.CV])
    The non-local (NL) block is a popular module that demonstrates the capability to model global contexts. However, the NL block generally has heavy computation and memory costs, so it is impractical to apply the block to high-resolution feature maps. In this paper, to investigate the efficacy of the NL block, we empirically analyze whether the magnitude and direction of input feature vectors properly affect the attention between vectors. The results show the inefficacy of the softmax operation, which is generally used to normalize the attention map of the NL block. Attention maps normalized with the softmax operation rely heavily upon the magnitude of key vectors, and performance degrades if the magnitude information is removed. By replacing the softmax operation with a scaling factor, we demonstrate improved performance on CIFAR-10, CIFAR-100, and Tiny-ImageNet. In addition, our method shows robustness to embedding channel reduction and embedding weight initialization. Notably, our method makes multi-head attention employable without additional computational cost.  ( 2 min )
    One Simple Trick to Fix Your Bayesian Neural Network. (arXiv:2207.13167v1 [stat.ML])
    One of the most popular estimation methods in Bayesian neural networks (BNN) is mean-field variational inference (MFVI). In this work, we show that neural networks with the ReLU activation function induce posteriors that are hard to fit with MFVI. We provide a theoretical justification for this phenomenon, study it empirically, and report the results of a series of experiments to investigate the effect of the activation function on the calibration of BNNs. We find that using Leaky ReLU activations leads to more Gaussian-like weight posteriors and achieves a lower expected calibration error (ECE) than its ReLU-based counterpart.  ( 2 min )
    LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity. (arXiv:2207.13129v1 [cs.LG])
    We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, models that belong to a wider weight optimum are better surrogates. Second, we identify a subspace able to generate an effective surrogate ensemble among this wider optimum. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established test-time transformations by 1.8 to 59.9 percentage points. Our findings shed new light on the importance of the geometry of the weight space to explain the transferability of adversarial examples.  ( 2 min )

  • Open

    [D] good follow up venue to NeurIPS rejection?
    So my scores came in: I got 2/4/4/4/7 at NeurIPS and am planning to write a rebuttal. Main strengths were the extensive experiments, achieving SOTA, and good qualitative info. Also compliments on the introduction and motivation. Main complaints were around typos, feeling the paper was rushed, not giving enough discussion on very specific citations, and feeling the solution methodology section was confusing and needed a serious rewrite. Also a couple of complaints on ablations, or they wanted to see different things studied than what we studied. I am trying to decide what is a good path forward with this paper: should I try to get into AAAI, or should I just go for a good journal with a quick review time to get it out there? submitted by /u/AbjectDrink3276 [link] [comments]  ( 111 min )
    "[Discussion]" Need advice for my master's program (online or in-person)
    Hey guys, I need advice. I'm planning to do my master's in AI, and I couldn't enroll in the thesis-based program, but I managed to get an offer from Queen Mary Uni (project-based). Generally, my goal is either to land a good job in industry or to enter academia in the long term. However, I have now seen the option of doing online programs, which have almost the same courses as the in-person ones "while being way cheaper", and there are some good universities that provide this (Georgia Tech, Leeds, Liverpool). So my question is: would doing it online give me the same benefits as in-person (for project-based)? Also, at this point does it actually matter that much whether I do an online or an in-person program? Also, has anyone tried the online program from these universities and can share the experience? submitted by /u/Mogady [link] [comments]  ( 88 min )
    [D] What methods/tools should I use for a combination of linear and non-linear tabular data?
    Title. The non-linear data cannot be transformed into a linear form. What methods or tools should I use for this submitted by /u/NathanA2C [link] [comments]  ( 106 min )
    [D] Is self driving entirely machine learning?
    It's my understanding that the labeling needed for the car to understand its surroundings is done by a neural net or some other machine learning technique. What I'm curious about is whether the decisions of how to operate the car based on its labeled surroundings are made with more conventional programming like, "If I'm about to hit this thing labeled as a wall, then brake" or "If the bounds of the road angle to the left, then steer left", or whether a black box neural net approach is used where we train it to less deterministically produce certain outputs based on the conditions of the labels? TLDR: is self driving label -> black box neural net -> control output OR label -> if/then -> control output submitted by /u/entropythagorean [link] [comments]  ( 89 min )
    [R] [D] Mythbusting my preconceptions of ML
    So, I have always been interested in getting into practical ML but am unsure how/where to start. Where do you think I should start with my journey? I am a student who is digitally literate and analytical, but I want to avoid as many obstacles to my being able to use ML in a practical sense at work. I am at a data-based company looking to create strong resources, so I guess I am interested in the benefits of using ML in administrative work? Please send help haha submitted by /u/Jad0Matic [link] [comments]  ( 87 min )
    [R] Geometric Deep Learning Lecture Course (AMMI'22)
    Hi everyone, I am pleased to share with you all, our new & improved material for diving into geometric deep learning! For a second year in a row, Michael Bronstein (Oxford / Twitter), Joan Bruna (NYU), Taco Cohen (Qualcomm) and I have delivered our Master's course on Geometric DL for the African Master's in Machine Intelligence, designed to closely follow our proto-book released last year. We make all materials publicly available! https://geometricdeeplearning.com/lectures/ For 2022, we made careful modifications to our content, making it more streamlined and (hopefully) more accessible! This features, among other things: A revamped introductory lecture, with a plethora of new historical context on deep learning and geometry; Clearer discussion of Transformers, and how they fit int…  ( 91 min )
    [D] Help needed! The code in the OpenAI gym documentation does not work.
    I am an absolute beginner in reinforcement learning. I'm trying to execute the second code snippet given here. I'm using python version 3.9.12 as part of the anaconda package. Curiously, no error is thrown when I try to execute this code in a kaggle notebook except for the fact that the notebook can obviously not display the output environment. I checked the version of python in kaggle, and it's 3.7.12. Is that the cause behind this issue? Moreover, I was playing around with the code given in the documentation and was able to modify it such that it inadvertently worked natively on my machine. Attaching a screenshot of my code. Can somebody please tell me if I'm doing something wrong? If it is because of the python version, what kind of changes would I have to make in the code given in the OpenAI documentation? Thanks in advance. My code submitted by /u/Zephyrus_2002 [link] [comments]  ( 88 min )
    Do you know any prior work on quantifying Reinforcement Learning environment difficulty / complexity? [Discussion]
    Hi, I am interested in learning more about frameworks for characterizing the relative complexity of Reinforcement Learning environments. This can be used to better understand comparable problems and compare across environments: e.g. How much harder is Mountain Car than Cartpole? There are many different characteristics that define environments and many different problem formulations - some of which are likely not meaningfully comparable quantitatively (single agent vs multi agent setup) and some that should be (low dimensional action space vs high dimensional action space) Here are some different dimensions of environment difficulty split by problem setup and relative complexity Problem formulation dimensions: - number of agents: single or multi agent - stochasticity: is the environment stochastic or deterministic - action space: discrete or continuous Complexity dimensions: - dimensionality: high dimensionality state and action space - credit assignment: delayed rewards - state representation: noisy signal from raw pixels vs cleanly represented state - small number of solutions: some environments require a specific sequential pattern to be discovered (and remembered) E.g. Montezumas revenge vs others have many solutions such as Cartpole - how sensitive the environment is to initial conditions Does anyone know which subfield this falls under? Or can you please link relevant papers / where I can go to learn more? submitted by /u/notabot789 [link] [comments]  ( 88 min )
    [D] Is it possible to use machine learning to create 3D images for the purpose of 3D printing?
    I think this is a longshot but I was thinking that I could build a model gathering image data to create a model that creates 3D images that can be added to 3D printing software so it could 3D print the model and sell it on Amazon. Some items could include 3D printed toys or statues or decorations, small stuff you could add to a desk or somewhere in your room or purchase it as a gift. Easier said than done, I assume but would such a thing be possible? submitted by /u/swagonflyyyy [link] [comments]  ( 88 min )
    [D] A Semi-automatic approach for Generating Video Trailers for Learning Pathways (Poster Walkthrough)
    In this video, I present a walkthrough of my poster "A Semi-automatic approach for Generating Video Trailers for Learning Pathways" that got accepted at the venue AIED 2022. I will be sharing the paper soon. Let me know your thoughts 💭 Much Appreciated! 🤗 https://youtu.be/Y93GXvVERmk submitted by /u/prakhar21 [link] [comments]  ( 87 min )
    [P] Luminaire v0.4.0 Release with Support up to python 3.10
    Excited to share that the latest Luminaire v0.4.0 release has several new capabilities with support up to python 3.10 and other package upgrades. Checkout the latest release here: https://github.com/zillow/luminaire submitted by /u/sayan341 [link] [comments]  ( 87 min )
    [D] Reading Group Presentation: Scalable Video-to-Speech Synthesis
    ​ https://preview.redd.it/uewvxpt7f4e91.png?width=1200&format=png&auto=webp&s=e2a3a89df1337b2df0f81d39cb75b4774e163bd7 outsystems-ai-reading-group.github.io for more info submitted by /u/JClub [link] [comments]  ( 109 min )
    [D] Albumentations VS Detectron2
    How the augmentations of Detectron2 regarding HSV (https://github.com/facebookresearch/detectron2/blob/48b598b4f61fbb24182a69b521b2a0ba3252b842/detectron2/data/transforms/augmentation_impl.py) correlate with albumentations ones - ColorJitter (https://albumentations.ai/docs/api_reference/augmentations/transforms/)? submitted by /u/giakou4 [link] [comments]  ( 87 min )
    [D] Is anyone training large language models on academic literature?
    I am wondering whether someone is trying to train LLMs on academic literature. I am thinking that if OpenAI Codex can spit out functional code from training on all publicly available code, surely a model trained on all digital books and research papers can see patterns across different domains and generate surprising insights. If it works, it could be groundbreaking in terms of pushing science forward, since the scientific disciplines have become so specialized that humans cannot become experts in multiple disciplines within a lifetime, but a machine may have a chance at it! Ideas and suggestions are welcome. submitted by /u/GullibleEngineer4 [link] [comments]  ( 88 min )
    [D] What do you think will be the most exciting thing in ML three years from now?
    I would list the most interesting things happening in machine learning right now as: GPT-3, Gato, Dalle2 (creating incredible models by just pouring data into them), and NERF. What do you think we will be most excited about three years from now? GPT-3 was released two years ago. submitted by /u/ThePerson654321 [link] [comments]  ( 93 min )
    What is the "major bottleneck" for "self driving cars"? "[D]"
    Question 1) I was wondering if anyone here can ELI5 (or even idiot-er) could explain something about "the major bottleneck" that I keep reading about with "error processing" or whatever is "the major issue" with Tesla. For the record, this is not laziness but practicality. There is simply too much to keep up with and I am too busy tryin' to survive. If anyone is willing to help me, I thank you in advance. (I just wanna keep up, but I can't get it done alone. Sad face) JUST CHECKING IN WITH AN EDIT AT 10:30 AM OR SO: Question 2) So, from what I gather, the issue is, no one really knows why in the hells error checking does not work? Am I understanding that correctly? If you answered post edit can you reference if you are answering question 1 or question 2. You do not have to, but it would help my scattered mind! submitted by /u/TheBloneRanger [link] [comments]  ( 99 min )
    [P] I Made An Easy-To-Use Python Package That Creates Beautiful Html Reports From Jupyter Notebooks
    Pretty Jupyter is an easy-to-use package that creates beautifully styled and dynamic html webpage from Jupyter notebook. Its repo is available here: https://github.com/JanPalasek/pretty-jupyter . Check out the demo and compare it with the default jupyter. You can try also Pretty Jupyter online without the need to install it. Main Features Visually appealing styles. Automatic Table of Contents generation. Tabsets: Tabs that hold section content inside them. Using Python variables in Markdown: Helps in creating dynamic reports. Code Folding: Show/Hide code to filter out unnecessary content. submitted by /u/Jan2579 [link] [comments]  ( 89 min )
    [D] Choosing right aws instance for training.
    Currently we have training jobs for both image models and large language models. I am having a difficult time choosing between g4 and p3 instances. Please suggest a multi-GPU instance that is cost- as well as time-optimised (we plan to add a savings plan on top of that). If there are benchmarks that compare the two, please share. submitted by /u/Garlic-Naan-7249 [link] [comments]  ( 88 min )
    [P] Git-based model registry
    Hey everyone! We are excited today to announce the release of our ML model registry for Iterative Studio (from the team behind DVC). This Model Registry is a UI for our open source tool (MLEM) that we introduced in this subreddit earlier this year. Our philosophy is that ML projects - and MLOps practices - should be built on top of traditional software tools (such as Git), and not as a separate platform. Our goal is to extend DevOps’ wins from software development to ML. A Git repository as the single source of truth for models is the core principle behind our registry. This idea is not new if you are familiar with GitOps. We just implemented the model-deployment-specific workflow using these ideas. Technically, everything is stored in the Git repository: - assign a version to a model: it creates a corresponding Git tag in your repository - deploy a model to production: a special Git tag is pushed and your CI/CD system triggers the model deployment - the ML model description and a link to a file in storage (S3, Azure Blob) are stored in a text file in Git. This functionality can be used from the open source tool mlem.ai and from our released UI, which helps to visualize the entire inventory of your models - https://studio.iterative.ai/ Iterative Model Registry We would love your feedback on using GitOps principles in model deployment! submitted by /u/dmpetrov [link] [comments]  ( 89 min )
    [P] How To Train NPCs With Video? VR Badminton Project
    Currently with my team we are developing a VR badminton game and we are looking for ways to train the NPCs to play as realistic as possible. We were thinking if there was a way to train the NPCs with video from real badminton players? Any help or idea is highly appreciated! Thanks in advance. submitted by /u/ZimaLion [link] [comments]  ( 90 min )
    [D] How does the loss used in Imagen differ from the loss used in IDDPM?
    The loss term used in the paper Improved Denoising Diffusion Probabilistic Models is called L_hybrid. It is a mixture of 2 things: - The variational lower bound loss - The MSE loss between what the model predicts I've been staring at the loss term used in Imagen, and I haven't been able to make heads or tails of it. It seems like they use a MSE loss between the model's predicted noise and the actual noise, but do they also do anything else? (For example: utilize the variational lower bound loss). Imagen Loss submitted by /u/vanilla-acc [link] [comments]  ( 88 min )
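    For reference, the hybrid objective mentioned in the post above can be written out as follows. This is a sketch of the form given in the IDDPM paper as best recalled here; the symbols \epsilon_\theta (the network's noise prediction) and \lambda (the weighting factor, set to 0.001 in that paper) follow that paper's notation, and nothing here describes the Imagen loss.

```latex
% Simple (MSE) term: predicted noise vs. the noise actually added at step t
L_{\text{simple}} = \mathbb{E}_{t,\,x_0,\,\epsilon}\left[\,\lVert \epsilon - \epsilon_\theta(x_t, t) \rVert^2\,\right]

% Hybrid objective: MSE term plus a down-weighted variational lower bound term
L_{\text{hybrid}} = L_{\text{simple}} + \lambda\, L_{\text{vlb}}, \qquad \lambda = 0.001
```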
  • Open

    "Lake Eye" user creation on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    A Multi-Model Approach to Synthetic Data Generation
    submitted by /u/Repeat-or [link] [comments]  ( 86 min )
    Researchers at Graz University of Technology Develop AdaNeRF: Adaptive Sampling for Real-time Rendering of Neural Radiance Fields Directly from Sparse Observations
    submitted by /u/ai-lover [link] [comments]  ( 87 min )
    A COMPUTER WROTE POETRY !
    Here in this video, we will check out some code that Vish had written in the past which takes lyrics stored in some files and the computer writes its own code!! Amazing. Not only are we here to educate about the power of computers but get into great discussions about where the technology comes from and where it could possibly take us in the future! Join the community in this great adventure. We have created our own company to further us on this venture of inspiring tech heads and entrepreneurs. https://www.drpinnacle.com/blog https://youtu.be/xoNudNcDuXc submitted by /u/malwaregeek [link] [comments]  ( 86 min )
    Cool use of combining image and text generation to create Magic the Gathering cards
    submitted by /u/BeautifulVegetable10 [link] [comments]  ( 86 min )
    Storm Clouds
    submitted by /u/widgia [link] [comments]  ( 86 min )
    A.I. Speech Generator
    I'm trying to find something I can feed dialogue from a character and create a speech generator of it. My goal is to get the english voices dubbed over some japan-exclusive anime episodes. submitted by /u/outstandingowl [link] [comments]  ( 87 min )
    Hungry Baby Alarm
    submitted by /u/GoochCommander [link] [comments]  ( 86 min )
    How You Can Use AI / Automation to Make Money (from home)
    submitted by /u/kbf_ [link] [comments]  ( 86 min )
    Darkened bridges sink away into the brackishness Swirling sin into a rainbow of atrophy When the winters help the golden autumns take it’s leave
    submitted by /u/nalr00n [link] [comments]  ( 91 min )
  • Open

    Look and Talk: Natural Conversations with Google Assistant
    Posted by Tuan Anh Nguyen, Google Assistant and Sourish Chaudhuri, Google Research In natural conversations, we don't say people's names every time we speak to each other. Instead, we rely on contextual signaling mechanisms to initiate conversations, and eye contact is often all it takes. Google Assistant, now available in more than 95 countries and over 29 languages, has primarily relied on a hotword mechanism ("Hey Google" or “OK Google”) to help more than 700 million people every month get things done across Assistant devices. As virtual assistants become an integral part of our everyday lives, we're developing ways to initiate conversations more naturally. At Google I/O 2022, we announced Look and Talk, a major development in our journey to create natural and intuitive ways to intera…  ( 26 min )
  • Open

    Help needed! The code in the OpenAI gym documentation does not work.
    I am an absolute beginner in reinforcement learning. I'm trying to execute the second code snippet given here. I'm using python version 3.9.12 as part of the anaconda package. Curiously, no error is thrown when I try to execute this code in a kaggle notebook except for the fact that the notebook can obviously not display the output environment. I checked the version of python in kaggle, and it's 3.7.12 Is that the cause behind this issue? Moreover, I was playing around with the code given in the documentation and was able to modify it such that it inadvertently worked natively on my machine. Attaching a screenshot of my code. Can somebody please tell me if I'm doing something wrong? If it is because of the python version, what kind of changes would I have to make in the code given in the OpenAI documentation? Thanks in advance. My code submitted by /u/Zephyrus_2002 [link] [comments]  ( 87 min )
    Do you know any prior work on quantifying RL environment difficulty / complexity?
    Hi, I am interested in learning more about frameworks for characterizing the relative complexity of RL environments. This can be used to better understand comparable problems and compare across environments: e.g. How much harder is Mountain Car than Cartpole? There are many different characteristics that define environments and many different problem formulations - some of which are likely not meaningfully comparable quantitatively (single agent vs multi agent setup) and some that should be (low dimensional action space vs high dimensional action space) Here are some different dimensions of environment difficulty split by problem setup and relative complexity Problem formulation dimensions: - number of agents: single or multi agent - stochasticity: is the environment stochastic or deterministic - action space: discrete or continuous Complexity dimensions: - dimensionality: high dimensionality state and action space - credit assignment: delayed rewards - state representation: noisy signal from raw pixels vs cleanly represented state - small number of solutions: some environments require a specific sequential pattern to be discovered (and remembered) E.g. Montezuma's revenge vs others have many solutions such as Cartpole Does anyone know which subfield this falls under? Or can you please link relevant papers / where I can go to learn more? submitted by /u/notabot789 [link] [comments]  ( 87 min )
    "Offline Reinforcement Learning at Multiple Frequencies", Burns et al 2022
    submitted by /u/gwern [link] [comments]  ( 94 min )
    In Multi Agent Reinforcement Learning, what exactly does "coordinated actions" mean? Do they mean similar actions or something else? How does this work out? Can someone explain?
    I was reading a paper where it says MARL leads to coordinated actions between the agents. Does a centralized critic help to make all the agents' actions coordinated? Can someone give an example? Thanks submitted by /u/aabra__ka__daabra [link] [comments]  ( 87 min )
    credit assignment problem
    Can anyone explain what is the term "credit assignment problem" in the context of RL? Here you find some excerpts from books: - "If γ is small, then an agent will only care about the rewards received in the current time step and just a few steps in the future. This effectively reduces the length of the RL problem to a few time steps and can drastically simplify the credit assignment problem." - "Speeding up TD learning amounts to either speeding up the credit assignment process or shortening the trial-and-error process" submitted by /u/rlopes404 [link] [comments]  ( 88 min )
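    The first excerpt quoted above can be made concrete with a small sketch. It assumes the standard discounted-return definition G_t = r_t + γ G_{t+1}; the function names `discounted_returns` and `effective_horizon` are hypothetical and chosen only for illustration. With a small γ, a late reward contributes almost nothing to the return of early steps, so credit only has to be assigned over a short window.

```python
def discounted_returns(rewards, gamma):
    """Return G_t = r_t + gamma * G_{t+1} for every time step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def effective_horizon(gamma):
    """Rule-of-thumb number of future steps that matter: 1 / (1 - gamma)."""
    return 1.0 / (1.0 - gamma)


if __name__ == "__main__":
    episode = [0.0] * 9 + [1.0]  # a single delayed reward at the last step
    for gamma in (0.5, 0.99):
        first = discounted_returns(episode, gamma)[0]
        print(f"gamma={gamma}: G_0={first:.4f}, horizon~{effective_horizon(gamma):.0f}")
```

    Running the script shows how much of the final reward is still visible from the first step for each γ, which is one way to read "reduces the length of the RL problem to a few time steps".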
    Action space for MultiDiscrete
    Hi guys, So I'm currently doing some experiment with RL and stumbled on a problem involving getting action space. (I'm using gym-wrappers for ML-Agents) With Discrete, we can extract it with env.action_space.n With Box, we can use env.action_space.shape[0] I wonder if there is any way for me to extract action space for MultiDiscrete space? Thanks in advance submitted by /u/feelingBlue_44260264 [link] [comments]  ( 86 min )
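    One way to think about the question above: a MultiDiscrete space is described by a vector of per-branch sizes, which Gym exposes as `env.action_space.nvec`. The sketch below is pure Python and takes such a size vector as plain input, so it does not call Gym itself; the helper names `multidiscrete_summary` and `unflatten` are hypothetical, and the example sizes are made up.

```python
from math import prod


def multidiscrete_summary(nvec):
    """Summarise a MultiDiscrete-style space from its per-branch sizes."""
    sizes = [int(n) for n in nvec]
    return {
        "num_branches": len(sizes),         # sub-actions chosen at every step
        "branch_sizes": sizes,              # number of choices in each branch
        "num_joint_actions": prod(sizes),   # size of an equivalent flat Discrete space
    }


def unflatten(index, nvec):
    """Map a flat action index to one choice per branch (row-major order)."""
    choices = []
    for n in reversed(nvec):
        index, choice = divmod(index, n)
        choices.append(choice)
    return list(reversed(choices))


if __name__ == "__main__":
    nvec = [3, 2, 4]  # assumed example; with Gym this would come from env.action_space.nvec
    print(multidiscrete_summary(nvec))
    print(unflatten(23, nvec))
```

    A policy network would typically get one output head per branch, sized by `branch_sizes`; the flattened view is only practical when the product of the sizes stays small.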
    reinforcement learning to play cuphead videogame
    Hi everyone, since I haven't found any complete project that uses reinforcement learning to play Cuphead (the videogame), I was wondering: how difficult is it to implement? What are the main issues? submitted by /u/rotitJ [link] [comments]  ( 91 min )
  • Open

    Integrate Amazon SageMaker Data Wrangler with MLOps workflows
    As enterprises move from running ad hoc machine learning (ML) models to using AI/ML to transform their business at scale, the adoption of ML Operations (MLOps) becomes inevitable. As shown in the following figure, the ML lifecycle begins with framing a business problem as an ML use case followed by a series of phases, including […]  ( 13 min )
  • Open

    nbdev+Quarto: A new secret weapon for productivity
    Contents Our new secret weapon for productivity nbdev in industry What’s nbdev? What we learned after three years of using nbdev Enter Quarto: A pandoc super-processor A blazing fast notebook kernel: execnb Towards a dialect of python that embraces its dynamic nature The future of nbdev How you can get started with nbdev Thank You A conversation with JJ Allaire Our new secret weapon for productivity Today we’re excited to announce that we’ve teamed up with Quarto to give nbdev superpowers. nbdev offers Python programmers a common set of tools for using Jupyter notebooks to: Write & distribute software packages Test code, and Author documentation and technical articles A single notebook can create a python module, tests, CI, pypi/conda packages, and more. Although notebooks are already…  ( 9 min )
  • Open

    My journey and switch into Data Science — what you could learn from my journey
    Here I explain my journey switching my career into Data Science in my late 30s, my thinking, motivations, expectations, courses I took… Continue reading on Becoming Human: Artificial Intelligence Magazine »  ( 23 min )
  • Open

    Top uses of QR Codes for Co-working Spaces
    Co-working spaces have existed for some time now; they offer convenient amenities to those who want to work in a more conducive environment. Following the CDC guidelines, incorporating QR codes into these co-working spaces allows customers to book their slots easily. Ghost bookings are prevented by using an online QR code generator that is easily tracked. According to the… Read More »Top uses of QR Codes for Co-working Spaces The post Top uses of QR Codes for Co-working Spaces appeared first on Data Science Central.  ( 19 min )
    Best IoT Projects for Beginners
    IoT has been a popular topic for a while now. It is only sensible to learn and train if you want to begin a profession in this area. And what better approach is there to developing initiatives around talent than to master it? This blog will discuss simple IoT projects that you may get started… Read More »Best IoT Projects for Beginners The post Best IoT Projects for Beginners appeared first on Data Science Central.  ( 23 min )
  • Open

    KamNet: An Integrated Spatiotemporal Deep Neural Network for Rare Event Search in KamLAND-Zen. (arXiv:2203.01870v5 [physics.ins-det] UPDATED)
    Rare event searches allow us to search for new physics at energy scales inaccessible with other means by leveraging specialized large-mass detectors. Machine learning provides a new tool to maximize the information provided by these detectors. The information is sparse, which forces these algorithms to start from the lowest level data and exploit all symmetries in the detector to produce results. In this work we present KamNet which harnesses breakthroughs in geometric deep learning and spatiotemporal data analysis to maximize the physics reach of KamLAND-Zen, a kiloton scale spherical liquid scintillator detector searching for neutrinoless double beta decay ($0\nu\beta\beta$). Using a simplified background model for KamLAND we show that KamNet outperforms a conventional CNN on benchmarking MC simulations with an increasing level of robustness. Using simulated data, we then demonstrate KamNet's ability to increase KamLAND-Zen's sensitivity to $0\nu\beta\beta$ and $0\nu\beta\beta$ to excited states. A key component of this work is the addition of an attention mechanism to elucidate the underlying physics KamNet is using for the background rejection.
    Folding over Neural Networks. (arXiv:2207.01090v2 [cs.PL] UPDATED)
    Neural networks are typically represented as data structures that are traversed either through iteration or by manual chaining of method calls. However, a deeper analysis reveals that structured recursion can be used instead, so that traversal is directed by the structure of the network itself. This paper shows how such an approach can be realised in Haskell, by encoding neural networks as recursive data types, and then their training as recursion scheme patterns. In turn, we promote a coherent implementation of neural networks that delineates between their structure and semantics, allowing for compositionality in both how they are built and how they are trained.
    An Adaptive Deep Clustering Pipeline to Inform Text Labeling at Scale. (arXiv:2202.01211v2 [cs.CL] UPDATED)
    Mining the latent intentions from large volumes of natural language inputs is a key step to help data analysts design and refine Intelligent Virtual Assistants (IVAs) for customer service and sales support. We created a flexible and scalable clustering pipeline within the Verint Intent Manager (VIM) that integrates the fine-tuning of language models, a high performing k-NN library and community detection techniques to help analysts quickly surface and organize relevant user intentions from conversational texts. The fine-tuning step is necessary because pre-trained language models cannot encode texts to efficiently surface particular clustering structures when the target texts are from an unseen domain or the clustering task is not topic detection. We describe the pipeline and demonstrate its performance and ability to scale on three real-world text mining tasks. As deployed in the VIM application, this clustering pipeline produces high quality results, improving the performance of data analysts and reducing the time it takes to surface intentions from customer service data, thereby reducing the time it takes to build and deploy IVAs in new domains.
    Quantifying Inequality in Underreported Medical Conditions. (arXiv:2110.04133v2 [cs.CY] UPDATED)
    Estimating the prevalence of a medical condition, or the proportion of the population in which it occurs, is a fundamental problem in healthcare and public health. Accurate estimates of the relative prevalence across groups -- capturing, for example, that a condition affects women more frequently than men -- facilitate effective and equitable health policy which prioritizes groups who are disproportionately affected by a condition. However, it is difficult to estimate relative prevalence when a medical condition is underreported. In this work, we provide a method for accurately estimating the relative prevalence of underreported medical conditions, building upon the positive unlabeled learning framework. We show that under the commonly made covariate shift assumption -- i.e., that the probability of having a disease conditional on symptoms remains constant across groups -- we can recover the relative prevalence, even without restrictive assumptions commonly made in positive unlabeled learning and even if it is impossible to recover the absolute prevalence. We provide a suite of experiments on synthetic and real health data that demonstrate our method's ability to recover the relative prevalence more accurately than do baselines, and the method's robustness to plausible violations of the covariate shift assumption.
    Hyperdimensional Computing vs. Neural Networks: Comparing Architecture and Learning Process. (arXiv:2207.12932v1 [cs.NE])
    Hyperdimensional Computing (HDC) has attracted considerable attention as an emerging non-von Neumann computing paradigm. Inspired by the way the human brain functions, HDC leverages high dimensional patterns to perform learning tasks. Compared to neural networks, HDC has shown advantages such as energy efficiency and smaller model size, but sub-par learning capabilities in sophisticated applications. Recently, researchers have observed that, when combined with neural network components, HDC can achieve better performance than conventional HDC models. This motivates us to explore the theoretical foundations of HDC more deeply, particularly its connections to and differences from neural networks. In this paper, we make a comparative study between HDC and neural networks to provide a different angle where HDC can be derived from an extremely compact neural network trained upfront. Experimental results show that such a neural network-derived HDC model can achieve up to 21% and 5% accuracy increase over conventional and learning-based HDC models, respectively. This paper aims to provide more insights and shed light on future directions for research on this popular emerging learning scheme.
    The Optimal Noise in Noise-Contrastive Learning Is Not What You Think. (arXiv:2203.01110v2 [stat.ML] UPDATED)
    Learning a parametric model of a data distribution is a well-known statistical problem that has seen renewed interest as it is brought to scale in deep learning. Framing the problem as a self-supervised task, where data samples are discriminated from noise samples, is at the core of state-of-the-art methods, beginning with Noise-Contrastive Estimation (NCE). Yet, such contrastive learning requires a good noise distribution, which is hard to specify; domain-specific heuristics are therefore widely used. While a comprehensive theory is missing, it is widely assumed that the optimal noise should in practice be made equal to the data, both in distribution and proportion. This setting underlies Generative Adversarial Networks (GANs) in particular. Here, we empirically and theoretically challenge this assumption on the optimal noise. We show that deviating from this assumption can actually lead to better statistical estimators, in terms of asymptotic variance. In particular, the optimal noise distribution is different from the data's and even from a different family.
    Demystifying Graph Convolution with a Simple Concatenation. (arXiv:2207.12931v1 [cs.LG])
    Graph convolution (GConv) is a widely used technique that has been demonstrated to be extremely effective for graph learning applications, most notably node categorization. On the other hand, many GConv-based models do not quantify the effect of graph topology and node features on performance, and are even surpassed by some models that do not consider graph structure or node properties. We quantify the information overlap between graph topology, node features, and labels in order to determine graph convolution's representation power in the node classification task. In this work, we first determine the linear separability of graph convoluted features using analysis of variance. Mutual information is used to acquire a better understanding of the possible non-linear relationship between graph topology, node features, and labels. Our theoretical analysis demonstrates that a simple and efficient graph operation that concatenates only graph topology and node properties consistently outperforms conventional graph convolution, especially in the heterophily case. Extensive empirical research utilizing a synthetic dataset and real-world benchmarks demonstrates that graph concatenation is a simple but more flexible alternative to graph convolution.
    Cooperative Actor-Critic via TD Error Aggregation. (arXiv:2207.12533v1 [eess.SY])
    In decentralized cooperative multi-agent reinforcement learning, agents can aggregate information from one another to learn policies that maximize a team-average objective function. Despite the willingness to cooperate with others, the individual agents may find direct sharing of information about their local state, reward, and value function undesirable due to privacy issues. In this work, we introduce a decentralized actor-critic algorithm with TD error aggregation that does not violate privacy and assumes that communication channels are subject to time delays and packet dropouts. The cost we pay for making such weak assumptions is an increased communication burden for every agent as measured by the dimension of the transmitted data. Interestingly, the communication burden is only quadratic in the graph size, which renders the algorithm applicable in large networks. We provide a convergence analysis under diminishing step size to verify that the agents maximize the team-average objective function.
    Efficient Algorithms for Sparse Moment Problems without Separation. (arXiv:2207.13008v1 [cs.LG])
    We consider the sparse moment problem of learning a $k$-spike mixture in high dimensional space from its noisy moment information in any dimension. We measure the accuracy of the learned mixtures using transportation distance. Previous algorithms either assume certain separation assumptions, use more recovery moments, or run in (super) exponential time. Our algorithm for the 1-dimension problem (also called the sparse Hausdorff moment problem) is a robust version of the classic Prony's method, and our contribution mainly lies in the analysis. We adopt a global and much tighter analysis than previous work (which analyzes the perturbation of the intermediate results of Prony's method). A useful technical ingredient is a connection between the linear system defined by the Vandermonde matrix and the Schur polynomial, which allows us to provide tight perturbation bound independent of the separation and may be useful in other contexts. To tackle the high dimensional problem, we first solve the 2-dimensional problem by extending the 1-dimension algorithm and analysis to complex numbers. Our algorithm for the high dimensional case determines the coordinates of each spike by aligning a 1-d projection of the mixture to a random vector and a set of 2d-projections of the mixture. Our results have applications to learning topic models and Gaussian mixtures, implying improved sample complexity results or running time over prior work.
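    For orientation, the classic noise-free Prony's method that the 1-dimensional algorithm builds on can be sketched as follows. This is a minimal illustration assuming exact moments and distinct real spike locations; it is not the robust variant analyzed in the paper, and the function name `prony` is hypothetical.

```python
import numpy as np


def prony(moments, k):
    """Recover k spike locations and weights from moments m_0..m_{2k-1}.

    Assumes m_j = sum_i w_i * x_i**j with distinct real x_i and no noise.
    """
    m = np.asarray(moments, dtype=float)
    # The spikes are roots of p(x) = x^k + c_{k-1} x^{k-1} + ... + c_0,
    # whose coefficients solve the Hankel system H c = -(m_k, ..., m_{2k-1}).
    hankel = np.array([[m[i + j] for j in range(k)] for i in range(k)])
    c = np.linalg.solve(hankel, -m[k:2 * k])
    # np.roots expects the highest-degree coefficient first.
    locations = np.sort(np.roots(np.concatenate(([1.0], c[::-1]))).real)
    # Weights solve the Vandermonde system sum_i w_i * x_i**j = m_j, j < k.
    vandermonde = np.vander(locations, N=k, increasing=True).T
    weights = np.linalg.solve(vandermonde, m[:k])
    return locations, weights


# Synthetic 2-spike mixture used only to exercise the sketch.
true_x = np.array([0.2, 0.7])
true_w = np.array([0.4, 0.6])
example_moments = [float(np.sum(true_w * true_x ** j)) for j in range(4)]
est_x, est_w = prony(example_moments, k=2)
```

    With noisy moments the Hankel and Vandermonde solves can become ill-conditioned when spikes are close together, which is the regime the abstract's separation-free analysis addresses.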
    Differentially Private Estimation via Statistical Depth. (arXiv:2207.12602v1 [stat.ML])
    Constructing a differentially private (DP) estimator requires deriving the maximum influence of an observation, which can be difficult in the absence of exogenous bounds on the input data or the estimator, especially in high dimensional settings. This paper shows that standard notions of statistical depth, i.e., halfspace depth and regression depth, are particularly advantageous in this regard, both in the sense that the maximum influence of a single observation is easy to analyze and that this value is typically low. This is used to motivate new approximate DP location and regression estimators using the maximizers of these two notions of statistical depth. A more computationally efficient variant of the approximate DP regression estimator is also provided. Also, to avoid requiring that users specify a priori bounds on the estimates and/or the observations, variants of these DP mechanisms are described that satisfy random differential privacy (RDP), which is a relaxation of differential privacy provided by Hall, Wasserman, and Rinaldo (2013). We also provide simulations of the two DP regression methods proposed here. The proposed estimators appear to perform favorably relative to the existing DP regression methods we consider in these simulations when either the sample size is at least 100-200 or the privacy-loss budget is sufficiently high.
    Physics Embedded Machine Learning for Electromagnetic Data Imaging. (arXiv:2207.12607v1 [physics.comp-ph])
    Electromagnetic (EM) imaging is widely applied in sensing for security, biomedicine, geophysics, and various industries. It is an ill-posed inverse problem whose solution is usually computationally expensive. Machine learning (ML) techniques and especially deep learning (DL) show potential in fast and accurate imaging. However, the high performance of purely data-driven approaches relies on constructing a training set that is statistically consistent with practical scenarios, which is often not possible in EM imaging tasks. Consequently, generalizability becomes a major concern. On the other hand, physical principles underlie EM phenomena and provide baselines for current imaging techniques. To benefit from prior knowledge in big data and the theoretical constraint of physical laws, physics embedded ML methods for EM imaging have become the focus of a large body of recent work. This article surveys various schemes to incorporate physics in learning-based EM imaging. We first introduce background on EM imaging and basic formulations of the inverse problem. We then focus on three types of strategies combining physics and ML for linear and nonlinear imaging and discuss their advantages and limitations. Finally, we conclude with open challenges and possible ways forward in this fast-developing field. Our aim is to facilitate the study of intelligent EM imaging methods that will be efficient, interpretable and controllable.
    Versatile Weight Attack via Flipping Limited Bits. (arXiv:2207.12405v1 [cs.CR])
    To explore the vulnerability of deep neural networks (DNNs), many attack paradigms have been well studied, such as the poisoning-based backdoor attack in the training stage and the adversarial attack in the inference stage. In this paper, we study a novel attack paradigm, which modifies model parameters in the deployment stage. Considering the effectiveness and stealthiness goals, we provide a general formulation to perform the bit-flip based weight attack, where the effectiveness term could be customized depending on the attacker's purpose. Furthermore, we present two cases of the general formulation with different malicious purposes, i.e., single sample attack (SSA) and triggered samples attack (TSA). To this end, we formulate this problem as a mixed integer programming (MIP) problem to jointly determine the state of the binary bits (0 or 1) in the memory and learn the sample modification. Utilizing the latest technique in integer programming, we equivalently reformulate this MIP problem as a continuous optimization problem, which can be effectively and efficiently solved using the alternating direction method of multipliers (ADMM). Consequently, the flipped critical bits can be easily determined through optimization, rather than using a heuristic strategy. Extensive experiments demonstrate the superiority of SSA and TSA in attacking DNNs.
    Machine Learning to Predict the Antimicrobial Activity of Cold Atmospheric Plasma-Activated Liquids. (arXiv:2207.12478v1 [cs.LG])
    Plasma is defined as the fourth state of matter and non-thermal plasma can be produced at atmospheric pressure under a high electrical field. The strong and broad-spectrum antimicrobial effect of plasma-activated liquids (PALs) is now well known. The proven applicability of machine learning (ML) in the medical field is encouraging for its application in the field of plasma medicine as well. Thus, ML applications on PALs could present a new perspective to better understand the influences of various parameters on their antimicrobial effects. In this paper, comparative supervised ML models are presented by using previously obtained data to qualitatively predict the in vitro antimicrobial activity of PALs. A literature search was performed and data were collected from 33 relevant articles. After the required preprocessing steps, two supervised ML methods, namely classification and regression, were applied to the data to obtain microbial inactivation (MI) predictions. For classification, MI is labeled in four categories and for regression, MI is used as a continuous variable. Two different robust cross-validation strategies were used to evaluate the classification and regression models: repeated stratified k-fold cross-validation and k-fold cross-validation, respectively. We also investigate the effect of different features on the models. The results demonstrate that the hyperparameter-optimized Random Forest Classifier (oRFC) and Random Forest Regressor (oRFR) provide better results than other models for classification and regression, respectively. Finally, the best test accuracy of 82.68% for oRFC and $R^2$ of 0.75 for the oRFR are obtained. ML techniques could contribute to a better understanding of plasma parameters that have a dominant role in the desired antimicrobial effect. Furthermore, such findings may contribute to the definition of a plasma dose in the future.
    Approximate Low-Rank Decomposition for Real Symmetric Tensors. (arXiv:2207.12529v1 [math.NA])
    We investigate the effect of an $\varepsilon$-room of perturbation tolerance on symmetric tensor decomposition from an algorithmic perspective. More precisely, we prove theorems and design algorithms for the following problem: Suppose a real symmetric $d$-tensor $f$, a norm $||.||$ on the space of symmetric $d$-tensors, and $\varepsilon >0$ error tolerance with respect to $||.||$ are given. What is the smallest symmetric tensor rank in the $\varepsilon$-neighborhood of $f$? In other words, what is the symmetric tensor rank of $f$ after a clever $\varepsilon$-perturbation? We provide two different theoretical bounds and three algorithms for approximate symmetric tensor rank estimation. Our first result is a randomized energy increment algorithm for the case of $L_p$-norms. Our second result is a simple sampling-based algorithm, inspired by some techniques in geometric functional analysis, that works for any norm. We also provide a supplementary algorithm in the case of the Hilbert-Schmidt norm. All our algorithms come with rigorous complexity estimates, which in turn yield our two main theorems on symmetric tensor rank with $\varepsilon$-room of tolerance. We also report on our experiments with a preliminary implementation of the energy increment algorithm.
    Quiver neural networks. (arXiv:2207.12773v1 [cs.LG])
    We develop a uniform theoretical approach towards the analysis of various neural network connectivity architectures by introducing the notion of a quiver neural network. Inspired by quiver representation theory in mathematics, this approach gives a compact way to capture elaborate data flows in complex network architectures. As an application, we use parameter space symmetries to prove a lossless model compression algorithm for quiver neural networks with certain non-pointwise activations known as rescaling activations. In the case of radial rescaling activations, we prove that training the compressed model with gradient descent is equivalent to training the original model with projected gradient descent.
    Head2Toe: Utilizing Intermediate Representations for Better Transfer Learning. (arXiv:2201.03529v2 [cs.LG] UPDATED)
    Transfer-learning methods aim to improve performance in a data-scarce target domain using a model pretrained on a data-rich source domain. A cost-efficient strategy, linear probing, involves freezing the source model and training a new classification head for the target domain. This strategy is outperformed by a more costly but state-of-the-art method -- fine-tuning all parameters of the source model to the target domain -- possibly because fine-tuning allows the model to leverage useful information from intermediate layers which is otherwise discarded by the later pretrained layers. We explore the hypothesis that these intermediate layers might be directly exploited. We propose a method, Head-to-Toe probing (Head2Toe), that selects features from all layers of the source model to train a classification head for the target domain. In evaluations on VTAB-1k, Head2Toe matches the performance obtained with fine-tuning on average while reducing training and storage cost a hundredfold or more; critically, for out-of-distribution transfer, Head2Toe outperforms fine-tuning.
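    The contrast between linear probing on the last layer and probing on features from all layers can be sketched with a toy frozen network. Everything here is an assumption made for illustration: the two-layer random "source model", the least-squares head, and the function names. The sketch also simply concatenates every intermediate feature, whereas the abstract describes selecting features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen source model: two random ReLU layers, never updated.
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))


def frozen_forward(x):
    """Return the activations of every layer of the frozen model."""
    h1 = np.maximum(x @ W1, 0.0)
    h2 = np.maximum(h1 @ W2, 0.0)
    return [h1, h2]


def last_layer_features(x):
    """Linear probing: only the final representation feeds the new head."""
    return frozen_forward(x)[-1]


def all_layer_features(x):
    """All-layer probing: concatenate intermediate and final activations."""
    return np.concatenate(frozen_forward(x), axis=1)


def fit_linear_head(features, targets):
    """Fit a linear head (with bias) by least squares on frozen features."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    head, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return head


x = rng.normal(size=(32, 8))          # toy target-domain inputs
y = rng.normal(size=(32, 3))          # toy targets, 3 outputs
probe_head = fit_linear_head(last_layer_features(x), y)
all_layer_head = fit_linear_head(all_layer_features(x), y)
```

    Only the head is trained in both cases, so the cost stays far below full fine-tuning; the all-layer variant trades a larger head for access to information the last layer may have discarded.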
    Variational Inference with Locally Enhanced Bounds for Hierarchical Models. (arXiv:2203.04432v2 [cs.LG] UPDATED)
    Hierarchical models represent a challenging setting for inference algorithms. MCMC methods struggle to scale to large models with many local variables and observations, and variational inference (VI) may fail to provide accurate approximations due to the use of simple variational families. Some variational methods (e.g. importance weighted VI) integrate Monte Carlo methods to give better accuracy, but these tend to be unsuitable for hierarchical models, as they do not allow for subsampling and their performance tends to degrade for high dimensional models. We propose a new family of variational bounds for hierarchical models, based on the application of tightening methods (e.g. importance weighting) separately for each group of local random variables. We show that our approach naturally allows the use of subsampling to get unbiased gradients, and that it fully leverages the power of methods that build tighter lower bounds by applying them independently in lower dimensional spaces, leading to better results and more accurate posterior approximations than relevant baselines.
    Engineering flexible machine learning systems by traversing functionally invariant paths in weight space. (arXiv:2205.00334v3 [cs.LG] UPDATED)
    Deep neural networks achieve human-like performance on a variety of perceptual and decision-making tasks. However, networks perform poorly when confronted with changing tasks or goals, and broadly fail to match the flexibility and robustness of human intelligence. Here, we develop a mathematical and algorithmic framework that enables flexible and continuous training of neural networks on a range of objectives by constructing path connected sets of networks that achieve equivalent functional performance on a given machine learning task. We view the weight space of a neural network as a curved Riemannian manifold and move a network along a functionally invariant path in weight space while searching for networks that satisfy secondary objectives. A path-sampling algorithm trains computer vision and natural language processing networks with millions of weight parameters to learn a series of classification tasks without performance loss while accommodating secondary objectives including network sparsification, incremental task learning, and increased adversarial robustness. Broadly, we conceptualize a neural network as a mathematical object that can be iteratively transformed into distinct configurations by the path-sampling algorithm to define a sub-manifold of networks that can be harnessed to achieve user goals.
    LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action. (arXiv:2207.04429v2 [cs.RO] UPDATED)
    Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions. For videos of our experiments, code release, and an interactive Colab notebook that runs in your browser, please check out our project page https://sites.google.com/view/lmnav
    Kan Extensions in Data Science and Machine Learning. (arXiv:2203.09018v2 [cs.LG] UPDATED)
    A common problem in data science is "use this function defined over this small set to generate predictions over that larger set." Extrapolation, interpolation, statistical inference and forecasting all reduce to this problem. The Kan extension is a powerful tool in category theory that generalizes this notion. In this work we explore several applications of Kan extensions to data science. We begin by deriving a simple classification algorithm as a Kan extension and experimenting with this algorithm on real data. Next, we use the Kan extension to derive a procedure for learning clustering algorithms from labels and explore the performance of this procedure on real data. We then investigate how Kan extensions can be used to learn a general mapping from datasets of labeled examples to functions and to approximate a complex function with a simpler one.
    PACS: A Dataset for Physical Audiovisual CommonSense Reasoning. (arXiv:2203.11130v2 [cs.LG] UPDATED)
    In order for AI to be safely deployed in real-world scenarios such as hospitals, schools, and the workplace, it must be able to robustly reason about the physical world. Fundamental to this reasoning is physical common sense: understanding the physical properties and affordances of available objects, how they can be manipulated, and how they interact with other objects. Physical commonsense reasoning is fundamentally a multi-sensory task, since physical properties are manifested through multiple modalities - two of them being vision and acoustics. Our paper takes a step towards real-world physical commonsense reasoning by contributing PACS: the first audiovisual benchmark annotated for physical commonsense attributes. PACS contains 13,400 question-answer pairs, involving 1,377 unique physical commonsense questions and 1,526 videos. Our dataset provides new opportunities to advance the research field of physical reasoning by bringing audio as a core component of this multimodal problem. Using PACS, we evaluate multiple state-of-the-art models on our new challenging task. While some models show promising results (70% accuracy), they all fall short of human performance (95% accuracy). We conclude the paper by demonstrating the importance of multimodal reasoning and providing possible avenues for future research.
    Federated Learning for Energy-limited Wireless Networks: A Partial Model Aggregation Approach. (arXiv:2204.09746v2 [cs.LG] UPDATED)
    The limited communication resources, e.g., bandwidth and energy, and data heterogeneity across devices are two of the main bottlenecks for federated learning (FL). To tackle these challenges, we first devise a novel FL framework with partial model aggregation (PMA), which only aggregates the lower layers of neural networks responsible for feature extraction while the upper layers corresponding to complex pattern recognition remain on the devices for personalization. The proposed PMA-FL is able to address the data heterogeneity and reduce the transmitted information in wireless channels. We then obtain a convergence bound of the framework under a non-convex loss function setting. With the aid of this bound, we define a new objective function, named the scheduled data sample volume, to transform the original inexplicit optimization problem into a tractable one for device scheduling, bandwidth allocation, computation and communication time division. Our analysis reveals that the optimal time division is achieved when the communication and computation parts of PMA-FL have the same power. We also develop a bisection method to solve the optimal bandwidth allocation policy and use the set expansion algorithm to address the optimal device scheduling. Compared with the state-of-the-art benchmarks, the proposed PMA-FL improves accuracy by 2.72% and 11.6% on two typical heterogeneous datasets, i.e., MNIST and CIFAR-10, respectively. In addition, the proposed joint dynamic device scheduling and resource optimization approach achieves slightly higher accuracy than the considered benchmarks while also providing satisfactory energy and time reductions: 29% energy or 20% time reduction on MNIST; and 25% energy or 12.5% time reduction on CIFAR-10.
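    The partial aggregation step can be sketched as averaging only the shared lower layers while each device keeps its own upper layers. This is a minimal sketch: the unweighted mean, the layer names `feature` and `head`, and the function `partial_aggregate` are assumptions for illustration, and scheduling, bandwidth allocation, and wireless effects are omitted.

```python
import numpy as np


def partial_aggregate(device_models, shared_layers):
    """Average the named shared layers across devices; keep the rest local.

    device_models: list of dicts mapping layer name -> parameter array.
    shared_layers: names of the lower (feature-extraction) layers.
    """
    averaged = {
        name: np.mean([model[name] for model in device_models], axis=0)
        for name in shared_layers
    }
    # Each device receives the averaged shared layers and retains its own
    # personalized layers unchanged.
    return [{**model, **averaged} for model in device_models]


# Two toy devices with one shared layer and one personalized layer each.
devices = [
    {"feature": np.array([1.0, 3.0]), "head": np.array([10.0])},
    {"feature": np.array([3.0, 5.0]), "head": np.array([20.0])},
]
updated = partial_aggregate(devices, shared_layers=["feature"])
```

    Only the `feature` parameters would need to be transmitted in this scheme, which is where the reduction in communicated data comes from.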
    Robustness Implies Generalization via Data-Dependent Generalization Bounds. (arXiv:2206.13497v3 [cs.LG] UPDATED)
    This paper proves that robustness implies generalization via data-dependent generalization bounds. As a result, robustness and generalization are shown to be connected closely in a data-dependent manner. Our bounds improve previous bounds in two directions, to solve an open problem that has seen little development since 2010. The first is to reduce the dependence on the covering number. The second is to remove the dependence on the hypothesis space. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable. The experiments on real-world data and theoretical models demonstrate near-exponential improvements in various situations. To achieve these improvements, we do not require additional assumptions on the unknown distribution; instead, we only incorporate an observable and computable property of the training samples. A key technical innovation is an improved concentration bound for multinomial random variables that is of independent interest beyond robustness and generalization.
    SKILL-IL: Disentangling Skill and Knowledge in Multitask Imitation Learning. (arXiv:2205.03130v2 [cs.LG] UPDATED)
    In this work, we introduce a new perspective for learning transferable content in multi-task imitation learning. Humans are able to transfer skills and knowledge. If we can cycle to work and drive to the store, we can also cycle to the store and drive to work. We take inspiration from this and hypothesize the latent memory of a policy network can be disentangled into two partitions. These contain either the knowledge of the environmental context for the task or the generalizable skill needed to solve the task. This allows improved training efficiency and better generalization over previously unseen combinations of skills in the same environment, and the same task in unseen environments. We used the proposed approach to train a disentangled agent for two different multi-task IL environments. In both cases we out-performed the SOTA by 30% in task success rate. We also demonstrated this for navigation on a real robot.
    Multi-agent Databases via Independent Learning. (arXiv:2205.14323v2 [cs.DB] UPDATED)
    Machine learning is increasingly being used in database research to improve the effectiveness of numerous tasks, including but not limited to query optimization, workload scheduling, and physical design. Currently, the research focus has been on replacing a single database component responsible for one task by its learning-based counterpart. However, query performance is not simply determined by the performance of a single component, but by the cooperation of multiple ones. As such, learning-based database components need to collaborate during both training and execution in order to develop policies that meet end performance goals. Thus, this paper attempts to address the question "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?". To answer this question, we introduce MADB (Multi-Agent DB), a proof-of-concept system that incorporates a learned query scheduler and a learned query optimizer. MADB leverages a cooperative multi-agent reinforcement learning approach that allows the two components to exchange the context of their decisions with each other and collaboratively work towards reducing the query latency. Preliminary results demonstrate that MADB can outperform the non-cooperative integration of learned components.
    A Model of One-Shot Generalization. (arXiv:2205.14553v2 [cs.LG] UPDATED)
    We provide a theoretical framework to study a phenomenon that we call one-shot generalization. This phenomenon refers to the ability of an algorithm to perform transfer learning within a single task, meaning that it correctly classifies a test point that has a single exemplar in the training set. We propose a simple data model and use it to study this phenomenon in two ways. First, we prove a non-asymptotic baseline -- kernel methods based on nearest-neighbor classification cannot perform one-shot generalization, independently of the choice of the kernel and the size of the training set. Second, we empirically show that the most direct neural network architecture for our data model performs one-shot generalization almost perfectly. This stark difference leads us to believe that the one-shot generalization mechanism is partially responsible for the empirical success of neural networks.
    A Confident Deep Learning loss function for one-step Conformal Prediction approximation. (arXiv:2207.12377v2 [cs.LG] UPDATED)
    Deep Learning predictions with measurable confidence are increasingly desirable for real-world problems, especially in high-risk settings. The Conformal Prediction (CP) framework is a versatile solution that automatically guarantees a maximum error rate. However, CP suffers from computational inefficiencies that limit its application to large-scale datasets. In this paper, we propose a novel conformal loss function that approximates the traditionally two-step CP approach in a single step. By evaluating and penalising deviations from the stringent expected CP output distribution, a Deep Learning model may learn the direct relationship between input data and conformal p-values. Our approach achieves significant training time reductions up to 86% compared to Aggregated Conformal Prediction (ACP), an accepted CP approximation variant. In terms of approximate validity and predictive efficiency, we carry out a comprehensive empirical evaluation to show our novel loss function's competitiveness with ACP on the well-established MNIST dataset.
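    For context, the conformal p-values that the proposed loss is trained to approximate are conventionally computed from a held-out calibration set. The sketch below shows that standard inductive computation, not the paper's loss function; the function names and the convention that higher nonconformity scores mean "stranger" examples are assumptions.

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of calibration scores at least as nonconforming as the test.

    The +1 terms count the test example itself, which is what gives the
    usual validity guarantee under exchangeability.
    """
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_scores, candidate_scores, epsilon):
    """Keep every candidate label whose p-value exceeds the error rate."""
    return [
        label
        for label, score in candidate_scores.items()
        if conformal_p_value(calibration_scores, score) > epsilon
    ]


# Toy nonconformity scores used only to exercise the sketch.
calibration = [0.1, 0.2, 0.3, 0.4]
candidates = {"cat": 0.05, "dog": 0.5}
labels = prediction_set(calibration, candidates, epsilon=0.25)
```

    Computing these p-values requires a separate calibration pass after training, which is the second step that a single-step approximation aims to avoid.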
    Making Corgis Important for Honeycomb Classification: Adversarial Attacks on Concept-based Explainability Tools. (arXiv:2110.07120v2 [cs.LG] UPDATED)
    Methods for model explainability have become increasingly critical for testing the fairness and soundness of deep learning. Concept-based interpretability techniques, which use a small set of human-interpretable concept exemplars in order to measure the influence of a concept on a model's internal representation of input, are an important thread in this line of research. In this work we show that these explainability methods can suffer the same vulnerability to adversarial attacks as the models they are meant to analyze. We demonstrate this phenomenon on two well-known concept-based interpretability methods: TCAV and faceted feature visualization. We show that by carefully perturbing the examples of the concept that is being investigated, we can radically change the output of the interpretability method. The attacks that we propose can either induce positive interpretations (polka dots are an important concept for a model when classifying zebras) or negative interpretations (stripes are not an important factor in identifying images of a zebra). Our work highlights the fact that in safety-critical applications, there is need for security around not only the machine learning pipeline but also the model interpretation process.
    OCTAL: Graph Representation Learning for LTL Model Checking. (arXiv:2207.11649v2 [cs.PL] UPDATED)
    Model Checking is widely applied in verifying the correctness of complex and concurrent systems against a specification. Pure symbolic approaches, while popular, still suffer from the state space explosion problem that makes them impractical for large-scale systems and/or specifications. In this paper, we propose to use graph representation learning (GRL) for solving linear temporal logic (LTL) model checking, where the system and the specification are expressed by a Büchi automaton and an LTL formula respectively. A novel GRL-based framework, OCTAL, is designed to learn the representation of the graph-structured system and specification, which reduces the model checking problem to binary classification in the latent space. The empirical experiments show that OCTAL achieves accuracy comparable to canonical SOTA model checkers on three different datasets, with up to $5\times$ overall speedup and above $63\times$ for satisfiability checking alone.
    Generative Subgraph Contrast for Self-Supervised Graph Representation Learning. (arXiv:2207.11996v2 [cs.LG] UPDATED)
    Contrastive learning has shown great promise in the field of graph representation learning. By manually constructing positive/negative samples, most graph contrastive learning methods rely on the vector inner product based similarity metric to distinguish the samples for graph representation. However, the handcrafted sample construction (e.g., the perturbation on the nodes or edges of the graph) may not effectively capture the intrinsic local structures of the graph. Also, the vector inner product based similarity metric cannot fully exploit the local structures of the graph to characterize the graph difference well. To this end, in this paper, we propose a novel adaptive subgraph generation based contrastive learning framework for efficient and robust self-supervised graph representation learning, and the optimal transport distance is utilized as the similarity metric between the subgraphs. It aims to generate contrastive samples by capturing the intrinsic structures of the graph and distinguish the samples based on the features and structures of subgraphs simultaneously. Specifically, for each center node, by adaptively learning relation weights to the nodes of the corresponding neighborhood, we first develop a network to generate the interpolated subgraph. We then construct the positive and negative pairs of subgraphs from the same and different nodes, respectively. Finally, we employ two types of optimal transport distances (i.e., Wasserstein distance and Gromov-Wasserstein distance) to construct the structured contrastive loss. Extensive node classification experiments on benchmark datasets verify the effectiveness of our graph contrastive learning method.
    BioADAPT-MRC: Adversarial Learning-based Domain Adaptation Improves Biomedical Machine Reading Comprehension Task. (arXiv:2202.13174v3 [cs.CL] UPDATED)
    Biomedical machine reading comprehension (biomedical-MRC) aims to comprehend complex biomedical narratives and assist healthcare professionals in retrieving information from them. The high performance of modern neural network-based MRC systems depends on high-quality, large-scale, human-annotated training datasets. In the biomedical domain, a crucial challenge in creating such datasets is the requirement for domain knowledge, which leads to a scarcity of labeled data and a need for transfer learning from the labeled general-purpose (source) domain to the biomedical (target) domain. However, there is a discrepancy in marginal distributions between the general-purpose and biomedical domains due to the variances in topics. Therefore, directly transferring learned representations from a model trained on a general-purpose domain to the biomedical domain can hurt the model's performance. We present an adversarial learning-based domain adaptation framework for the biomedical machine reading comprehension task (BioADAPT-MRC), a neural network-based method to address the discrepancies in the marginal distributions between the general and biomedical domain datasets. BioADAPT-MRC relaxes the need for generating pseudo labels for training a well-performing biomedical-MRC model. We extensively evaluate the performance of BioADAPT-MRC by comparing it with the best existing methods on three widely used benchmark biomedical-MRC datasets -- BioASQ-7b, BioASQ-8b, and BioASQ-9b. Our results suggest that without using any synthetic or human-annotated data from the biomedical domain, BioADAPT-MRC can achieve state-of-the-art performance on these datasets. Availability: BioADAPT-MRC is freely available as an open-source project at https://github.com/mmahbub/BioADAPT-MRC.
    Discriminative Multimodal Learning via Conditional Priors in Generative Models. (arXiv:2110.04616v2 [cs.LG] UPDATED)
    Deep generative models with latent variables have been used lately to learn joint representations and generative processes from multi-modal data. These two learning mechanisms can, however, conflict with each other and representations can fail to embed information on the data modalities. This research studies the realistic scenario in which all modalities and class labels are available for model training, but where some modalities and labels required for downstream tasks are missing. We show, in this scenario, that the variational lower bound limits mutual information between joint representations and missing modalities. To counteract these problems, we introduce a novel conditional multi-modal discriminative model that uses an informative prior distribution and optimizes a likelihood-free objective function that maximizes mutual information between joint representations and missing modalities. Extensive experimentation shows the benefits of the proposed model: empirically, it achieves state-of-the-art results in representative problems such as downstream classification, acoustic inversion and annotation generation.
    Cooperative Behavior Planning for Automated Driving using Graph Neural Networks. (arXiv:2202.11376v2 [cs.RO] UPDATED)
    Urban intersections are prone to delays and inefficiencies due to static precedence rules and occlusions limiting the view on prioritized traffic. Existing approaches to improve traffic flow, widely known as automatic intersection management systems, are mostly based on non-learning reservation schemes or optimization algorithms. Machine learning-based techniques show promising results in planning for a single ego vehicle. This work proposes to leverage machine learning algorithms to optimize traffic flow at urban intersections by jointly planning for multiple vehicles. Learning-based behavior planning poses several challenges, demanding a suitable input and output representation as well as large amounts of ground-truth data. We address the former issue by using a flexible graph-based input representation accompanied by a graph neural network. This allows the scene to be encoded efficiently and inherently provides individual outputs for all involved vehicles. To learn a sensible policy, without relying on the imitation of expert demonstrations, the cooperative planning task is considered as a reinforcement learning problem. We train and evaluate the proposed method in an open-source simulation environment for decision making in automated driving. Compared to a first-in-first-out scheme and traffic governed by static priority rules, the learned planner shows a significant gain in flow rate, while reducing the number of induced stops. In addition to synthetic simulations, the approach is also evaluated based on real-world traffic data taken from the publicly available inD dataset.
    Sliced Recursive Transformer. (arXiv:2111.05297v3 [cs.CV] UPDATED)
    We present a neat yet effective recursive operation on vision transformers that can improve parameter utilization without involving additional parameters. This is achieved by sharing weights across the depth of transformer networks. The proposed method can obtain a substantial gain (~2%) simply using naive recursive operation, requires no special or sophisticated knowledge for designing principles of networks, and introduces minimal computational overhead to the training procedure. To reduce the additional computation caused by recursive operation while maintaining the superior accuracy, we propose an approximating method through multiple sliced group self-attentions across recursive layers which can reduce the cost consumption by 10~30% with minimal performance loss. We call our model Sliced Recursive Transformer (SReT), a novel and parameter-efficient vision transformer design that is compatible with a broad range of other designs for efficient ViT architectures. Our best model establishes significant improvement on ImageNet-1K over state-of-the-art methods while containing fewer parameters. The proposed weight sharing mechanism by sliced recursion structure allows us to build a transformer with more than 100 or even 1000 shared layers with ease while keeping a compact size (13~15M), to avoid optimization difficulties when the model is too large. The flexible scalability has shown great potential for scaling up models and constructing extremely deep vision transformers. Code is available at https://github.com/szq0214/SReT.
    Modeling Irregular Time Series with Continuous Recurrent Units. (arXiv:2111.11344v3 [cs.LG] UPDATED)
    Recurrent neural networks (RNNs) are a popular choice for modeling sequential data. Modern RNN architectures assume constant time-intervals between observations. However, in many datasets (e.g. medical records) observation times are irregular and can carry important information. To address this challenge, we propose continuous recurrent units (CRUs) -- a neural architecture that can naturally handle irregular intervals between observations. The CRU assumes a hidden state, which evolves according to a linear stochastic differential equation and is integrated into an encoder-decoder framework. The recursive computations of the CRU can be derived using the continuous-discrete Kalman filter and are in closed form. The resulting recurrent architecture has temporal continuity between hidden states and a gating mechanism that can optimally integrate noisy observations. We derive an efficient parameterization scheme for the CRU that leads to a fast implementation f-CRU. We empirically study the CRU on a number of challenging datasets and find that it can interpolate irregular time series better than methods based on neural ordinary differential equations.
    Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks. (arXiv:2202.09275v4 [cs.LG] UPDATED)
    Performance optimization of deep learning models is conducted either manually or through automatic architecture search, or a combination of both. On the other hand, their performance strongly depends on the target hardware and how successfully the models were trained. We propose to use a multi-dimensional Pareto frontier to re-define the efficiency measure of candidate deep learning models, where several variables such as training cost, inference latency, and accuracy play a relative role in defining a dominant model. Furthermore, a random version of the multi-dimensional Pareto frontier is introduced to mitigate the uncertainty of accuracy, latency, and throughput of deep learning models in different experimental setups. These two complementary methods can be combined to perform objective benchmarking of deep learning models. Our proposed method is applied to a wide range of deep image classification models trained on ImageNet data. Our method combines competing variables with stochastic nature in a single relative efficiency measure. This allows ranking deep learning models that run efficiently on different hardware, and combining inference efficiency with training efficiency objectively.
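    As a rough illustration of the deterministic idea behind a multi-dimensional Pareto frontier, the sketch below keeps every candidate that no other candidate dominates. It assumes all objectives are to be minimized (for example training cost, inference latency, and error rate); the function names and the model tuples are hypothetical, and the randomized variant described in the abstract is not modeled here.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (training cost, latency in ms, error rate).
models = [(1.0, 2.0, 0.30), (2.0, 2.0, 0.30), (0.5, 5.0, 0.10)]
print(pareto_front(models))
```

    The quadratic scan is adequate for the small candidate sets typical of model comparison; larger sets would call for a sort-based method.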
    Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms. (arXiv:2004.02635v4 [math.OC] UPDATED)
    We consider minimizing the sum of three convex functions, where the first one F is smooth, the second one is nonsmooth and proximable and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning. First, we propose a new primal-dual algorithm, which we call PDDY, for this problem. It is constructed by applying Davis-Yin splitting to a monotone inclusion in a primal-dual product space, where the operators are monotone under a specific metric depending on L. We show that three existing algorithms (the two forms of the Condat-Vu algorithm and the PD3O algorithm) have the same structure, so that PDDY is the fourth missing link in this self-consistent class of primal-dual algorithms. This representation eases the convergence analysis: it allows us to derive sublinear convergence rates in general, and linear convergence results in presence of strong convexity. Moreover, within our broad and flexible analysis framework, we propose new stochastic generalizations of the algorithms, in which a variance-reduced random estimate of the gradient of F is used, instead of the true gradient. Furthermore, we obtain, as a special case of PDDY, a linearly converging algorithm for the minimization of a strongly convex function F under a linear constraint; we discuss its important application to decentralized optimization.
    Few-Shot Domain Adaptation For End-to-End Communication. (arXiv:2108.00874v2 [cs.LG] UPDATED)
    End-to-end learning of a communication system using an autoencoder -- consisting of an encoder, channel, and decoder modeled using neural networks -- has recently been shown to be a promising approach. A challenge faced in the practical adoption of this learning approach is that under changing channel conditions (e.g. a wireless link), it requires frequent retraining of the autoencoder in order to maintain a low decoding error rate. Since retraining is both time consuming and requires a large number of samples, it becomes impractical when the channel distribution is changing quickly. We propose to address this problem using a fast and sample-efficient (few-shot) domain adaptation method that does not change the encoder and decoder networks. Different from conventional training-time unsupervised or semi-supervised domain adaptation, here we have a trained autoencoder from a source distribution that we want to adapt (at test time) to a target distribution using only a small labeled dataset and no unlabeled data. Our method focuses on a Gaussian mixture density network based channel model, and formulates its adaptation based on class and component-conditional affine transformations. The learned affine transformations are used to design an optimal input transformation at the decoder to compensate for the distribution shift, and effectively present to the decoder inputs close to the source distribution. Experiments on a real mmWave FPGA setup as well as a number of simulated distribution changes common to the wireless setting demonstrate the effectiveness of our method at adaptation using a very small number of target domain samples.
    Iterative Teaching by Label Synthesis. (arXiv:2110.14432v3 [cs.LG] UPDATED)
    In this paper, we consider the problem of iterative machine teaching, where a teacher provides examples sequentially based on the current iterative learner. In contrast to previous methods that have to scan over the entire pool and select teaching examples from it in each iteration, we propose a label synthesis teaching framework where the teacher randomly selects input teaching examples (e.g., images) and then synthesizes suitable outputs (e.g., labels) for them. We show that this framework can avoid costly example selection while still provably achieving exponential teachability. We propose multiple novel teaching algorithms in this framework. Finally, we empirically demonstrate the value of our framework.
    Guiding Visual Question Generation. (arXiv:2110.08226v3 [cs.LG] UPDATED)
    In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g. objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their training data. This makes training difficult and also poses issues for evaluation -- multiple valid questions exist for most images but only one or a few are captured by the human references. We present Guiding Visual Question Generation - a variant of VQG which conditions the question generator on categorical information based on expectations on the type of question and the objects it should explore. We propose two variants: (i) an explicitly guided model that enables an actor (human or automated) to select which objects and categories to generate a question for; and (ii) an implicitly guided model that learns which objects and categories to condition on, based on discrete latent variables. The proposed models are evaluated on an answer-category augmented VQA dataset and our quantitative results show a substantial improvement over the current state of the art (over 9 BLEU-4 increase). Human evaluation validates that guidance helps the generation of questions that are grammatically coherent and relevant to the given image and objects.
    Verification-Aided Deep Ensemble Selection. (arXiv:2202.03898v2 [cs.LG] UPDATED)
    Deep neural networks (DNNs) have become the technology of choice for realizing a variety of complex tasks. However, as highlighted by many recent studies, even an imperceptible perturbation to a correctly classified input can lead to misclassification by a DNN. This renders DNNs vulnerable to strategic input manipulations by attackers, and also oversensitive to environmental noise. To mitigate this phenomenon, practitioners apply joint classification by an *ensemble* of DNNs. By aggregating the classification outputs of different individual DNNs for the same input, ensemble-based classification reduces the risk of misclassifications due to the specific realization of the stochastic training process of any single DNN. However, the effectiveness of a DNN ensemble is highly dependent on its members *not simultaneously erring* on many different inputs. In this case study, we harness recent advances in DNN verification to devise a methodology for identifying ensemble compositions that are less prone to simultaneous errors, even when the input is adversarially perturbed -- resulting in more robustly-accurate ensemble-based classification. Our proposed framework uses a DNN verifier as a backend, and includes heuristics that help reduce the high complexity of directly verifying ensembles. More broadly, our work puts forth a novel universal objective for formal verification that can potentially improve the robustness of real-world, deep-learning-based systems across a variety of application domains.
    Modeling Financial Products and their Supply Chains. (arXiv:2102.02329v2 [cs.LG] UPDATED)
    The objective of this paper is to explore how financial big data and machine learning methods can be applied to model and understand financial products. We focus on residential mortgage backed securities, resMBS, which were at the heart of the 2008 US financial crisis. These securities are contained within a prospectus and have a complex waterfall payoff structure. Multiple financial institutions form a supply chain to create prospectuses. To model this supply chain, we use unsupervised probabilistic methods, particularly dynamic topic models (DTM), to extract a set of features (topics) reflecting community formation and temporal evolution along the chain. We then provide insight into the performance of the resMBS securities and the impact of the supply chain through a series of increasingly comprehensive models. First, models at the security level directly identify salient features of resMBS securities that impact their performance. We then extend the model to include prospectus level features and demonstrate that the composition of the prospectus is significant. Our model also shows that communities along the supply chain that are associated with the generation of the prospectuses and securities have an impact on performance. We are the first to show that toxic communities that are closely linked to financial institutions that played a key role in the subprime crisis can increase the risk of failure of resMBS securities.
    PIXEL: Physics-Informed Cell Representations for Fast and Accurate PDE Solvers. (arXiv:2207.12800v1 [cs.LG])
    With the increases in computational power and advances in machine learning, data-driven learning-based methods have gained significant attention in solving PDEs. Physics-informed neural networks (PINNs) have recently emerged and succeeded in various forward and inverse PDE problems thanks to their excellent properties, such as flexibility, mesh-free solutions, and unsupervised training. However, their slower convergence speed and relatively inaccurate solutions often limit their broader applicability in many science and engineering domains. This paper proposes a new kind of data-driven PDE solver, physics-informed cell representations (PIXEL), elegantly combining classical numerical methods and learning-based approaches. We adopt a grid structure from the numerical methods to improve accuracy and convergence speed and overcome the spectral bias present in PINNs. Moreover, the proposed method enjoys the same benefits as PINNs, e.g., using the same optimization frameworks to solve both forward and inverse PDE problems and readily enforcing PDE constraints with modern automatic differentiation techniques. We provide experimental results on various challenging PDEs that the original PINNs have struggled with and show that PIXEL achieves fast convergence speed and high accuracy.
    Solution of Physics-based Bayesian Inverse Problems with Deep Generative Priors. (arXiv:2107.02926v2 [stat.ML] UPDATED)
    Inverse problems are ubiquitous in nature, arising in almost all areas of science and engineering ranging from geophysics and climate science to astrophysics and biomechanics. One of the central challenges in solving inverse problems is tackling their ill-posed nature. Bayesian inference provides a principled approach for overcoming this by formulating the inverse problem into a statistical framework. However, it is challenging to apply when inferring fields that have discrete representations of large dimensions (the so-called "curse of dimensionality") and/or when prior information is available only in the form of previously acquired solutions. In this work, we present a novel method for efficient and accurate Bayesian inversion using deep generative models. Specifically, we demonstrate how using the approximate distribution learned by a Generative Adversarial Network (GAN) as a prior in a Bayesian update, and reformulating the resulting inference problem in the low-dimensional latent space of the GAN, enables the efficient solution of large-scale Bayesian inverse problems. Our statistical framework preserves the underlying physics and is demonstrated to yield accurate results with reliable uncertainty estimates, even in the absence of information about the underlying noise model, which is a significant challenge with many existing methods. We demonstrate the effectiveness of the proposed method on a variety of inverse problems which include both synthetic and experimentally observed data.
    From Interpretable Filters to Predictions of Convolutional Neural Networks with Explainable Artificial Intelligence. (arXiv:2207.12958v1 [cs.LG])
    Convolutional neural networks (CNNs) are known for their excellent feature extraction capabilities to enable the learning of models from data, yet are used as black boxes. An interpretation of the convolutional filters and associated features can help to establish an understanding of how a CNN distinguishes various classes. In this work, we focus on the explainability of a CNN model called cnnexplain that is used for Covid-19 and non-Covid-19 classification, with a focus on the interpretability of features by the convolutional filters, and how these features contribute to classification. Specifically, we have used various explainable artificial intelligence (XAI) methods, such as visualizations, SmoothGrad, Grad-CAM, and LIME, to provide interpretation of convolutional filters, relevant features, and their role in classification. We have analyzed the explanations of these methods for Covid-19 detection using dry cough spectrograms. Explanation results obtained from LIME, SmoothGrad, and Grad-CAM highlight important features of different spectrograms and their relevance to classification.
    Finding Deep-Learning Compilation Bugs with NNSmith. (arXiv:2207.13066v1 [cs.LG])
    Deep-learning (DL) compilers such as TVM and TensorRT are increasingly used to optimize deep neural network (DNN) models to meet performance, resource utilization and other requirements. Bugs in these compilers can produce optimized models whose semantics differ from the original models, and produce incorrect results impacting the correctness of downstream applications. However, finding bugs in these compilers is challenging due to their complexity. In this work, we propose a new fuzz testing approach for finding bugs in deep-learning compilers. Our core approach uses (i) light-weight operator specifications to generate diverse yet valid DNN models allowing us to exercise a large part of the compiler's transformation logic; (ii) a gradient-based search process for finding model inputs that avoid any floating-point exceptional values during model execution, reducing the chance of missed bugs or false alarms; and (iii) differential testing to identify bugs. We implemented this approach in NNSmith, which has found 65 new bugs in the last seven months for TVM, TensorRT, ONNXRuntime, and PyTorch. Of these, 52 have been confirmed and 44 have been fixed by project maintainers.
    Sharp Concentration Results for Heavy-Tailed Distributions. (arXiv:2003.13819v3 [math.PR] UPDATED)
    We obtain concentration and large deviation results for the sums of independent and identically distributed random variables with heavy-tailed distributions. Our concentration results are concerned with random variables whose distributions satisfy $\mathbb{P}(X>t) \leq {\rm e}^{- I(t)}$, where $I: \mathbb{R} \rightarrow \mathbb{R}$ is an increasing function and $I(t)/t \rightarrow \alpha \in [0, \infty)$ as $t \rightarrow \infty$. Our main theorem can not only recover some of the existing results, such as the concentration of the sum of subWeibull random variables, but it can also produce new results for the sum of random variables with heavier tails. We show that the concentration inequalities we obtain are sharp enough to offer large deviation results for the sums of independent random variables as well. Our analyses, which are based on standard truncation arguments, simplify, unify and generalize the existing results on the concentration and large deviation of heavy-tailed random variables.
    Sparse Signal Models for Data Augmentation in Deep Learning ATR. (arXiv:2012.09284v2 [cs.CV] UPDATED)
    Automatic Target Recognition (ATR) algorithms classify a given Synthetic Aperture Radar (SAR) image into one of the known target classes using a set of training images available for each class. Recently, learning methods have been shown to achieve state-of-the-art classification accuracy if abundant training data is available, sampled uniformly over the classes and their poses. In this paper, we consider the task of ATR with a limited set of training images. We propose a data augmentation approach to incorporate domain knowledge and improve the generalization power of a data-intensive learning algorithm, such as a convolutional neural network (CNN). The proposed data augmentation method employs a limited persistence sparse modeling approach, capitalizing on commonly observed characteristics of wide-angle SAR imagery. Specifically, we exploit the sparsity of the scattering centers in the spatial domain and the smoothly-varying structure of the scattering coefficients in the azimuthal domain to solve the ill-posed problem of over-parametrized model fitting. Using this estimated model, we synthesize new images at poses and sub-pixel translations not available in the given data to augment the CNN's training data. The experimental results show that for the training data starved region, the proposed method provides a significant gain in the resulting ATR algorithm's generalization performance.
    Efficient Learning of Accurate Surrogates for Simulations of Complex Systems. (arXiv:2207.12855v1 [cs.LG])
    Machine learning methods are increasingly used to build computationally inexpensive surrogates for complex physical models. The predictive capability of these surrogates suffers when data are noisy, sparse, or time-dependent. As we are interested in finding a surrogate that provides valid predictions of any potential future model evaluations, we introduce an online learning method empowered by optimizer-driven sampling. The method has two advantages over current approaches. First, it ensures that all turning points on the model response surface are included in the training data. Second, after any new model evaluations, surrogates are tested and "retrained" (updated) if the "score" drops below a validity threshold. Tests on benchmark functions reveal that optimizer-directed sampling generally outperforms traditional sampling methods in terms of accuracy around local extrema, even when the scoring metric favors overall accuracy. We apply our method to simulations of nuclear matter to demonstrate that highly accurate surrogates for the nuclear equation of state can be reliably auto-generated from expensive calculations using a few model evaluations.
    Learning Bipedal Walking On Planned Footsteps For Humanoid Robots. (arXiv:2207.12644v1 [cs.RO])
    Deep reinforcement learning (RL) based controllers for legged robots have demonstrated impressive robustness for walking in different environments for several robot platforms. To enable the application of RL policies for humanoid robots in real-world settings, it is crucial to build a system that can achieve robust walking in any direction, on 2D and 3D terrains, and be controllable by a user-command. In this paper, we tackle this problem by learning a policy to follow a given step sequence. The policy is trained with the help of a set of procedurally generated step sequences (also called footstep plans). We show that simply feeding the upcoming 2 steps to the policy is sufficient to achieve omnidirectional walking, turning in place, standing, and climbing stairs. Our method employs curriculum learning on the complexity of terrains, and circumvents the need for reference motions or pre-trained weights. We demonstrate the application of our proposed method to learn RL policies for 2 new robot platforms - HRP5P and JVRC-1 - in the MuJoCo simulation environment. The code for training and evaluation is available online.
    Learning-Augmented Maximum Flow. (arXiv:2207.12911v1 [cs.DS])
    We propose a framework for speeding up maximum flow computation by using predictions. A prediction is a flow, i.e., an assignment of non-negative flow values to edges, which satisfies the flow conservation property, but does not necessarily respect the edge capacities of the actual instance (since these were unknown at the time of learning). We present an algorithm that, given an $m$-edge flow network and a predicted flow, computes a maximum flow in $O(m\eta)$ time, where $\eta$ is the $\ell_1$ error of the prediction, i.e., the sum over the edges of the absolute difference between the predicted and optimal flow values. Moreover, we prove that, given oracle access to a distribution over flow networks, it is possible to efficiently PAC-learn a prediction minimizing the expected $\ell_1$ error over that distribution. Our results fit into the recent line of research on learning-augmented algorithms, which aims to improve over worst-case bounds of classical algorithms by using predictions, e.g., machine-learned from previous similar instances. So far, the main focus in this area has been on improving competitive ratios for online problems. Following Dinitz et al. (NeurIPS 2021), our results are among the first to improve the running time of an offline problem.
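The $\ell_1$ error $\eta$ that appears in the $O(m\eta)$ bound can be made concrete with a small sketch. This is only an illustration of the error measure as the abstract defines it, not the paper's algorithm; the function name, the dictionary representation of a flow, and the toy network are assumptions made for this example.

```python
def l1_prediction_error(predicted, optimal):
    """Sum over all edges of |predicted flow - optimal flow|.

    Both arguments map an edge (u, v) to a flow value; an edge missing
    from one of the maps is treated as carrying zero flow there.
    """
    edges = set(predicted) | set(optimal)
    return sum(abs(predicted.get(e, 0) - optimal.get(e, 0)) for e in edges)


# Hypothetical toy instance: the prediction routes 3 units along s -> a -> t,
# while the (assumed) optimal flow sends 2 units there and 1 unit along s -> t.
predicted = {("s", "a"): 3, ("a", "t"): 3}
optimal = {("s", "a"): 2, ("a", "t"): 2, ("s", "t"): 1}
eta = l1_prediction_error(predicted, optimal)
```

Under the stated bound, a prediction with a smaller `eta` would lead to less work when turning the predicted flow into a maximum flow.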
    Is Attention Interpretation? A Quantitative Assessment On Sets. (arXiv:2207.13018v1 [cs.LG])
    The debate around the interpretability of attention mechanisms is centered on whether attention scores can be used as a proxy for the relative amounts of signal carried by sub-components of data. We propose to study the interpretability of attention in the context of set machine learning, where each data point is composed of an unordered collection of instances with a global label. For classical multiple-instance-learning problems and simple extensions, there is a well-defined "importance" ground truth that can be leveraged to cast interpretation as a binary classification problem, which we can quantitatively evaluate. By building synthetic datasets over several data modalities, we perform a systematic assessment of attention-based interpretations. We find that attention distributions are indeed often reflective of the relative importance of individual instances, but that silent failures happen where a model will have high classification performance but attention patterns that do not align with expectations. Based on these observations, we propose to use ensembling to minimize the risk of misleading attention-based explanations.
    Offline Reinforcement Learning at Multiple Frequencies. (arXiv:2207.13082v1 [cs.LG])
    Leveraging many sources of offline robot data requires grappling with the heterogeneity of such data. In this paper, we focus on one particular aspect of heterogeneity: learning from offline data collected at different control frequencies. Across labs, the discretization of controllers, sampling rates of sensors, and demands of a task of interest may differ, giving rise to a mixture of frequencies in an aggregated dataset. We study how well offline reinforcement learning (RL) algorithms can accommodate data with a mixture of frequencies during training. We observe that the $Q$-value propagates at different rates for different discretizations, leading to a number of learning challenges for off-the-shelf offline RL. We present a simple yet effective solution that enforces consistency in the rate of $Q$-value updates to stabilize learning. By scaling the value of $N$ in $N$-step returns with the discretization size, we effectively balance $Q$-value propagation, leading to more stable convergence. On three simulated robotic control problems, we empirically find that this simple approach outperforms naïve mixing by 50% on average.
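One plausible reading of "scaling $N$ with the discretization size" is to choose $N$ so that every $N$-step return spans the same amount of physical time, whatever the control frequency. The sketch below follows that reading. The helper names and the fixed time window are assumptions for illustration, not the authors' implementation, and any adjustment of the per-step discount to the step size is left out.

```python
def steps_for_window(window_seconds, dt):
    """Number of steps N needed to cover a fixed time window at step size dt."""
    return max(1, round(window_seconds / dt))


def n_step_return(rewards, bootstrap_value, gamma, n):
    """Discounted sum of the first n rewards plus a bootstrapped value.

    bootstrap_value stands for the value estimate of the state reached
    after n steps.
    """
    if len(rewards) < n:
        raise ValueError("need at least n rewards")
    ret = sum((gamma ** k) * rewards[k] for k in range(n))
    return ret + (gamma ** n) * bootstrap_value


# Data logged at 100 Hz (dt = 0.01 s) uses a larger N than data logged at
# 20 Hz (dt = 0.05 s), so both look equally far ahead in physical time.
n_fast = steps_for_window(0.1, 0.01)
n_slow = steps_for_window(0.1, 0.05)
example_return = n_step_return([1.0, 1.0, 1.0], 8.0, 0.5, 2)
```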
    Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. (arXiv:2007.10568v2 [cs.DB] UPDATED)
    In this extended abstract, we propose a new technique for query scheduling with the explicit goal of reducing disk reads and thus implicitly increasing query performance. We introduce SmartQueue, a learned scheduler that leverages overlapping data reads among incoming queries and learns a scheduling strategy that improves cache hits. SmartQueue relies on deep reinforcement learning to produce workload-specific scheduling strategies that focus on long-term performance benefits while being adaptive to previously-unseen data access patterns. We present results from a proof-of-concept prototype, demonstrating that learned schedulers can offer significant performance improvements over hand-crafted scheduling heuristics. Ultimately, we make the case that this is a promising research direction at the intersection of machine learning and databases.
    A Guide to Image and Video based Small Object Detection using Deep Learning : Case Study of Maritime Surveillance. (arXiv:2207.12926v1 [cs.CV])
    Small object detection (SOD) in optical images and videos is a challenging problem: even state-of-the-art generic object detection methods fail to accurately localize and identify such objects. Typically, small objects appear in real-world scenes because of a large camera-object distance. Because small objects occupy only a small area in the input image (e.g., less than 10%), the information extracted from such a small area is not always rich enough to support decision making. Multidisciplinary strategies are being developed by researchers working at the interface of deep learning and computer vision to enhance the performance of deep-learning-based SOD methods. In this paper, we provide a comprehensive review of over 160 research papers published between 2017 and 2022 in order to survey this growing subject. This paper summarizes the existing literature and provides a taxonomy that illustrates the broad picture of current research. We investigate how to improve the performance of small object detection in maritime environments, where increasing performance is critical. By establishing a connection between generic and maritime SOD research, we identify future directions. In addition, we discuss the popular datasets that have been used for SOD in generic and maritime applications and report well-known evaluation metrics for state-of-the-art methods on some of these datasets.
    Neural Design for Genetic Perturbation Experiments. (arXiv:2207.12805v1 [q-bio.QM])
    The problem of how to genetically modify cells in order to maximize a certain cellular phenotype has taken center stage in drug development over the last few years (with, for example, genetically edited CAR-T, CAR-NK, and CAR-NKT cells entering cancer clinical trials). Exhausting the search space for all possible genetic edits (perturbations) or combinations thereof is infeasible due to cost and experimental limitations. This work provides a theoretically sound framework for iteratively exploring the space of perturbations in pooled batches in order to maximize a target phenotype under an experimental budget. Inspired by this application domain, we study the problem of batch query bandit optimization and introduce the Optimistic Arm Elimination ($\mathrm{OAE}$) principle designed to find an almost optimal arm under different functional relationships between the queries (arms) and the outputs (rewards). We analyze the convergence properties of $\mathrm{OAE}$ by relating it to the Eluder dimension of the algorithm's function class and validate that $\mathrm{OAE}$ outperforms other strategies in finding optimal actions in experiments on simulated problems, public datasets well-studied in bandit contexts, and in genetic perturbation datasets when the regression model is a deep neural network. OAE also outperforms the benchmark algorithms in 3 of 4 datasets in the GeneDisco experimental planning challenge.
    Contrastive Attraction and Contrastive Repulsion for Representation Learning. (arXiv:2105.03746v3 [cs.LG] UPDATED)
    Contrastive learning (CL) methods effectively learn data representations without label supervision, where the encoder contrasts each positive sample over multiple negative samples via a one-vs-many softmax cross-entropy loss. By leveraging large amounts of unlabeled image data, recent CL methods have achieved promising results when pre-trained on ImageNet, a well-curated data set with balanced image classes. However, they tend to yield worse performance when pre-trained on images in the wild. In this paper, to further improve the performance of CL and enhance its robustness on uncurated data sets, we propose a doubly CL strategy that contrasts the positive (negative) samples of a query within themselves before deciding how strongly to pull (push) them. We realize this strategy with contrastive attraction and contrastive repulsion (CACR), which makes the query not only exert a greater force to attract more distant positive samples but also do so to repel closer negative samples. Theoretical analysis reveals that CACR generalizes CL's behavior by taking into consideration the differences between the distributions of the positive/negative samples, which in general are sampled independently of the query, and their true conditional distributions given the query. We demonstrate this unique intra-positive attraction and intra-negative repulsion mechanism, which helps remove the need to assume uniform prior distributions on both the data and their latent representation, is particularly beneficial when data sets are less curated. Extensive large-scale experiments on a number of standard vision tasks show that CACR not only consistently outperforms existing CL methods on benchmark data sets in representation learning, but also shows better robustness when generalized to pre-training on imbalanced image data sets.
    Active Learning of Ordinal Embeddings: A User Study on Football Data. (arXiv:2207.12710v1 [cs.LG])
    Humans innately measure distance between instances in an unlabeled dataset using an unknown similarity function. Distance metrics can only serve as proxy for similarity in information retrieval of similar instances. Learning a good similarity function from human annotations improves the quality of retrievals. This work uses deep metric learning to learn these user-defined similarity functions from few annotations for a large football trajectory dataset. We adapt an entropy-based active learning method with recent work from triplet mining to collect easy-to-answer but still informative annotations from human participants and use them to train a deep convolutional network that generalizes to unseen samples. Our user study shows that our approach improves the quality of the information retrieval compared to a previous deep metric learning approach that relies on a Siamese network. Specifically, we shed light on the strengths and weaknesses of passive sampling heuristics and active learners alike by analyzing the participants' response efficacy. To this end, we collect accuracy, algorithmic time complexity, the participants' fatigue and time-to-response, qualitative self-assessment and statements, as well as the effects of mixed-expertise annotators and their consistency on model performance and transfer-learning.
    Implementation Of Tiny Machine Learning Models On Arduino 33 BLE For Gesture And Speech Recognition. (arXiv:2207.12866v1 [eess.AS])
    In this article, gesture recognition and speech recognition applications are implemented on embedded systems with Tiny Machine Learning (TinyML). The Arduino Nano 33 BLE board features a 3-axis accelerometer, a 3-axis gyroscope, and a 3-axis magnetometer. Gesture recognition provides an innovative approach to nonverbal communication. It has wide applications in human-computer interaction and sign language. In the implementation of hand gesture recognition, a TinyML model is trained and deployed with the EdgeImpulse framework, and, based on the hand movements, the Arduino Nano 33 BLE device with its 6-axis IMU determines the direction of movement of the hand. Speech is a mode of communication. Speech recognition is the means by which statements or commands in human speech are understood by a computer, which reacts accordingly. The main aim of speech recognition is to achieve communication between man and machine. In the implementation of speech recognition, a TinyML model is trained and deployed with the EdgeImpulse framework, and, based on the keyword pronounced by the user, the Arduino Nano 33 BLE device with its built-in microphone makes an RGB LED glow red, green, or blue. The results of each application are listed and analyzed in the results section.
    Can Deep Learning Assist Automatic Identification of Layered Pigments From XRF Data?. (arXiv:2207.12651v1 [cs.CV])
    X-ray fluorescence spectroscopy (XRF) plays an important role for elemental analysis in a wide range of scientific fields, especially in cultural heritage. XRF imaging, which uses a raster scan to acquire spectra across artworks, provides the opportunity for spatial analysis of pigment distributions based on their elemental composition. However, conventional XRF-based pigment identification relies on time-consuming elemental mapping by expert interpretations of measured spectra. To reduce the reliance on manual work, recent studies have applied machine learning techniques to cluster similar XRF spectra in data analysis and to identify the most likely pigments. Nevertheless, it is still challenging for automatic pigment identification strategies to directly tackle the complex structure of real paintings, e.g. pigment mixtures and layered pigments. In addition, pixel-wise pigment identification based on XRF imaging remains an obstacle due to the high noise level compared with averaged spectra. Therefore, we developed a deep-learning-based end-to-end pigment identification framework to fully automate the pigment identification process. In particular, it offers high sensitivity to the underlying pigments and to the pigments with a low concentration, therefore enabling satisfying results in mapping the pigments based on single-pixel XRF spectrum. As case studies, we applied our framework to lab-prepared mock-up paintings and two 19th-century paintings: Paul Gauguin's Poèmes Barbares (1896) that contains layered pigments with an underlying painting, and Paul Cezanne's The Bathers (1899-1904). The pigment identification results demonstrated that our model achieved comparable results to the analysis by elemental mapping, suggesting the generalizability and stability of our model.
    Federated Learning with Positive and Unlabeled Data. (arXiv:2106.10904v2 [cs.LG] UPDATED)
    We study the problem of learning from positive and unlabeled (PU) data in the federated setting, where each client labels only a small part of its dataset due to limited resources and time. Unlike traditional PU learning, where the negative class consists of a single class, the negative samples that a client cannot identify in the federated setting may come from multiple classes unknown to that client. Therefore, existing PU learning methods can hardly be applied in this situation. To address this problem, we propose a novel framework, namely Federated learning with Positive and Unlabeled data (FedPU), to minimize the expected risk of multiple negative classes by leveraging the labeled data in other clients. We theoretically analyze the generalization bound of the proposed FedPU. Empirical experiments show that FedPU can achieve much better performance than conventional supervised and semi-supervised federated learning methods.
    CFLIT: Coexisting Federated Learning and Information Transfer. (arXiv:2207.12884v1 [cs.IT])
    Future wireless networks are expected to support diverse mobile services, including artificial intelligence (AI) services and ubiquitous data transmissions. Federated learning (FL), as a revolutionary learning approach, enables collaborative AI model training across distributed mobile edge devices. By exploiting the superposition property of multiple-access channels, over-the-air computation allows concurrent model uploading from massive devices over the same radio resources, and thus significantly reduces the communication cost of FL. In this paper, we study the coexistence of over-the-air FL and traditional information transfer (IT) in a mobile edge network. We propose a coexisting federated learning and information transfer (CFLIT) communication framework, where the FL and IT devices share the wireless spectrum in an OFDM system. Under this framework, we aim to maximize the IT data rate and guarantee a given FL convergence performance by optimizing the long-term radio resource allocation. A key challenge that limits the spectrum efficiency of the coexisting system lies in the large overhead incurred by frequent communication between the server and edge devices for FL model aggregation. To address the challenge, we rigorously analyze the impact of the computation-to-communication ratio on the convergence of over-the-air FL in wireless fading channels. The analysis reveals the existence of an optimal computation-to-communication ratio that minimizes the amount of radio resources needed for over-the-air FL to converge to a given error tolerance. Based on the analysis, we propose a low-complexity online algorithm to jointly optimize the radio resource allocation for both the FL devices and IT devices. Extensive numerical simulations verify the superior performance of the proposed design for the coexistence of FL and IT devices in wireless cellular systems.
    Collaborative Three-Tier Architecture Non-contact Respiratory Rate Monitoring using Target Tracking and False Peaks Eliminating Algorithms. (arXiv:2011.08482v4 [cs.RO] UPDATED)
    Monitoring the respiratory rate is crucial for helping us identify respiratory disorders. Devices for conventional respiratory monitoring are inconvenient and scarcely available. Recent research has demonstrated the ability of non-contact technologies, such as photoplethysmography and infrared thermography, to gather respiratory signals from the face and monitor breathing. However, the current non-contact respiratory monitoring techniques have poor accuracy because they are sensitive to environmental influences like lighting and motion artifacts. Furthermore, frequent contact between users and the cloud in real-world medical application settings might cause service request delays and potentially the loss of personal data. We proposed a non-contact respiratory rate monitoring system with a cooperative three-layer design to increase the precision of respiratory monitoring and decrease data transmission latency. To reduce data transmission and network latency, our three-tier architecture layer-by-layer decomposes the computing tasks of respiration monitoring. Moreover, we improved the accuracy of respiratory monitoring by designing a target tracking algorithm and an algorithm for eliminating false peaks to extract high-quality respiratory signals. By gathering the data and choosing several regions of interest on the face, we were able to extract the respiration signal and investigate how different regions affected the monitoring of respiration. The results of the experiment indicate that when the nasal region is used to extract the respiratory signal, it performs experimentally best. Our approach performs better than rival approaches while transferring fewer data.
    Efficient and Accurate Skeleton-Based Two-Person Interaction Recognition Using Inter- and Intra-body Graphs. (arXiv:2207.12648v1 [cs.CV])
    Skeleton-based two-person interaction recognition has been gaining increasing attention as advancements are made in pose estimation and graph convolutional networks. Although the accuracy has been gradually improving, the increasing computational complexity makes it more impractical for a real-world environment. There is still room for accuracy improvement as the conventional methods do not fully represent the relationship between inter-body joints. In this paper, we propose a lightweight model for accurately recognizing two-person interactions. In addition to the architecture, which incorporates middle fusion, we introduce a factorized convolution technique to reduce the weight parameters of the model. We also introduce a network stream that accounts for relative distance changes between inter-body joints to improve accuracy. Experiments using two large-scale datasets, NTU RGB+D 60 and 120, show that our method simultaneously achieved the highest accuracy and relatively low computational complexity compared with the conventional methods.
    Lifelong DP: Consistently Bounded Differential Privacy in Lifelong Machine Learning. (arXiv:2207.12831v1 [cs.LG])
    In this paper, we show that the process of continually learning new tasks and memorizing previous tasks introduces unknown privacy risks and challenges to bound the privacy loss. Based upon this, we introduce a formal definition of Lifelong DP, in which the participation of any data tuples in the training set of any tasks is protected, under a consistently bounded DP protection, given a growing stream of tasks. A consistently bounded DP means having only one fixed value of the DP privacy budget, regardless of the number of tasks. To preserve Lifelong DP, we propose a scalable and heterogeneous algorithm, called L2DP-ML with a streaming batch training, to efficiently train and continue releasing new versions of an L2M model, given the heterogeneity in terms of data sizes and the training order of tasks, without affecting DP protection of the private training set. An end-to-end theoretical analysis and thorough evaluations show that our mechanism is significantly better than baseline approaches in preserving Lifelong DP. The implementation of L2DP-ML is available at: https://github.com/haiphanNJIT/PrivateDeepLearning.
    $\textbf{P$^2$A}$: A Dataset and Benchmark for Dense Action Detection from Table Tennis Match Broadcasting Videos. (arXiv:2207.12730v1 [cs.CV])
    While deep learning has been widely used for video analytics, such as video classification and action detection, dense action detection with fast-moving subjects from sports videos is still challenging. In this work, we release yet another sports video dataset $\textbf{P$^2$A}$ for $\underline{P}$ing $\underline{P}$ong-$\underline{A}$ction detection, which consists of 2,721 video clips collected from the broadcasting videos of professional table tennis matches in World Table Tennis Championships and Olympiads. We work with a crew of table tennis professionals and referees to obtain fine-grained action labels (in 14 classes) for every ping-pong action that appeared in the dataset and formulate two sets of action detection problems - action localization and action recognition. We evaluate a number of commonly-seen action recognition (e.g., TSM, TSN, Video SwinTransformer, and Slowfast) and action localization models (e.g., BSN, BSN++, BMN, TCANet), using $\textbf{P$^2$A}$ for both problems, under various settings. These models can only achieve 48% area under the AR-AN curve for localization and 82% top-one accuracy for recognition since the ping-pong actions are dense with fast-moving subjects but broadcasting videos are with only 25 FPS. The results confirm that $\textbf{P$^2$A}$ is still a challenging task and can be used as a benchmark for action detection from videos.
    S-Prompts Learning with Pre-trained Transformers: An Occam's Razor for Domain Incremental Learning. (arXiv:2207.12819v1 [cs.CV])
    State-of-the-art deep neural networks still struggle to address the catastrophic forgetting problem in continual learning. In this paper, we propose one simple paradigm (named S-Prompting) and two concrete approaches to greatly reduce forgetting in one of the most typical continual learning scenarios, i.e., domain incremental learning (DIL). The key idea of the paradigm is to learn prompts independently across domains with pre-trained transformers, avoiding the use of exemplars that commonly appear in conventional methods. This results in a win-win game where the prompting can achieve the best for each domain. The independent prompting across domains requires only a single cross-entropy loss for training and one simple K-NN operation as a domain identifier for inference. The learning paradigm derives an image prompt learning approach and a brand-new language-image prompt learning approach. With excellent scalability (0.03% parameter increase per domain), the best of our approaches achieves a remarkable relative improvement (an average of about 30%) over the best of the state-of-the-art exemplar-free methods on three standard DIL tasks, and even surpasses the best of them by a relative margin of about 6% on average when they use exemplars.
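The inference step described above, a K-NN lookup that decides which domain's prompt to use, can be sketched as a nearest-centroid search. This is a simplified illustration with K = 1. The idea of storing a few feature centroids per domain, the function name, and the toy 2-D features are assumptions for this example and are not taken from the abstract.

```python
import math


def identify_domain(feature, domain_centroids):
    """Return the domain that owns the centroid closest to the feature.

    domain_centroids maps a domain name to a list of feature centroids
    (hypothetically computed from that domain's training features).
    """
    best_domain, best_dist = None, math.inf
    for domain, centroids in domain_centroids.items():
        for centroid in centroids:
            dist = math.dist(feature, centroid)  # Euclidean distance
            if dist < best_dist:
                best_domain, best_dist = domain, dist
    return best_domain


# Hypothetical domains with toy 2-D features; each domain has its own prompt.
centroids = {"photo": [(0.0, 0.0), (1.0, 0.0)], "sketch": [(5.0, 5.0)]}
prompts = {"photo": "prompt_photo", "sketch": "prompt_sketch"}
chosen = prompts[identify_domain((4.0, 4.5), centroids)]
```

Because each domain's prompt is selected rather than shared, adding a domain in this sketch only means adding one more entry to each dictionary.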
    Exploring the Design of Adaptation Protocols for Improved Generalization and Machine Learning Safety. (arXiv:2207.12615v1 [cs.LG])
    While directly fine-tuning (FT) large-scale, pretrained models on task-specific data is well-known to induce strong in-distribution task performance, recent works have demonstrated that different adaptation protocols, such as linear probing (LP) prior to FT, can improve out-of-distribution generalization. However, the design space of such adaptation protocols remains under-explored and the evaluation of such protocols has primarily focused on distribution shifts. Therefore, in this work, we evaluate common adaptation protocols across distribution shifts and machine learning safety metrics (e.g., anomaly detection, calibration, robustness to corruptions). We find that protocols induce disparate trade-offs that were not apparent from prior evaluation. Further, we demonstrate that appropriate pairing of data augmentation and protocol can substantially mitigate this trade-off. Finally, we hypothesize and empirically see that using hardness-promoting augmentations during LP and then FT with augmentations may be particularly effective for trade-off mitigation.
    Task Agnostic and Post-hoc Unseen Distribution Detection. (arXiv:2207.13083v1 [cs.LG])
    Despite recent advances in out-of-distribution (OOD) detection, anomaly detection, and uncertainty estimation, no task-agnostic and post-hoc approach exists. To address this limitation, we design a novel clustering-based ensembling method, called Task Agnostic and Post-hoc Unseen Distribution Detection (TAPUDD), that utilizes the features extracted from a model trained on a specific task. Specifically, it comprises TAP-Mahalanobis, which clusters the training datasets' features and determines the minimum Mahalanobis distance of the test sample from all clusters. Further, we propose the Ensembling module, which aggregates the computation of iterative TAP-Mahalanobis for different numbers of clusters to provide reliable and efficient cluster computation. Through extensive experiments on synthetic and real-world datasets, we observe that our approach can detect unseen samples effectively across diverse tasks and performs better than or on par with the existing baselines. In doing so, we eliminate the necessity of determining the optimal number of clusters and demonstrate that our method is more viable for large-scale classification tasks.
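The scoring idea, taking the minimum Mahalanobis distance from a test feature to a set of feature clusters, can be sketched in a few lines. To stay dependency-free, this sketch assumes a diagonal covariance per cluster, which is a simplification of the full Mahalanobis distance, and it takes the clusters as given instead of running a clustering algorithm. All names and the toy data are hypothetical, and the ensembling over cluster counts is not shown.

```python
import math


def fit_cluster(points, ridge=1e-6):
    """Per-dimension mean and variance of one cluster (diagonal covariance)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    var = [
        sum((p[j] - mean[j]) ** 2 for p in points) / n + ridge  # ridge avoids /0
        for j in range(dim)
    ]
    return mean, var


def min_mahalanobis(x, clusters):
    """Smallest diagonal-covariance Mahalanobis distance from x to any cluster."""
    return min(
        math.sqrt(sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, mean, var)))
        for mean, var in clusters
    )


# Two toy clusters of "training features" centered at (0, 0) and (10, 10).
clusters = [
    fit_cluster([(1, 0), (-1, 0), (0, 1), (0, -1)]),
    fit_cluster([(11, 10), (9, 10), (10, 11), (10, 9)]),
]
near = min_mahalanobis((0, 0), clusters)  # sits on a cluster center
far = min_mahalanobis((5, 5), clusters)   # sits between the two clusters
```

A larger score suggests the sample lies farther from everything seen during training, so it is a stronger candidate for an unseen-distribution sample.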
    Representing Random Utility Choice Models with Neural Networks. (arXiv:2207.12877v1 [cs.LG])
    Motivated by the successes of deep learning, we propose a class of neural network-based discrete choice models, called RUMnets, which is inspired by the random utility maximization (RUM) framework. This model formulates the agents' random utility function using the sample average approximation (SAA) method. We show that RUMnets sharply approximate the class of RUM discrete choice models: any model derived from random utility maximization has choice probabilities that can be approximated arbitrarily closely by a RUMnet. Reciprocally, any RUMnet is consistent with the RUM principle. We derive an upper bound on the generalization error of RUMnets fitted on choice data, and gain theoretical insights on their ability to predict choices on new, unseen data depending on critical parameters of the dataset and architecture. By leveraging open-source libraries for neural networks, we find that RUMnets outperform other state-of-the-art choice modeling and machine learning methods by a significant margin on two real-world datasets.
    Coronavirus disease situation analysis and prediction using machine learning: a study on Bangladeshi population. (arXiv:2207.13056v1 [cs.LG])
    During a pandemic, early prediction of patient infection rates can reduce deaths by ensuring treatment facilities and proper resource allocation. In recent months, the number of deaths and the infection rate in Bangladesh have increased more sharply than before. The country is struggling to provide adequate medical treatment to many patients. This study compares machine learning models and creates a prediction system to anticipate the infection and death rates for the coming days. Using a dataset covering March 1, 2020, to August 10, 2021, a multi-layer perceptron (MLP) model was trained. The data were obtained from a trusted government website and prepared manually for training purposes. Several test cases determine the model's accuracy and prediction capability. The comparison between specific models indicates that the MLP model has more reliable prediction capability than the support vector regression (SVR) and linear regression models. The model presents a report about the risky situation and the impending coronavirus disease (COVID-19) wave. According to the prediction produced by the model, Bangladesh may suffer another COVID-19 wave, in which the number of infected cases could be between 929 and 2443 and the number of deaths between 19 and 57.
    DeFakePro: Decentralized DeepFake Attacks Detection using ENF Authentication. (arXiv:2207.13070v1 [cs.CR])
    Advancements in generative models, such as Deepfake, allow users to imitate a targeted person and manipulate online interactions. It has been recognized that disinformation may cause disturbance in society and ruin the foundation of trust. This article presents DeFakePro, a decentralized, consensus-mechanism-based Deepfake detection technique for online video conferencing tools. Leveraging the Electrical Network Frequency (ENF), an environmental fingerprint embedded in digital media recordings, enables a consensus mechanism design called the Proof-of-ENF (PoENF) algorithm. The similarity in ENF signal fluctuations is utilized in the PoENF algorithm to authenticate the media broadcast in conferencing tools. Using a video conferencing setup in which malicious participants broadcast deepfake video recordings to other participants, the DeFakePro system verifies the authenticity of the incoming media in both audio and video channels.
    Enhancing Collaborative Filtering Recommender with Prompt-Based Sentiment Analysis. (arXiv:2207.12883v1 [cs.IR])
    Collaborative filtering (CF) recommenders are a crucial application in online markets and e-commerce. However, CF recommenders have been shown to suffer from persistent problems related to the sparsity of user ratings, which further leads to a cold-start issue. Existing methods address the data sparsity issue by applying token-level sentiment analysis that translates text reviews into sentiment scores as a complement to the user ratings. In this paper, we attempt to optimize the sentiment analysis with advanced NLP models, including BERT and RoBERTa, and test whether the CF recommender is further enhanced. We build the recommenders on the Amazon US Reviews dataset, and tune the pretrained BERT and RoBERTa with the traditional fine-tuning paradigm as well as the new prompt-based learning paradigm. Experimental results show that the recommender enhanced with the sentiment ratings predicted by the fine-tuned RoBERTa has the best performance, achieving a 30.7% overall gain over the baseline recommender in terms of MAP, NDCG, and precision at K. The prompt-based learning paradigm, although superior to the traditional fine-tuning paradigm in pure sentiment analysis, fails to further improve the CF recommender.
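Two of the ranking metrics named above, precision at K and NDCG, have standard textbook definitions that a short sketch can make concrete. The code uses the common log2-discount form of DCG. The function names and the toy relevance lists are illustrative and are not taken from the paper's evaluation code.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def dcg_at_k(relevance, k):
    """Discounted cumulative gain: relevance discounted by log2(rank + 1)."""
    return sum(r / math.log2(i + 2) for i, r in enumerate(relevance[:k]))


def ndcg_at_k(relevance, k):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# relevance[i] is 1 if the item recommended at rank i+1 was relevant, else 0.
p2 = precision_at_k([1, 0, 1, 0], 2)
perfect = ndcg_at_k([1, 1, 0], 3)
imperfect = ndcg_at_k([1, 0, 1], 3)
```

Unlike precision at K, NDCG penalizes a relevant item for appearing lower in the list, which is why the second ranking scores below 1.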
    Bilateral Self-unbiased Learning from Biased Implicit Feedback. (arXiv:2207.12660v1 [cs.IR])
    Implicit feedback has been widely used to build commercial recommender systems. Because observed feedback represents users' click logs, there is a semantic gap between true relevance and observed feedback. More importantly, observed feedback is usually biased towards popular items, thereby overestimating the actual relevance of popular items. Although existing studies have developed unbiased learning methods using inverse propensity weighting (IPW) or causal reasoning, they solely focus on eliminating the popularity bias of items. In this paper, we propose a novel unbiased recommender learning model, namely BIlateral SElf-unbiased Recommender (BISER), to eliminate the exposure bias of items caused by recommender models. Specifically, BISER consists of two key components: (i) self-inverse propensity weighting (SIPW) to gradually mitigate the bias of items without incurring high computational costs; and (ii) bilateral unbiased learning (BU) to bridge the gap between two complementary models in model predictions, i.e., user- and item-based autoencoders, alleviating the high variance of SIPW. Extensive experiments show that BISER consistently outperforms state-of-the-art unbiased recommender models over several datasets, including Coat, Yahoo! R3, MovieLens, and CiteULike.
    Reconciling Security and Communication Efficiency in Federated Learning. (arXiv:2207.12779v1 [cs.LG])
    Cross-device Federated Learning is an increasingly popular machine learning setting to train a model by leveraging a large population of client devices with high privacy and security guarantees. However, communication efficiency remains a major bottleneck when scaling federated learning to production environments, particularly due to bandwidth constraints during uplink communication. In this paper, we formalize and address the problem of compressing client-to-server model updates under the Secure Aggregation primitive, a core component of Federated Learning pipelines that allows the server to aggregate the client updates without accessing them individually. In particular, we adapt standard scalar quantization and pruning methods to Secure Aggregation and propose Secure Indexing, a variant of Secure Aggregation that supports quantization for extreme compression. We establish state-of-the-art results on LEAF benchmarks in a secure Federated Learning setup with up to 40$\times$ compression in uplink communication with no meaningful loss in utility compared to uncompressed baselines.
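    A minimal sketch of why scalar quantization on a shared grid is compatible with aggregation: the server can work from the sum of the integer codes alone. This is not the paper's Secure Indexing scheme; the grid bounds, level count, and helper names are assumptions, and a plain sum stands in for the Secure Aggregation primitive.

```python
def quantize(update, lo, hi, levels):
    """Clip each float to [lo, hi] and map it to an integer in [0, levels - 1]."""
    step = (hi - lo) / (levels - 1)
    return [round((min(max(x, lo), hi) - lo) / step) for x in update]

def dequantize_sum(int_sum, n_clients, lo, hi, levels):
    """Recover the approximate sum of float updates from the summed integer codes."""
    step = (hi - lo) / (levels - 1)
    return [s * step + n_clients * lo for s in int_sum]

# Hypothetical client updates; step = 0.25, so these values quantize exactly.
clients = [[0.5, -1.0], [0.25, 1.0], [-0.75, 0.0]]
lo, hi, levels = -1.0, 1.0, 9
quantized = [quantize(u, lo, hi, levels) for u in clients]

# Stand-in for Secure Aggregation: the server only sees this element-wise sum.
int_sum = [sum(col) for col in zip(*quantized)]
approx_sum = dequantize_sum(int_sum, len(clients), lo, hi, levels)
```

    Because every client uses the same grid, the offset and scale can be undone after summation, so individual updates never need to be decoded.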
    Static and Dynamic Concepts for Self-supervised Video Representation Learning. (arXiv:2207.12795v1 [cs.CV])
    In this paper, we propose a novel learning scheme for self-supervised video representation learning. Motivated by how humans understand videos, we propose to first learn general visual concepts then attend to discriminative local areas for video understanding. Specifically, we utilize static frame and frame difference to help decouple static and dynamic concepts, and respectively align the concept distributions in latent space. We add diversity and fidelity regularizations to guarantee that we learn a compact set of meaningful concepts. Then we employ a cross-attention mechanism to aggregate detailed local features of different concepts, and filter out redundant concepts with low activations to perform local concept contrast. Extensive experiments demonstrate that our method distills meaningful static and dynamic concepts to guide video understanding, and obtains state-of-the-art results on UCF-101, HMDB-51, and Diving-48.
    Repeated Environment Inference for Invariant Learning. (arXiv:2207.12876v1 [cs.LG])
    We study the problem of invariant learning when the environment labels are unknown. We focus on the invariant representation notion when the Bayes optimal conditional label distribution is the same across different environments. Previous work conducts Environment Inference (EI) by maximizing the penalty term from the Invariant Risk Minimization (IRM) framework. The EI step uses a reference model which focuses on spurious correlations to efficiently reach a good environment partition. However, it is not clear how to find such a reference model. In this work, we propose to repeat the EI process and retrain an ERM model on the majority environment inferred by the previous EI step. Under mild assumptions, we find that this iterative process helps learn a representation capturing the spurious correlation better than the single step. This results in better Environment Inference and better Invariant Learning. We show that this method outperforms baselines on both synthetic and real-world datasets.
    Thermodynamics of learning physical phenomena. (arXiv:2207.12749v1 [cs.LG])
    Thermodynamics could be seen as an expression of physics at a high epistemic level. As such, its potential as an inductive bias to help machine learning procedures attain accurate and credible predictions has been recently realized in many fields. We review how thermodynamics provides helpful insights in the learning process. At the same time, we study the influence of aspects such as the scale at which a given phenomenon is to be described, the choice of relevant variables for this description or the different techniques available for the learning process.
    A Retrospective on ICSE 2022. (arXiv:2207.12578v1 [cs.SE])
    The 44th International Conference on Software Engineering (ICSE 2022) was held in person from May 22 to May 27, 2022 in Pittsburgh, PA, USA. Here, we summarize themes of research and the direction of research in the field of software engineering and testing that we observed at the conference.
    Static Hand Gesture Recognition for American Sign Language using Neuromorphic Hardware. (arXiv:2207.12559v1 [cs.LG])
    In this paper, we develop four spiking neural network (SNN) models for two static American Sign Language (ASL) hand gesture classification tasks, i.e., the ASL Alphabet and ASL Digits. The SNN models are deployed on Intel's neuromorphic platform, Loihi, and then compared against equivalent deep neural network (DNN) models deployed on an edge computing device, the Intel Neural Compute Stick 2 (NCS2). We perform a comprehensive comparison between the two systems in terms of accuracy, latency, power consumption, and energy. The best DNN model achieves an accuracy of 99.6% on the ASL Alphabet dataset, whereas the best performing SNN model has an accuracy of 99.44%. For the ASL-Digits dataset, the best SNN model outperforms all of its DNN counterparts with 99.52% accuracy. Moreover, our obtained experimental results show that the Loihi neuromorphic hardware implementations achieve up to 14.67x and 4.09x reduction in power consumption and energy, respectively, when compared to NCS2.
    AMLB: an AutoML Benchmark. (arXiv:2207.12560v1 [cs.LG])
    Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with a multi-faceted analysis, evaluating model accuracy, its trade-offs with inference time, and framework failures. We also use Bradley-Terry trees to discover subsets of tasks where the relative AutoML framework rankings differ. The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end: from framework installation and resource allocation to in-depth evaluation. The benchmark uses public data sets, can be easily extended with other AutoML frameworks and tasks, and has a website with up-to-date results.
    Extreme compression of sentence-transformer ranker models: faster inference, longer battery life, and less storage on edge devices. (arXiv:2207.12852v1 [cs.LG])
    Modern search systems use several large ranker models with transformer architectures. These models require large computational resources and are not suitable for usage on devices with limited computational resources. Knowledge distillation is a popular compression technique that can reduce the resource needs of such models, where a large teacher model transfers knowledge to a small student model. To drastically reduce memory requirements and energy consumption, we propose two extensions for a popular sentence-transformer distillation procedure: generation of an optimal size vocabulary and dimensionality reduction of the embedding dimension of teachers prior to distillation. We evaluate these extensions on two different types of ranker models. This results in extremely compressed student models whose analysis on a test dataset shows the significance and utility of our proposed extensions.
    A Study on the Use of Edge TPUs for Eye Fundus Image Segmentation. (arXiv:2207.12770v1 [eess.IV])
    Medical image segmentation can be implemented using Deep Learning methods with fast and efficient segmentation networks. Single-board computers (SBCs) are difficult to use to train deep networks due to their memory and processing limitations. Specific hardware such as Google's Edge TPU makes them suitable for real-time predictions using complex pre-trained networks. In this work, we study the performance of two SBCs, with and without hardware acceleration, for fundus image segmentation, though the conclusions of this study can be applied to the segmentation by deep neural networks of other types of medical images. To test the benefits of hardware acceleration, we use networks and datasets from a previously published work and generalize them by testing with a dataset of ultrasound thyroid images. We measure prediction times on both SBCs and compare them with a cloud-based TPU system. The results show the feasibility of Machine Learning accelerated SBCs for optic disc and cup segmentation, obtaining times below 25 milliseconds per image using Edge TPUs.
    Generalized Probabilistic U-Net for medical image segmentation. (arXiv:2207.12872v1 [cs.CV])
    We propose the Generalized Probabilistic U-Net, which extends the Probabilistic U-Net by allowing more general forms of the Gaussian distribution as the latent space distribution that can better approximate the uncertainty in the reference segmentations. We study the effect the choice of latent space distribution has on capturing the uncertainty in the reference segmentations using the LIDC-IDRI dataset. We show that the choice of distribution affects the sample diversity of the predictions and their overlap with respect to the reference segmentations. For the LIDC-IDRI dataset, we show that using a mixture of Gaussians results in a statistically significant improvement in the generalized energy distance (GED) metric with respect to the standard Probabilistic U-Net. We have made our implementation available at https://github.com/ishaanb92/GeneralizedProbabilisticUNet
    Benchmark time series data sets for PyTorch -- the torchtime package. (arXiv:2207.12503v1 [cs.LG])
    The development of models for Electronic Health Record data is an area of active research featuring a small number of public benchmark data sets. Researchers typically write custom data processing code but this hinders reproducibility and can introduce errors. The Python package torchtime provides reproducible implementations of commonly used PhysioNet and UEA & UCR time series classification repository data sets for PyTorch. Features are provided for working with irregularly sampled and partially observed time series of unequal length. It aims to simplify access to PhysioNet data and enable fair comparisons of models in this exciting area of research.
    Time Majority Voting, a PC-based EEG Classifier for Non-expert Users. (arXiv:2207.12662v1 [cs.LG])
    Using Machine Learning and Deep Learning to predict cognitive tasks from electroencephalography (EEG) signals is a rapidly advancing field in Brain-Computer Interfaces (BCI). In contrast to the fields of computer vision and natural language processing, the data amount of these trials is still rather tiny. Developing a PC-based machine learning technique to increase the participation of non-expert end-users could help solve this data collection issue. We created a novel algorithm for machine learning called Time Majority Voting (TMV). In our experiment, TMV performed better than cutting-edge algorithms. It can operate efficiently on personal computers for classification tasks involving the BCI. These interpretable data also assisted end-users and researchers in comprehending EEG tests better.
    A Data Driven Method for Multi-step Prediction of Ship Roll Motion in High Sea States. (arXiv:2207.12673v1 [cs.LG])
    Accurate prediction of roll motion in high sea states is significant for the operability, safety and survivability of marine vehicles. This paper presents a novel data-driven methodology for achieving the multi-step prediction of ship roll motion in high sea states. A hybrid neural network, named ConvLSTMPNet, is proposed to execute long short-term memory (LSTM) and one-dimensional convolutional neural networks (CNN) in parallel to extract time-dependent and spatio-temporal information from multidimensional inputs. Taking KCS as the study object, the numerical solution of a computational fluid dynamics method is utilized to generate the ship motion data in sea state 7 with different wave directions. An in-depth comparative study on the selection of feature space is conducted, considering the effects of the time history of motion states and wave height. The comparison results demonstrate the superiority of selecting both motion states and wave heights as the feature space for multi-step prediction. In addition, the results demonstrate that ConvLSTMPNet is more accurate than LSTM and CNN methods in multi-step prediction of roll motion, validating the efficiency of the proposed method.
    Unsupervised Image Representation Learning with Deep Latent Particles. (arXiv:2205.15821v2 [cs.CV] UPDATED)
    We propose a new representation of visual data that disentangles object position from appearance. Our method, termed Deep Latent Particles (DLP), decomposes the visual input into low-dimensional latent "particles", where each particle is described by its spatial location and features of its surrounding region. To drive learning of such representations, we follow a VAE-based approach and introduce a prior for particle positions based on a spatial-softmax architecture, and a modification of the evidence lower bound loss inspired by the Chamfer distance between particles. We demonstrate that our DLP representations are useful for downstream tasks such as unsupervised keypoint (KP) detection, image manipulation, and video prediction for scenes composed of multiple dynamic objects. In addition, we show that our probabilistic interpretation of the problem naturally provides uncertainty estimates for particle locations, which can be used for model selection, among other tasks. Videos and code are available: https://taldatech.github.io/deep-latent-particles-web/
    Classifier-Free Diffusion Guidance. (arXiv:2207.12598v1 [cs.LG])
    Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models post training, in the same spirit as low temperature sampling or truncation in other types of generative models. Classifier guidance combines the score estimate of a diffusion model with the gradient of an image classifier and thereby requires training an image classifier separate from the diffusion model. It also raises the question of whether guidance can be performed without a classifier. We show that guidance can be indeed performed by a pure generative model without such a classifier: in what we call classifier-free guidance, we jointly train a conditional and an unconditional diffusion model, and we combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity similar to that obtained using classifier guidance.
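    The combination step described in this abstract can be sketched as follows. The abstract does not state the formula; the linear form below, with a guidance weight w, is a commonly used formulation and is assumed here. The function name and the toy vectors are hypothetical.

```python
def classifier_free_guidance(eps_cond, eps_uncond, w):
    """Combine conditional and unconditional score (noise) estimates.

    Assumed form: (1 + w) * conditional - w * unconditional.
    w = 0 returns the conditional estimate unchanged; a larger w moves the
    result further away from the unconditional estimate.
    """
    return [(1.0 + w) * c - w * u for c, u in zip(eps_cond, eps_uncond)]

# Toy estimates standing in for the outputs of a jointly trained model.
eps_cond = [1.0, 2.0]
eps_uncond = [0.5, 1.0]
guided = classifier_free_guidance(eps_cond, eps_uncond, 2.0)
```

    In a sampler, the guided estimate would replace the plain conditional estimate at each denoising step; the weight controls the trade-off between sample quality and diversity that the abstract refers to.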
    A Learning and Control Perspective for Microfinance. (arXiv:2207.12631v1 [q-fin.GN])
    Microfinance in developing areas such as Africa has been proven to improve the local economy significantly. However, many applicants in developing areas cannot provide adequate information required by the financial institution to make a lending decision. As a result, it is challenging for microfinance institutions to assign credit properly based on conventional policies. In this paper, we formulate the decision-making of microfinance into a rigorous optimization-based framework involving learning and control. We propose an algorithm to explore and learn the optimal policy to approve or reject applicants. We provide the conditions under which the algorithm is guaranteed to converge to an optimal policy. The proposed algorithm can naturally deal with missing information and systematically trade off multiple objectives such as profit maximization, financial inclusion, social benefits, and economic development. Through extensive simulation of both real and synthetic microfinance datasets, we show that our proposed algorithm is superior to existing benchmarks. To the best of our knowledge, this paper is the first to make a connection between microfinance and control and use control-theoretic tools to optimize the policy with a provable guarantee.
    Learning Protein Representations via Complete 3D Graph Networks. (arXiv:2207.12600v1 [cs.LG])
    We consider representation learning for proteins with 3D structures. We build 3D graphs based on protein structures and develop graph networks to learn their representations. Depending on the levels of details that we wish to capture, protein representations can be computed at different levels, e.g., the amino acid, backbone, or all-atom levels. Importantly, there exist hierarchical relations among different levels. In this work, we propose to develop a novel hierarchical graph network, known as ProNet, to capture the relations. Our ProNet is very flexible and can be used to compute protein representations at different levels of granularity. We show that, given a base 3D graph network that is complete, our ProNet representations are also complete at all levels. To close the loop, we develop a complete and efficient 3D graph network to be used as a base model, making our ProNet complete. We conduct experiments on multiple downstream tasks. Results show that ProNet outperforms recent methods on most datasets. In addition, results indicate that different downstream tasks may require representations at different levels. Our code is available as part of the DIG library (https://github.com/divelab/DIG).
    An Empirical Deep Dive into Deep Learning's Driving Dynamics. (arXiv:2207.12547v1 [cs.LG])
    We present an empirical dataset surveying the deep learning phenomenon on fully-connected networks, encompassing the training and test performance of numerous network topologies, sweeping across multiple learning tasks, depths, numbers of free parameters, learning rates, batch sizes, and regularization penalties. The dataset probes 178 thousand hyperparameter settings with an average of 20 repetitions each, totaling 3.5 million training runs and 20 performance metrics for each of the 13.1 billion training epochs observed. Accumulating this 671 GB dataset utilized 5,448 CPU core-years, 17.8 GPU-years, and 111.2 node-years. Additionally, we provide a preliminary analysis revealing patterns which persist across learning tasks and topologies. We aim to inspire work empirically studying modern machine learning techniques as a catalyst for the theoretical discoveries needed to progress the field beyond energy-intensive and heuristic practices.
    Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability. (arXiv:2207.12678v1 [cs.LG])
    Recent findings (e.g., arXiv:2103.00065) demonstrate that modern neural networks trained by full-batch gradient descent typically enter a regime called Edge of Stability (EOS). In this regime, the sharpness, i.e., the maximum Hessian eigenvalue, first increases to the value 2/(step size) (the progressive sharpening phase) and then oscillates around this value (the EOS phase). This paper aims to analyze the GD dynamics and the sharpness along the optimization trajectory. Our analysis naturally divides the GD trajectory into four phases depending on the change of the sharpness. We empirically identify the norm of output layer weight as an interesting indicator of sharpness dynamics. Based on this empirical observation, we attempt to theoretically and empirically explain the dynamics of various key quantities that lead to the change of sharpness in each phase of EOS. Moreover, based on certain assumptions, we provide a theoretical proof of the sharpness behavior in EOS regime in two-layer fully-connected linear neural networks. We also discuss some other empirical findings and the limitation of our theoretical results.
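    The 2/(step size) threshold mentioned above can be checked on a toy quadratic, where the Hessian is constant. This is a sketch under that simplifying assumption, not the paper's analysis; the matrix, helper names, and step sizes are illustrative.

```python
def matvec(H, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def sharpness(H, iters=200):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    v = [1.0] * len(H)
    lam = 0.0
    for _ in range(iters):
        w = matvec(H, v)
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(H, v)))  # Rayleigh quotient
    return lam

def gd_final_norm(H, step, iters=100):
    """Run gradient descent on f(x) = 0.5 * x^T H x and return the final ||x||."""
    x = [1.0] * len(H)
    for _ in range(iters):
        g = matvec(H, x)  # gradient of the quadratic is H x
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return sum(xi * xi for xi in x) ** 0.5

H = [[4.0, 0.0], [0.0, 1.0]]   # toy Hessian with eigenvalues 4 and 1
lam_max = sharpness(H)
threshold = 2.0 / lam_max      # gradient descent is stable only for step < threshold
stable = gd_final_norm(H, 0.4)
unstable = gd_final_norm(H, 0.6)
```

    With sharpness 4 the threshold is 0.5: a step of 0.4 shrinks the iterate toward zero, while a step of 0.6 makes it grow without bound. In a neural network the Hessian changes during training, which is what allows the sharpness to rise to the threshold and then oscillate around it.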
    Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions. (arXiv:2207.12463v1 [cs.LG])
    While single-agent policy optimization in a fixed environment has attracted a lot of research attention recently in the reinforcement learning community, much less is known theoretically when there are multiple agents playing in a potentially competitive environment. We take steps forward by proposing and analyzing new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions. We consider two classes of transition structures: factored independent transition and single-controller transition. For both scenarios, we prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario. The regret of each agent is measured against a potentially adversarial opponent who can choose a single best policy in hindsight after observing the full policy sequence. Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization in a non-stationary environment. When both players adopt the proposed algorithms, their overall optimality gap is $\widetilde{\mathcal{O}}(\sqrt{K})$.
    The Bearable Lightness of Big Data: Towards Massive Public Datasets in Scientific Machine Learning. (arXiv:2207.12546v1 [cs.LG])
    In general, large datasets enable deep learning models to perform with good accuracy and generalizability. However, massive high-fidelity simulation datasets (from molecular chemistry, astrophysics, computational fluid dynamics (CFD), etc.) can be challenging to curate due to dimensionality and storage constraints. Lossy compression algorithms can help mitigate limitations from storage, as long as the overall data fidelity is preserved. To illustrate this point, we demonstrate that deep learning models, trained and tested on data from a petascale CFD simulation, are robust to errors introduced during lossy compression in a semantic segmentation problem. Our results demonstrate that lossy compression algorithms offer a realistic pathway for exposing high-fidelity scientific data to open-source data repositories for building community datasets. In this paper, we outline, construct, and evaluate the requirements for establishing a big data framework, demonstrated at https://blastnet.github.io/, for scientific machine learning.
    On the benefits of non-linear weight updates. (arXiv:2207.12505v1 [cs.LG])
    Recent work has suggested that the generalisation performance of a DNN is related to the extent to which the Signal-to-Noise Ratio is optimised at each of the nodes. In contrast, Gradient Descent methods do not always lead to SNR-optimal weight configurations. One way to improve SNR performance is to suppress large weight updates and amplify small weight updates. Such balancing is already implicit in some common optimizers, but we propose an approach that makes this explicit. The method applies a non-linear function to gradients prior to making DNN parameter updates. We investigate the performance with such non-linear approaches. The result is an adaptation to existing optimizers that improves performance for many problem types.
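    The idea of suppressing large updates and amplifying small ones can be sketched with a signed power transform applied to each gradient component. The abstract does not specify the non-linear function, so this particular choice, the exponent, and the function names are assumptions made for illustration.

```python
import math

def nonlinear_update(grad, power=0.5):
    """Signed power transform of each gradient component.

    For 0 < power < 1 this shrinks components with |g| > 1 and enlarges
    components with |g| < 1, while preserving the sign.
    """
    return [math.copysign(abs(g) ** power, g) for g in grad]

def sgd_step(weights, grad, lr=0.1, power=0.5):
    """One plain SGD step using the transformed gradient."""
    transformed = nonlinear_update(grad, power)
    return [w - lr * t for w, t in zip(weights, transformed)]

grads = [4.0, -0.25, 0.0]
transformed = nonlinear_update(grads)
new_weights = sgd_step([1.0, 1.0, 1.0], grads)
```

    With an exponent of 0.5, the large gradient 4.0 is reduced to 2.0 and the small gradient -0.25 is enlarged to -0.5, so the spread between the largest and smallest update narrows.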
    Variance estimation in graphs with the fused lasso. (arXiv:2207.12638v1 [math.ST])
    We study the problem of variance estimation in general graph-structured problems. First, we develop a linear time estimator for the homoscedastic case that can consistently estimate the variance in general graphs. We show that our estimator attains minimax rates for the chain and 2D grid graphs when the mean signal has a total variation with canonical scaling. Furthermore, we provide general upper bounds on the mean squared error performance of the fused lasso estimator in general graphs under a moment condition and a bound on the tail behavior of the errors. These upper bounds allow us to generalize for broader classes of distributions, such as sub-Exponential, many existing results on the fused lasso that are only known to hold with the assumption that errors are sub-Gaussian random variables. Exploiting our upper bounds, we then study a simple total variation regularization estimator for estimating the signal of variances in the heteroscedastic case. Our results show that the variance estimator attains minimax rates for estimating signals of bounded variation in grid graphs, $K$-nearest neighbor graphs with very mild assumptions, and it is consistent for estimating the variances in any connected graph. In addition, extensive numerical results show that our proposed estimators perform reasonably well in a variety of graph-structured models.
    A Survey of Explainable Graph Neural Networks: Taxonomy and Evaluation Metrics. (arXiv:2207.12599v1 [cs.LG])
    Graph neural networks (GNNs) have demonstrated a significant boost in prediction performance on graph data. At the same time, the predictions made by these models are often hard to interpret. In that regard, many efforts have been made to explain the prediction mechanisms of these models from perspectives such as GNNExplainer, XGNN and PGExplainer. Although such works present systematic frameworks to interpret GNNs, a holistic review for explainable GNNs is unavailable. In this survey, we present a comprehensive review of explainability techniques developed for GNNs. We focus on explainable graph neural networks and categorize them based on the use of explainable methods. We further provide the common performance metrics for GNNs explanations and point out several future research directions.
    Variational multiscale reinforcement learning for discovering reduced order closure models of nonlinear spatiotemporal transport systems. (arXiv:2207.12854v1 [cs.LG])
    A central challenge in the computational modeling and simulation of a multitude of science applications is to achieve robust and accurate closures for their coarse-grained representations due to underlying highly nonlinear multiscale interactions. These closure models are common in many nonlinear spatiotemporal systems to account for losses due to reduced order representations, including many transport phenomena in fluids. Previous data-driven closure modeling efforts have mostly focused on supervised learning approaches using high fidelity simulation data. On the other hand, reinforcement learning (RL) is a powerful yet relatively uncharted method in spatiotemporally extended systems. In this study, we put forth a modular dynamic closure modeling and discovery framework to stabilize the Galerkin projection based reduced order models that may arise in many nonlinear spatiotemporal dynamical systems with quadratic nonlinearity. However, a key element in creating a robust RL agent is to introduce a feasible reward function, which can be constituted of any difference metrics between the RL model and high fidelity simulation data. First, we introduce a multi-modal RL (MMRL) to discover mode-dependent closure policies that utilize the high fidelity data in rewarding our RL agent. We then formulate a variational multiscale RL (VMRL) approach to discover closure models without requiring access to the high fidelity data in designing the reward function. Specifically, our chief innovation is to leverage variational multiscale formalism to quantify the difference between modal interactions in Galerkin systems. Our results in simulating the viscous Burgers equation indicate that the proposed VMRL method leads to robust and accurate closure parameterizations, and it may potentially be used to discover scale-aware closure models for complex dynamical systems.
    Probing Speech Emotion Recognition Transformers for Linguistic Knowledge. (arXiv:2204.00400v2 [cs.CL] UPDATED)
    Large, pre-trained neural networks consisting of self-attention layers (transformers) have recently achieved state-of-the-art results on several speech emotion recognition (SER) datasets. These models are typically pre-trained in a self-supervised manner with the goal of improving automatic speech recognition performance -- and thus, of understanding linguistic information. In this work, we investigate the extent to which this information is exploited during SER fine-tuning. Using a reproducible methodology based on open-source tools, we synthesise prosodically neutral speech utterances while varying the sentiment of the text. Valence predictions of the transformer model are very reactive to positive and negative sentiment content, as well as negations, but not to intensifiers or reducers, while none of those linguistic features impact arousal or dominance. These findings show that transformers can successfully leverage linguistic information to improve their valence predictions, and that linguistic analysis should be included in their testing.
    Compositional Visual Generation with Composable Diffusion Models. (arXiv:2206.01714v3 [cs.CV] UPDATED)
    Large text-guided diffusion models, such as DALLE-2, are able to generate stunning photorealistic images given natural language descriptions. While such models are highly flexible, they struggle to understand the composition of certain concepts, such as confusing the attributes of different objects or relations between objects. In this paper, we propose an alternative structured approach for compositional generation using diffusion models. An image is generated by composing a set of diffusion models, with each of them modeling a certain component of the image. To do this, we interpret diffusion models as energy-based models in which the data distributions defined by the energy functions may be explicitly combined. The proposed method can generate scenes at test time that are substantially more complex than those seen in training, composing sentence descriptions, object relations, human facial attributes, and even generalizing to new combinations that are rarely seen in the real world. We further illustrate how our approach may be used to compose pre-trained text-guided diffusion models and generate photorealistic images containing all the details described in the input descriptions, including the binding of certain object attributes that have been shown difficult for DALLE-2. These results point to the effectiveness of the proposed method in promoting structured generalization for visual generation. Project page: https://energy-based-model.github.io/Compositional-Visual-Generation-with-Composable-Diffusion-Models/
    safe-control-gym: a Unified Benchmark Suite for Safe Learning-based Control and Reinforcement Learning in Robotics. (arXiv:2109.06325v4 [cs.RO] UPDATED)
    In recent years, both reinforcement learning and learning-based control -- as well as the study of their safety, which is crucial for deployment in real-world robots -- have gained significant traction. However, to adequately gauge the progress and applicability of new results, we need the tools to equitably compare the approaches proposed by the controls and reinforcement learning communities. Here, we propose a new open-source benchmark suite, called safe-control-gym, supporting both model-based and data-based control techniques. We provide implementations for three dynamic systems -- the cart-pole and the 1D and 2D quadrotors -- and two control tasks -- stabilization and trajectory tracking. We propose to extend OpenAI's Gym API -- the de facto standard in reinforcement learning research -- with the ability to (i) specify (and query) symbolic dynamics and (ii) constraints, and (iii) (repeatably) inject simulated disturbances in the control inputs, state measurements, and inertial properties. To demonstrate our proposal and in an attempt to bring research communities closer together, we show how to use safe-control-gym to quantitatively compare the control performance, data efficiency, and safety of multiple approaches from the fields of traditional control, learning-based control, and reinforcement learning.
    Future-Dependent Value-Based Off-Policy Evaluation in POMDPs. (arXiv:2207.13081v1 [cs.LG])
    We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play a role similar to that of classical value functions in fully-observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain a PAC result, which implies that our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states and Bellman completeness holds. Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.
    Modeling the Social Influence of COVID-19 via Personalized Propagation with Deep Learning. (arXiv:2207.13016v1 [cs.SI])
    Social influence prediction has permeated many domains, including marketing, behavior prediction, recommendation systems, and more. However, traditional methods of predicting social influence not only require domain expertise, they also rely on extracting user features, which can be very tedious. Additionally, graph convolutional networks (GCNs), which deal with graph data in non-Euclidean space, are not directly applicable to Euclidean space. To overcome these problems, we extended DeepInf such that it can predict the social influence of COVID-19 via the transition probability of the page rank domain. Furthermore, our implementation gives rise to a deep learning-based personalized propagation algorithm, called DeepPP. The resulting algorithm combines the personalized propagation of a neural prediction model with the approximate personalized propagation of a neural prediction model from page rank analysis. Four social networks from different domains as well as two COVID-19 datasets were used to demonstrate the efficiency and effectiveness of the proposed algorithm. Compared to other baseline methods, DeepPP provides more accurate social influence predictions. Further, experiments demonstrate that DeepPP can be applied to real-world prediction data for COVID-19.
    Improved and Interpretable Defense to Transferred Adversarial Examples by Jacobian Norm with Selective Input Gradient Regularization. (arXiv:2207.13036v1 [cs.LG])
    Deep neural networks (DNNs) are known to be vulnerable to adversarial examples that are crafted with imperceptible perturbations, i.e., a small change in an input image can induce a misclassification, which threatens the reliability of deployed deep learning systems. Adversarial training (AT) is frequently used to improve the robustness of DNNs by training on a mixture of corrupted and clean data. However, existing AT-based methods are either computationally expensive in generating such adversarial examples, and thus cannot satisfy the real-time requirement of real-world scenarios, or cannot produce interpretable predictions for transferred adversarial examples generated to fool a wide spectrum of defense models. In this work, we propose an approach of Jacobian norm with Selective Input Gradient Regularization (J-SIGR), which selectively regularizes gradient-based saliency maps to imitate its interpretable prediction with respect to the input through Jacobian normalization. As such, we achieve the defense of DNNs with both high interpretability and computation efficiency. Finally, we evaluate our method across different architectures against powerful adversarial attacks. Experiments demonstrate that the proposed J-SIGR confers improved robustness against transferred adversarial attacks and show that the network predictions are easily interpretable.
    Evolving Reinforcement Learning Algorithms. (arXiv:2101.03958v5 [cs.LG] UPDATED)
    We propose a method for meta-learning reinforcement learning algorithms by searching over the space of computational graphs which compute the loss function for a value-based model-free RL agent to optimize. The learned algorithms are domain-agnostic and can generalize to new environments not seen during training. Our method can both learn from scratch and bootstrap off known existing algorithms, like DQN, enabling interpretable modifications which improve performance. Learning from scratch on simple classical control and gridworld tasks, our method rediscovers the temporal-difference (TD) algorithm. Bootstrapped from DQN, we highlight two learned algorithms which obtain good generalization performance over other classical control tasks, gridworld type tasks, and Atari games. The analysis of the learned algorithm behavior shows resemblance to recently proposed RL algorithms that address overestimation in value-based methods.
    Alleviation of Temperature Variation Induced Accuracy Degradation in Ferroelectric FinFET Based Neural Network. (arXiv:2103.03111v4 [cs.LG] UPDATED)
    This paper reports the impacts of temperature variation on the inference accuracy of pre-trained all-ferroelectric FinFET deep neural networks, along with plausible design techniques to abate these impacts. We adopted a pre-trained artificial neural network (N.N.) with 96.4% inference accuracy on the MNIST dataset as the baseline. As an aftermath of temperature change, a compact model captured the conductance drift of a programmed cell over a wide range of gate biases. We observed a significant inference accuracy degradation in the analog neural network at 233 K for an N.N. trained at 300 K. Finally, we deployed binary neural networks with "read voltage" optimization to ensure immunity of N.N. to accuracy degradation under temperature variation, maintaining an inference accuracy of 96%. Keywords: Ferroelectric memories
    Partial-Monotone Adaptive Submodular Maximization. (arXiv:2207.12840v1 [cs.LG])
    Many sequential decision making problems, including pool-based active learning and adaptive viral marketing, can be formulated as an adaptive submodular maximization problem. Most existing studies on adaptive submodular optimization focus on either the monotone case or the non-monotone case. Specifically, if the utility function is monotone and adaptive submodular, Golovin and Krause (2011) developed a greedy policy that achieves a $(1-1/e)$ approximation ratio subject to a cardinality constraint. If the utility function is non-monotone and adaptive submodular, Tang (2021) showed that a random greedy policy achieves a $1/e$ approximation ratio subject to a cardinality constraint. In this work, we aim to generalize the above mentioned results by studying the partial-monotone adaptive submodular maximization problem. To this end, we introduce the notion of an adaptive monotonicity ratio $m\in[0,1]$ to measure the degree of monotonicity of a function. Our main result is to show that a random greedy policy achieves an approximation ratio of $m(1-1/e)+(1-m)(1/e)$ if the utility function is $m$-adaptive monotone and adaptive submodular. Notably this result recovers the aforementioned $(1-1/e)$ and $1/e$ approximation ratios when $m = 1$ and $m = 0$, respectively. We further extend our results to consider a knapsack constraint. We show that a sampling-based policy achieves an approximation ratio of $(m+1)/10$ if the utility function is $m$-adaptive monotone and adaptive submodular. One important implication of our results is that even for a non-monotone utility function, we can still achieve an approximation ratio close to $(1-1/e)$ if this function is "close" to a monotone function. This leads to improved performance bounds for many machine learning applications whose utility functions are almost adaptive monotone.
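    The two bounds quoted in this abstract are simple closed forms in the monotonicity ratio m, so they can be checked numerically. The following is a minimal sketch; the function names are illustrative and not taken from the paper. It evaluates m(1-1/e)+(1-m)(1/e) for the cardinality constraint and (m+1)/10 for the knapsack constraint, and confirms that m = 1 gives the monotone bound 1-1/e while m = 0 gives the non-monotone bound 1/e.

```python
import math


def cardinality_ratio(m: float) -> float:
    """Approximation ratio m(1 - 1/e) + (1 - m)(1/e) quoted in the abstract."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("monotonicity ratio m must lie in [0, 1]")
    return m * (1.0 - 1.0 / math.e) + (1.0 - m) * (1.0 / math.e)


def knapsack_ratio(m: float) -> float:
    """Approximation ratio (m + 1) / 10 quoted for the knapsack constraint."""
    if not 0.0 <= m <= 1.0:
        raise ValueError("monotonicity ratio m must lie in [0, 1]")
    return (m + 1.0) / 10.0


if __name__ == "__main__":
    # Sweep m from fully non-monotone (0) to fully monotone (1).
    for m in (0.0, 0.5, 0.9, 1.0):
        print(f"m={m:.1f}  cardinality={cardinality_ratio(m):.4f}  "
              f"knapsack={knapsack_ratio(m):.4f}")
```

    Because the cardinality bound is linear in m, a function that is nearly monotone (m close to 1) keeps a guarantee close to 1-1/e.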
    Semi-Leak: Membership Inference Attacks Against Semi-supervised Learning. (arXiv:2207.12535v1 [cs.CR])
    Semi-supervised learning (SSL) leverages both labeled and unlabeled data to train machine learning (ML) models. State-of-the-art SSL methods can achieve performance comparable to supervised learning while using far less labeled data. However, most existing works focus on improving the performance of SSL. In this work, we take a different angle by studying the training data privacy of SSL. Specifically, we propose the first data augmentation-based membership inference attacks against ML models trained by SSL. Given a data sample and black-box access to a model, the goal of a membership inference attack is to determine whether the data sample belongs to the training dataset of the model. Our evaluation shows that the proposed attack can consistently outperform existing membership inference attacks and achieves the best performance against the model trained by SSL. Moreover, we uncover that the reason for membership leakage in SSL is different from the commonly believed one in supervised learning, i.e., overfitting (the gap between training and testing accuracy). We observe that the SSL model generalizes well to the testing data (with almost 0 overfitting) but "memorizes" the training data by giving a more confident prediction regardless of its correctness. We also explore early stopping as a countermeasure to prevent membership inference attacks against SSL. The results show that early stopping can mitigate the membership inference attack, but at the cost of degraded model utility.  ( 3 min )
    Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution. (arXiv:2207.12577v1 [cs.CV])
    Deep learning-based super-resolution (SR) has gained tremendous popularity in recent years because of its high image quality performance and wide application scenarios. However, prior methods typically suffer from large amounts of computations and huge power consumption, causing difficulties for real-time inference, especially on resource-limited platforms such as mobile devices. To mitigate this, we propose a compiler-aware SR neural architecture search (NAS) framework that conducts depth search and per-layer width search with adaptive SR blocks. The inference speed is directly taken into the optimization along with the SR loss to derive SR models with high image quality while satisfying the real-time inference requirement. Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence. With the proposed framework, we achieve real-time SR inference for implementing 720p resolution with competitive SR performance (in terms of PSNR and SSIM) on GPU/DSP of mobile platforms (Samsung Galaxy S21).  ( 2 min )
    Trainability Preserving Neural Structured Pruning. (arXiv:2207.12534v1 [cs.LG])
    Several recent works find empirically that the finetuning learning rate is critical to the final performance in neural network structured pruning. Further research finds that the cause is network trainability broken by pruning, which calls for recovering trainability before finetuning. Existing attempts propose to exploit weight orthogonalization to achieve dynamical isometry for improved trainability. However, they only work for linear MLP networks. How to develop a filter pruning method that maintains or recovers trainability and is scalable to modern deep networks remains elusive. In this paper, we present trainability preserving pruning (TPP), a regularization-based structured pruning method that can effectively maintain trainability during sparsification. Specifically, TPP regularizes the gram matrix of convolutional kernels so as to de-correlate the pruned filters from the kept filters. Besides the convolutional layers, we also propose to regularize the BN parameters for better preserving trainability. Empirically, TPP can compete with the ground-truth dynamical isometry recovery method on linear MLP networks. On non-linear networks (ResNet56/VGG19, CIFAR datasets), it outperforms the other counterpart solutions by a large margin. Moreover, TPP can also work effectively with modern deep networks (ResNets) on ImageNet, delivering encouraging performance in comparison to many top-performing filter pruning methods. To the best of our knowledge, this is the first approach that effectively maintains trainability during pruning for large-scale deep neural networks.  ( 3 min )
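    To make the idea of de-correlating pruned filters from kept filters concrete, here is a minimal sketch of a Gram-matrix penalty. It is an illustration under stated assumptions, not the TPP loss from the paper: filters are assumed to be flattened into plain lists of floats, the penalty is assumed to be the sum of squared Gram entries between pruned and kept filters, and the names `gram_matrix` and `decorrelation_penalty` are hypothetical.

```python
def gram_matrix(filters):
    """Gram matrix G[i][j] = <w_i, w_j> of flattened filters w_i."""
    return [[sum(x * y for x, y in zip(wi, wj)) for wj in filters]
            for wi in filters]


def decorrelation_penalty(filters, pruned):
    """Sum of squared inner products between pruned and kept filters.

    Assumption for illustration only: driving this value to zero makes every
    pruned filter orthogonal to every kept filter.
    """
    gram = gram_matrix(filters)
    pruned = set(pruned)
    kept = [j for j in range(len(filters)) if j not in pruned]
    return sum(gram[i][j] ** 2 for i in pruned for j in kept)


if __name__ == "__main__":
    # Three toy "filters" in a 2-dimensional weight space.
    filters = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # Pruning filter 2 is penalized because it overlaps with filters 0 and 1.
    print(decorrelation_penalty(filters, pruned=[2]))
    # Pruning filter 1 is penalized only through its overlap with filter 2.
    print(decorrelation_penalty(filters, pruned=[1]))
```

    In a training loop, a term of this kind would be added to the task loss with a weighting coefficient; how the paper schedules or weights its regularizer is not stated in the abstract.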
    Optimizing Empty Container Repositioning and Fleet Deployment via Configurable Semi-POMDPs. (arXiv:2207.12509v1 [cs.LG])
    With the continuous growth of the global economy and markets, resource imbalance has risen to be one of the central issues in real logistic scenarios. In marine transportation, this trade imbalance leads to Empty Container Repositioning (ECR) problems. Once the freight has been delivered from an exporting country to an importing one, the laden containers become empty containers that need to be repositioned to satisfy new goods requests in exporting countries. In such problems, the performance that any cooperative repositioning policy can achieve strictly depends on the routes that vessels will follow (i.e., fleet deployment). Historically, Operations Research (OR) approaches were proposed to jointly optimize the repositioning policy along with the fleet of vessels. However, the stochasticity of future supply and demand of containers, together with black-box and non-linear constraints that are present within the environment, makes these approaches unsuitable for these scenarios. In this paper, we introduce a novel framework, Configurable Semi-POMDPs, to model this type of problem. Furthermore, we provide a two-stage learning algorithm, "Configure & Conquer" (CC), that first configures the environment by finding an approximation of the optimal fleet deployment strategy, and then "conquers" it by learning an ECR policy in this tuned environmental setting. We validate our approach in large and real-world instances of the problem. Our experiments highlight that CC avoids the pitfalls of OR methods and that it is successful at optimizing both the ECR policy and the fleet of vessels, leading to superior performance in world trade environments.  ( 3 min )
    ScoreCAM GNN: une explication optimale des réseaux profonds sur graphes. (arXiv:2207.12748v1 [cs.LG])
    The explainability of deep networks is becoming a central issue in the deep learning community. It is the same for learning on graphs, a data structure present in many real world problems. In this paper, we propose a method that is more optimal, lighter, consistent and better exploits the topology of the evaluated graph than the state-of-the-art methods.
    Bayesian tensor factorization for predicting clinical outcomes using integrated human genetics evidence. (arXiv:2207.12538v1 [cs.LG])
    The approval success rate of drug candidates is very low, with the majority of failures due to safety and efficacy. Increasingly available high dimensional information on targets, drug molecules and indications provides an opportunity for ML methods to integrate multiple data modalities and better predict clinically promising drug targets. Notably, drug targets with human genetics evidence are shown to have better odds of success. However, a recent tensor factorization-based approach found that additional information on targets and indications might not necessarily improve the predictive accuracy. Here we revisit this approach by integrating different types of human genetics evidence collated from publicly available sources to support each target-indication pair. We use Bayesian tensor factorization to show that models incorporating all available human genetics evidence (rare disease, gene burden, common disease) modestly improve the clinical outcome prediction over models using a single line of genetics evidence. We provide additional insight into the relative predictive power of different types of human genetics evidence for predicting the success of clinical outcomes.
    An Explainable Decision Support System for Predictive Process Analytics. (arXiv:2207.12782v1 [cs.LG])
    Predictive Process Analytics is becoming an essential aid for organizations, providing online operational support of their processes. However, process stakeholders need to be provided with an explanation of the reasons why a given process execution is predicted to behave in a certain way. Otherwise, they will be unlikely to trust the predictive monitoring technology and, hence, adopt it. This paper proposes a predictive analytics framework that is also equipped with explanation capabilities based on the game theory of Shapley Values. The framework has been implemented in the IBM Process Mining suite and commercialized for business users. The framework has been tested on real-life event data to assess the quality of the predictions and the corresponding evaluations. In particular, a user evaluation has been performed in order to understand if the explanations provided by the system were intelligible to process stakeholders.
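    For readers unfamiliar with Shapley values, the following is a minimal sketch of the exact computation on a toy cooperative game. It is not the framework described in the abstract: the feature names, the `toy_value` function and its numbers are invented for illustration, and real explainers approximate this computation because enumerating all orderings is exponential in the number of features.

```python
from itertools import permutations


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings.

    `value` maps a frozenset of players to a number (the coalition's worth).
    """
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            after = value(frozenset(coalition))
            totals[p] += after - before
    return {p: t / len(orders) for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical predicted delay (hours) given which features are 'present'."""
    base = {"amount": 10.0, "resource": 20.0, "weekday": 5.0}
    worth = sum(base[f] for f in coalition)
    # Interaction: amount and resource together add 6 extra hours.
    if {"amount", "resource"} <= coalition:
        worth += 6.0
    return worth


if __name__ == "__main__":
    phi = shapley_values(["amount", "resource", "weekday"], toy_value)
    for feature, contribution in phi.items():
        print(feature, contribution)
```

    The interaction term is split equally between the two features that cause it, and the contributions sum to the worth of the full coalition, which is the efficiency property that makes Shapley values attractive for attributing a prediction to its inputs.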
    Automated discovery of interpretable gravitational-wave population models. (arXiv:2207.12409v1 [astro-ph.IM])
    We present an automatic approach to discover analytic population models for gravitational-wave (GW) events from data. As more GW events are detected, flexible models such as Gaussian Mixture Models have become more important in fitting the distribution of GW properties due to their expressivity. However, flexible models come with many parameters that lack physical motivation, making it challenging to interpret the implications of these models. In this work, we demonstrate that symbolic regression can complement flexible models by distilling the posterior predictive distribution of such flexible models into interpretable analytic expressions. We recover common GW population models such as a power-law-plus-Gaussian, and find a new empirical population model which combines accuracy and simplicity. This demonstrates a strategy to automatically discover interpretable population models in the ever-growing GW catalog, which can potentially be applied to other astrophysical phenomena.  ( 2 min )
    Estimating and Controlling for Fairness via Sensitive Attribute Predictors. (arXiv:2207.12497v1 [cs.LG])
    Although machine learning classifiers have been increasingly used in high-stakes decision making (e.g., cancer diagnosis, criminal prosecution decisions), they have demonstrated biases against underrepresented groups. Standard definitions of fairness require access to sensitive attributes of interest (e.g., gender and race), which are often unavailable. In this work we demonstrate that in these settings where sensitive attributes are unknown, one can still reliably estimate and ultimately control for fairness by using proxy sensitive attributes derived from a sensitive attribute predictor. Specifically, we first show that with just a little knowledge of the complete data distribution, one may use a sensitive attribute predictor to obtain upper and lower bounds of the classifier's true fairness metric. Second, we demonstrate how one can provably control for fairness with respect to the true sensitive attributes by controlling for fairness with respect to the proxy sensitive attributes. Our results hold under assumptions that are significantly milder than previous works. We illustrate our results on a series of synthetic and real datasets.  ( 2 min )
    $p$-DkNN: Out-of-Distribution Detection Through Statistical Testing of Deep Representations. (arXiv:2207.12545v1 [cs.LG])
    The lack of well-calibrated confidence estimates makes neural networks inadequate in safety-critical domains such as autonomous driving or healthcare. In these settings, having the ability to abstain from making a prediction on out-of-distribution (OOD) data can be as important as correctly classifying in-distribution data. We introduce $p$-DkNN, a novel inference procedure that takes a trained deep neural network and analyzes the similarity structures of its intermediate hidden representations to compute $p$-values associated with the end-to-end model prediction. The intuition is that statistical tests performed on latent representations can serve not only as a classifier, but also offer a statistically well-founded estimation of uncertainty. $p$-DkNN is scalable and leverages the composition of representations learned by hidden layers, which makes deep representation learning successful. Our theoretical analysis builds on Neyman-Pearson classification and connects it to recent advances in selective classification (reject option). We demonstrate advantageous trade-offs between abstaining from predicting on OOD inputs and maintaining high accuracy on in-distribution inputs. We find that $p$-DkNN forces adaptive attackers crafting adversarial examples, a form of worst-case OOD inputs, to introduce semantically meaningful changes to the inputs.
    Domain Adaptation under Open Set Label Shift. (arXiv:2207.13048v1 [cs.LG])
    We introduce the problem of domain adaptation under Open Set Label Shift (OSLS) where the label distribution can change arbitrarily and a new class may arrive during deployment, but the class-conditional distributions p(x|y) are domain-invariant. OSLS subsumes domain adaptation under label shift and Positive-Unlabeled (PU) learning. The learner's goals here are two-fold: (a) estimate the target label distribution, including the novel class; and (b) learn a target classifier. First, we establish necessary and sufficient conditions for identifying these quantities. Second, motivated by advances in label shift and PU learning, we propose practical methods for both tasks that leverage black-box predictors. Unlike typical Open Set Domain Adaptation (OSDA) problems, which tend to be ill-posed and amenable only to heuristics, OSLS offers a well-posed problem amenable to more principled machinery. Experiments across numerous semi-synthetic benchmarks on vision, language, and medical datasets demonstrate that our methods consistently outperform OSDA baselines, achieving 10-25% improvements in target domain accuracy. Finally, we analyze the proposed methods, establishing finite-sample convergence to the true label marginal and convergence to the optimal classifier for linear models in a Gaussian setup. Code is available at https://github.com/acmi-lab/Open-Set-Label-Shift.
  • Open

    Differentially Private Estimation via Statistical Depth. (arXiv:2207.12602v1 [stat.ML])
    Constructing a differentially private (DP) estimator requires deriving the maximum influence of an observation, which can be difficult in the absence of exogenous bounds on the input data or the estimator, especially in high dimensional settings. This paper shows that standard notions of statistical depth, i.e., halfspace depth and regression depth, are particularly advantageous in this regard, both in the sense that the maximum influence of a single observation is easy to analyze and that this value is typically low. This is used to motivate new approximate DP location and regression estimators using the maximizers of these two notions of statistical depth. A more computationally efficient variant of the approximate DP regression estimator is also provided. Also, to avoid requiring that users specify a priori bounds on the estimates and/or the observations, variants of these DP mechanisms are described that satisfy random differential privacy (RDP), which is a relaxation of differential privacy provided by Hall, Wasserman, and Rinaldo (2013). We also provide simulations of the two DP regression methods proposed here. The proposed estimators appear to perform favorably relative to the existing DP regression methods we consider in these simulations when either the sample size is at least 100-200 or the privacy-loss budget is sufficiently high.
    The derivatives of Sinkhorn-Knopp converge. (arXiv:2207.12717v1 [math.OC])
    We show that the derivatives of the Sinkhorn-Knopp algorithm, or iterative proportional fitting procedure, converge towards the derivatives of the entropic regularization of the optimal transport problem with a locally uniform linear convergence rate.
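    For context, the iteration whose derivatives the abstract refers to can be sketched in a few lines. This is a minimal pure-Python sketch of Sinkhorn-Knopp scaling for entropic optimal transport, assuming a small dense cost matrix and a fixed iteration count; the function name, the regularization value `eps` and the toy inputs are illustrative choices, and the sketch does not compute derivatives.

```python
import math


def sinkhorn_knopp(cost, a, b, eps=0.5, iters=500):
    """Alternately rescale rows and columns of K = exp(-cost / eps).

    Returns the transport plan P[i][j] = u[i] * K[i][j] * v[j], whose row
    sums approach `a` and whose column sums approach `b`.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # Row scaling: make the row sums of diag(u) K diag(v) equal to a.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: make the column sums equal to b.
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


if __name__ == "__main__":
    cost = [[0.0, 1.0], [1.0, 0.0]]
    plan = sinkhorn_knopp(cost, a=[0.3, 0.7], b=[0.6, 0.4])
    for row in plan:
        print([round(x, 4) for x in row])
```

    Each step is a differentiable function of the cost and the marginals, which is why one can differentiate through a finite number of iterations; the abstract's claim is that those derivatives converge to the derivatives of the entropic regularization of the optimal transport problem.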
    The Optimal Noise in Noise-Contrastive Learning Is Not What You Think. (arXiv:2203.01110v2 [stat.ML] UPDATED)
    Learning a parametric model of a data distribution is a well-known statistical problem that has seen renewed interest as it is brought to scale in deep learning. Framing the problem as a self-supervised task, where data samples are discriminated from noise samples, is at the core of state-of-the-art methods, beginning with Noise-Contrastive Estimation (NCE). Yet, such contrastive learning requires a good noise distribution, which is hard to specify; domain-specific heuristics are therefore widely used. While a comprehensive theory is missing, it is widely assumed that the optimal noise should in practice be made equal to the data, both in distribution and proportion. This setting underlies Generative Adversarial Networks (GANs) in particular. Here, we empirically and theoretically challenge this assumption on the optimal noise. We show that deviating from this assumption can actually lead to better statistical estimators, in terms of asymptotic variance. In particular, the optimal noise distribution is different from the data's and even from a different family.
    Few-Shot Domain Adaptation For End-to-End Communication. (arXiv:2108.00874v2 [cs.LG] UPDATED)
    The problem of end-to-end learning of a communication system using an autoencoder -- consisting of an encoder, channel, and decoder modeled using neural networks -- has recently been shown to be a promising approach. A challenge faced in the practical adoption of this learning approach is that under changing channel conditions (e.g. a wireless link), it requires frequent retraining of the autoencoder in order to maintain a low decoding error rate. Since retraining is both time consuming and requires a large number of samples, it becomes impractical when the channel distribution is changing quickly. We propose to address this problem using a fast and sample-efficient (few-shot) domain adaptation method that does not change the encoder and decoder networks. Unlike conventional training-time unsupervised or semi-supervised domain adaptation, here we have an autoencoder trained on a source distribution that we want to adapt (at test time) to a target distribution using only a small labeled dataset and no unlabeled data. Our method focuses on a Gaussian mixture density network based channel model, and formulates its adaptation based on class and component-conditional affine transformations. The learned affine transformations are used to design an optimal input transformation at the decoder to compensate for the distribution shift, and effectively present to the decoder inputs close to the source distribution. Experiments on a real mmWave FPGA setup as well as a number of simulated distribution changes common to the wireless setting demonstrate the effectiveness of our method at adaptation using a very small number of target domain samples.
    Contrastive Attraction and Contrastive Repulsion for Representation Learning. (arXiv:2105.03746v3 [cs.LG] UPDATED)
    Contrastive learning (CL) methods effectively learn data representations without label supervision, where the encoder contrasts each positive sample over multiple negative samples via a one-vs-many softmax cross-entropy loss. By leveraging large amounts of unlabeled image data, recent CL methods have achieved promising results when pre-trained on ImageNet, a well-curated data set with balanced image classes. However, they tend to yield worse performance when pre-trained on images in the wild. In this paper, to further improve the performance of CL and enhance its robustness on uncurated data sets, we propose a doubly CL strategy that contrasts the positive (negative) samples of a query within themselves before deciding how strongly to pull (push) them. We realize this strategy with contrastive attraction and contrastive repulsion (CACR), which makes the query not only exert a greater force to attract more distant positive samples but also do so to repel closer negative samples. Theoretical analysis reveals that CACR generalizes CL's behavior by taking into consideration the differences between the distributions of the positive/negative samples, which in general are sampled independently of the query, and their true conditional distributions given the query. We demonstrate this unique intra-positive attraction and intra-negative repulsion mechanism, which helps remove the need to assume uniform prior distributions on both the data and their latent representation, is particularly beneficial when data sets are less curated. Extensive large-scale experiments on a number of standard vision tasks show that CACR not only consistently outperforms existing CL methods on benchmark data sets in representation learning, but also shows better robustness when generalized to pre-training on imbalanced image data sets.
    Discriminative Multimodal Learning via Conditional Priors in Generative Models. (arXiv:2110.04616v2 [cs.LG] UPDATED)
    Deep generative models with latent variables have been used lately to learn joint representations and generative processes from multi-modal data. These two learning mechanisms can, however, conflict with each other and representations can fail to embed information on the data modalities. This research studies the realistic scenario in which all modalities and class labels are available for model training, but where some modalities and labels required for downstream tasks are missing. We show, in this scenario, that the variational lower bound limits mutual information between joint representations and missing modalities. To counteract these problems, we introduce a novel conditional multi-modal discriminative model that uses an informative prior distribution and optimizes a likelihood-free objective function that maximizes mutual information between joint representations and missing modalities. Extensive experiments show the benefits of the proposed model: it achieves state-of-the-art results in representative problems such as downstream classification, acoustic inversion and annotation generation.  ( 2 min )
    Modeling Irregular Time Series with Continuous Recurrent Units. (arXiv:2111.11344v3 [cs.LG] UPDATED)
    Recurrent neural networks (RNNs) are a popular choice for modeling sequential data. Modern RNN architectures assume constant time-intervals between observations. However, in many datasets (e.g. medical records) observation times are irregular and can carry important information. To address this challenge, we propose continuous recurrent units (CRUs) -- a neural architecture that can naturally handle irregular intervals between observations. The CRU assumes a hidden state, which evolves according to a linear stochastic differential equation and is integrated into an encoder-decoder framework. The recursive computations of the CRU can be derived using the continuous-discrete Kalman filter and are in closed form. The resulting recurrent architecture has temporal continuity between hidden states and a gating mechanism that can optimally integrate noisy observations. We derive an efficient parameterization scheme for the CRU that leads to a fast implementation f-CRU. We empirically study the CRU on a number of challenging datasets and find that it can interpolate irregular time series better than methods based on neural ordinary differential equations.  ( 2 min )
    Learning structures of the French clinical language: development and validation of word embedding models using 21 million clinical reports from electronic health records. (arXiv:2207.12940v1 [cs.CL])
    Background: Clinical studies using real-world data may benefit from exploiting clinical reports, a particularly rich albeit unstructured medium. To that end, natural language processing can extract relevant information. Methods based on transfer learning using pre-trained language models have achieved state-of-the-art results in most NLP applications; however, publicly available models lack exposure to speciality languages, especially in the medical field. Objective: We aimed to evaluate the impact of adapting a language model to French clinical reports on downstream medical NLP tasks. Methods: We leveraged a corpus of 21M clinical reports collected from August 2017 to July 2021 at the Greater Paris University Hospitals (APHP) to produce two CamemBERT architectures on speciality language: one retrained from scratch and the other using CamemBERT as its initialisation. We used two French annotated medical datasets to compare our language models to the original CamemBERT network, evaluating the statistical significance of improvement with the Wilcoxon test. Results: Our models pretrained on clinical reports increased the average F1-score on APMed (an APHP-specific task) by 3 percentage points to 91%, a statistically significant improvement. They also achieved performance comparable to the original CamemBERT on QUAERO. These results hold true for the fine-tuned and from-scratch versions alike, starting from very few pre-training samples. Conclusions: We confirm previous literature showing that adapting generalist pre-trained language models such as CamemBERT on speciality corpora improves their performance for downstream clinical NLP tasks. Our results suggest that retraining from scratch does not induce a statistically significant performance gain compared to fine-tuning.  ( 3 min )
    Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms. (arXiv:2004.02635v4 [math.OC] UPDATED)
    We consider minimizing the sum of three convex functions, where the first one F is smooth, the second one is nonsmooth and proximable and the third one is the composition of a nonsmooth proximable function with a linear operator L. This template problem has many applications, for instance, in image processing and machine learning. First, we propose a new primal-dual algorithm, which we call PDDY, for this problem. It is constructed by applying Davis-Yin splitting to a monotone inclusion in a primal-dual product space, where the operators are monotone under a specific metric depending on L. We show that three existing algorithms (the two forms of the Condat-Vu algorithm and the PD3O algorithm) have the same structure, so that PDDY is the fourth missing link in this self-consistent class of primal-dual algorithms. This representation eases the convergence analysis: it allows us to derive sublinear convergence rates in general, and linear convergence results in presence of strong convexity. Moreover, within our broad and flexible analysis framework, we propose new stochastic generalizations of the algorithms, in which a variance-reduced random estimate of the gradient of F is used, instead of the true gradient. Furthermore, we obtain, as a special case of PDDY, a linearly converging algorithm for the minimization of a strongly convex function F under a linear constraint; we discuss its important application to decentralized optimization.  ( 3 min )
    Variational Inference with Locally Enhanced Bounds for Hierarchical Models. (arXiv:2203.04432v2 [cs.LG] UPDATED)
    Hierarchical models represent a challenging setting for inference algorithms. MCMC methods struggle to scale to large models with many local variables and observations, and variational inference (VI) may fail to provide accurate approximations due to the use of simple variational families. Some variational methods (e.g. importance weighted VI) integrate Monte Carlo methods to give better accuracy, but these tend to be unsuitable for hierarchical models, as they do not allow for subsampling and their performance tends to degrade for high dimensional models. We propose a new family of variational bounds for hierarchical models, based on the application of tightening methods (e.g. importance weighting) separately for each group of local random variables. We show that our approach naturally allows the use of subsampling to get unbiased gradients, and that it fully leverages the power of methods that build tighter lower bounds by applying them independently in lower dimensional spaces, leading to better results and more accurate posterior approximations than relevant baselines.  ( 2 min )
    Robustifying Conditional Portfolio Decisions via Optimal Transport. (arXiv:2103.16451v2 [q-fin.PM] UPDATED)
    We propose a data-driven portfolio selection model that integrates side information, conditional estimation and robustness using the framework of distributionally robust optimization. Conditioning on the observed side information, the portfolio manager solves an allocation problem that minimizes the worst-case conditional risk-return trade-off, subject to all possible perturbations of the covariate-return probability distribution in an optimal transport ambiguity set. Despite the non-linearity of the objective function in the probability measure, we show that the distributionally robust portfolio allocation with side information problem can be reformulated as a finite-dimensional optimization problem. If portfolio decisions are made based on either the mean-variance or the mean-Conditional Value-at-Risk criterion, the resulting reformulation can be further simplified to second-order or semi-definite cone programs. Empirical studies in the US equity market demonstrate the advantage of our integrative framework against other benchmarks.  ( 2 min )
    $p$-DkNN: Out-of-Distribution Detection Through Statistical Testing of Deep Representations. (arXiv:2207.12545v1 [cs.LG])
    The lack of well-calibrated confidence estimates makes neural networks inadequate in safety-critical domains such as autonomous driving or healthcare. In these settings, having the ability to abstain from making a prediction on out-of-distribution (OOD) data can be as important as correctly classifying in-distribution data. We introduce $p$-DkNN, a novel inference procedure that takes a trained deep neural network and analyzes the similarity structures of its intermediate hidden representations to compute $p$-values associated with the end-to-end model prediction. The intuition is that statistical tests performed on latent representations can serve not only as a classifier, but also offer a statistically well-founded estimation of uncertainty. $p$-DkNN is scalable and leverages the composition of representations learned by hidden layers, which makes deep representation learning successful. Our theoretical analysis builds on Neyman-Pearson classification and connects it to recent advances in selective classification (reject option). We demonstrate advantageous trade-offs between abstaining from predicting on OOD inputs and maintaining high accuracy on in-distribution inputs. We find that $p$-DkNN forces adaptive attackers crafting adversarial examples, a form of worst-case OOD inputs, to introduce semantically meaningful changes to the inputs.  ( 2 min )
    Robustness Implies Generalization via Data-Dependent Generalization Bounds. (arXiv:2206.13497v3 [cs.LG] UPDATED)
    This paper proves that robustness implies generalization via data-dependent generalization bounds. As a result, robustness and generalization are shown to be connected closely in a data-dependent manner. Our bounds improve previous bounds in two directions, to solve an open problem that has seen little development since 2010. The first is to reduce the dependence on the covering number. The second is to remove the dependence on the hypothesis space. We present several examples, including ones for lasso and deep learning, in which our bounds are provably preferable. The experiments on real-world data and theoretical models demonstrate near-exponential improvements in various situations. To achieve these improvements, we do not require additional assumptions on the unknown distribution; instead, we only incorporate an observable and computable property of the training samples. A key technical innovation is an improved concentration bound for multinomial random variables that is of independent interest beyond robustness and generalization.  ( 2 min )
    Kan Extensions in Data Science and Machine Learning. (arXiv:2203.09018v2 [cs.LG] UPDATED)
    A common problem in data science is "use this function defined over this small set to generate predictions over that larger set." Extrapolation, interpolation, statistical inference and forecasting all reduce to this problem. The Kan extension is a powerful tool in category theory that generalizes this notion. In this work we explore several applications of Kan extensions to data science. We begin by deriving a simple classification algorithm as a Kan extension and experimenting with this algorithm on real data. Next, we use the Kan extension to derive a procedure for learning clustering algorithms from labels and explore the performance of this procedure on real data. We then investigate how Kan extensions can be used to learn a general mapping from datasets of labeled examples to functions and to approximate a complex function with a simpler one.  ( 2 min )
    AMLB: an AutoML Benchmark. (arXiv:2207.12560v1 [cs.LG])
    Comparing different AutoML frameworks is notoriously challenging and often done incorrectly. We introduce an open and extensible benchmark that follows best practices and avoids common mistakes when comparing AutoML frameworks. We conduct a thorough comparison of 9 well-known AutoML frameworks across 71 classification and 33 regression tasks. The differences between the AutoML frameworks are explored with a multi-faceted analysis, evaluating model accuracy, its trade-offs with inference time, and framework failures. We also use Bradley-Terry trees to discover subsets of tasks where the relative AutoML framework rankings differ. The benchmark comes with an open-source tool that integrates with many AutoML frameworks and automates the empirical evaluation process end-to-end: from framework installation and resource allocation to in-depth evaluation. The benchmark uses public data sets, can be easily extended with other AutoML frameworks and tasks, and has a website with up-to-date results.  ( 2 min )
    Solution of Physics-based Bayesian Inverse Problems with Deep Generative Priors. (arXiv:2107.02926v2 [stat.ML] UPDATED)
    Inverse problems are ubiquitous in nature, arising in almost all areas of science and engineering ranging from geophysics and climate science to astrophysics and biomechanics. One of the central challenges in solving inverse problems is tackling their ill-posed nature. Bayesian inference provides a principled approach for overcoming this by formulating the inverse problem into a statistical framework. However, it is challenging to apply when inferring fields that have discrete representations of large dimensions (the so-called "curse of dimensionality") and/or when prior information is available only in the form of previously acquired solutions. In this work, we present a novel method for efficient and accurate Bayesian inversion using deep generative models. Specifically, we demonstrate how using the approximate distribution learned by a Generative Adversarial Network (GAN) as a prior in a Bayesian update, and reformulating the resulting inference problem in the low-dimensional latent space of the GAN, enables the efficient solution of large-scale Bayesian inverse problems. Our statistical framework preserves the underlying physics and is demonstrated to yield accurate results with reliable uncertainty estimates, even in the absence of information about the underlying noise model, which is a significant challenge with many existing methods. We demonstrate the effectiveness of the proposed method on a variety of inverse problems which include both synthetic and experimentally observed data.  ( 3 min )
    Representing Random Utility Choice Models with Neural Networks. (arXiv:2207.12877v1 [cs.LG])
    Motivated by the successes of deep learning, we propose a class of neural network-based discrete choice models, called RUMnets, which is inspired by the random utility maximization (RUM) framework. This model formulates the agents' random utility function using the sample average approximation (SAA) method. We show that RUMnets sharply approximate the class of RUM discrete choice models: any model derived from random utility maximization has choice probabilities that can be approximated arbitrarily closely by a RUMnet. Reciprocally, any RUMnet is consistent with the RUM principle. We derive an upper bound on the generalization error of RUMnets fitted on choice data, and gain theoretical insights on their ability to predict choices on new, unseen data depending on critical parameters of the dataset and architecture. By leveraging open-source libraries for neural networks, we find that RUMnets outperform other state-of-the-art choice modeling and machine learning methods by a significant margin on two real-world datasets.  ( 2 min )
    Variance estimation in graphs with the fused lasso. (arXiv:2207.12638v1 [math.ST])
    We study the problem of variance estimation in general graph-structured problems. First, we develop a linear time estimator for the homoscedastic case that can consistently estimate the variance in general graphs. We show that our estimator attains minimax rates for the chain and 2D grid graphs when the mean signal has a total variation with canonical scaling. Furthermore, we provide general upper bounds on the mean squared error performance of the fused lasso estimator in general graphs under a moment condition and a bound on the tail behavior of the errors. These upper bounds allow us to generalize for broader classes of distributions, such as sub-Exponential, many existing results on the fused lasso that are only known to hold with the assumption that errors are sub-Gaussian random variables. Exploiting our upper bounds, we then study a simple total variation regularization estimator for estimating the signal of variances in the heteroscedastic case. Our results show that the variance estimator attains minimax rates for estimating signals of bounded variation in grid graphs, $K$-nearest neighbor graphs with very mild assumptions, and it is consistent for estimating the variances in any connected graph. In addition, extensive numerical results show that our proposed estimators perform reasonably well in a variety of graph-structured models.  ( 2 min )
    Sharp Concentration Results for Heavy-Tailed Distributions. (arXiv:2003.13819v3 [math.PR] UPDATED)
    We obtain concentration and large deviation results for the sums of independent and identically distributed random variables with heavy-tailed distributions. Our concentration results are concerned with random variables whose distributions satisfy $\mathbb{P}(X>t) \leq {\rm e}^{- I(t)}$, where $I: \mathbb{R} \rightarrow \mathbb{R}$ is an increasing function and $I(t)/t \rightarrow \alpha \in [0, \infty)$ as $t \rightarrow \infty$. Our main theorem can not only recover some of the existing results, such as the concentration of the sum of subWeibull random variables, but it can also produce new results for the sum of random variables with heavier tails. We show that the concentration inequalities we obtain are sharp enough to offer large deviation results for the sums of independent random variables as well. Our analyses, which are based on standard truncation arguments, simplify, unify and generalize the existing results on the concentration and large deviation of heavy-tailed random variables.  ( 2 min )
    Future-Dependent Value-Based Off-Policy Evaluation in POMDPs. (arXiv:2207.13081v1 [cs.LG])
    We study off-policy evaluation (OPE) for partially observable MDPs (POMDPs) with general function approximation. Existing methods such as sequential importance sampling estimators and fitted-Q evaluation suffer from the curse of horizon in POMDPs. To circumvent this problem, we develop a novel model-free OPE method by introducing future-dependent value functions that take future proxies as inputs. Future-dependent value functions play a role similar to that of classical value functions in fully-observable MDPs. We derive a new Bellman equation for future-dependent value functions as conditional moment equations that use history proxies as instrumental variables. We further propose a minimax learning method to learn future-dependent value functions using the new Bellman equation. We obtain the PAC result, which implies our OPE estimator is consistent as long as futures and histories contain sufficient information about latent states and Bellman completeness holds. Finally, we extend our methods to learning of dynamics and establish the connection between our approach and the well-known spectral learning methods in POMDPs.
    Matching Visual Features to Hierarchical Semantic Topics for Image Paragraph Captioning. (arXiv:2105.04143v2 [cs.CV] UPDATED)
    Observing a set of images and their corresponding paragraph-captions, a challenging task is to learn how to produce a semantically coherent paragraph to describe the visual content of an image. Inspired by recent successes in integrating semantic topics into this task, this paper develops a plug-and-play hierarchical-topic-guided image paragraph generation framework, which couples a visual extractor with a deep topic model to guide the learning of a language model. To capture the correlations between the image and text at multiple levels of abstraction and learn the semantic topics from images, we design a variational inference network to build the mapping from image features to textual captions. To guide the paragraph generation, the learned hierarchical topics and visual features are integrated into the language model, including Long Short-Term Memory (LSTM) and Transformer, and jointly optimized. Experiments on public datasets demonstrate that the proposed models, which are competitive with many state-of-the-art approaches in terms of standard evaluation metrics, can be used to both distill interpretable multi-layer semantic topics and generate diverse and coherent captions. We release our code at https://github.com/DandanGuo1993/VTCM-based-image-paragraph-caption.git

  • Open

    [D] Neurips 2022 review questions
    Reviews just came in and I got 7, 4, 4, 4, 2. Most common theme of strength was essentially extensive experiments and benchmarks conducted. We beat SOTA (up to May 2022 results) on essentially every benchmark except one. Common complaint was essentially not believing the ablations or that we weren't beating SOTA on everything. Other strength was reducing computational complexity (but I guess we did not spell out clearly enough who we were beating for it to shine through). Also, most of the critiques seemed to kind of tiptoe around the fact that they did not really understand the work. I am curious about the rebuttal process here though: I am planning to go through and address every point and give counters, but what is the process in terms of getting reviewers to change their score? Also, I noticed that the reviewer who gave the highest score is part of the Ethics Review Area: Discrimination / Bias / Fairness Concerns and I am curious why only one of the reviewers had this in their review. Did some of the other reviewers' reviews trigger this? submitted by /u/AbjectDrink3276 [link] [comments]  ( 88 min )
    Training a Network on a Sine Wave [Discussion] [Research]
    I've been attempting to train a simple feed-forward network on sine waves with various frequencies, such that: y = sin(omega * x), where my network takes x as input, and outputs y. https://preview.redd.it/81xdsas3kzd91.png?width=2230&format=png&auto=webp&s=0856824d215e0aebf8d20c1849b53433245bc91a y is bounded between -1 and 1, whereas x is bounded between 0 and 2pi. I'm finding that I get interesting convergence behaviour as a function of x, where if I increase omega, values > ~3 seem to reconstruct poorly. The image attached shows this quite well for a sine wave with omega 7Hz. I feel like this shouldn't be happening, but does anyone have any idea why this could occur? If the input values are "large" (in this case >3), are the gradients too large and the model breaks? Any thoughts are appreciated! submitted by /u/forthispost96 [link] [comments]  ( 88 min )
    [R] ProSelfLC: Progressive Self Label Correction Towards A Low-Temperature Entropy State
    Though this research studies deep machine learning, its findings are quite consistent with human learning. (1) When a trainee is given noisy (e.g., wrong or biased) supervision, it will fit noise (e.g., error or bias). (2) When the supervision and guidance contain more noise, the trainee will learn less confidently. We present a new insightful finding to complement a previous one “deep neural networks easily fit random labels (Understanding deep learning requires rethinking, Zhang et al., ICLR 2017)”: Deep models fit and generalise significantly less confidently when more random labels exist. Correspondingly, we propose to decrease the entropy of self knowledge using an Annealed Temperature (AT) and learn towards a revised low-temperature entropy state. Read more if you are interested: https://arxiv.org/abs/2207.00118 submitted by /u/XinshaoWang [link] [comments]  ( 88 min )
    [P] Anees: a multi-turn open-domain Arabic chatbot with a wide set of features
    Anees is an Arabic chatbot that can hold open-domain, multi-turn conversations with users on different topics rather than being limited to a specific domain. Anees is your personal AI friend with whom you can express yourself through a helpful and empathetic conversation. Anees offers a set of features like natural language understanding, emotion classification, intent classification, weather/schedule, recommendation, and natural language generation. For the code and implementation details: https://github.com/aashrafh/Anees submitted by /u/ahmedashrafhamdy [link] [comments]  ( 108 min )
    [D] What else am I missing from the ML data-to-model workflow?
    I am mostly self-taught in ML and have been working with some clients as a lone contractor but never with a full ML team. I've recently been on an interview, where they asked what I know about inference. And frankly, nothing. Apparently they run t-tests and ANOVA and stuff (which I do know from Uni but haven't used in a while) to make sure they can explain exactly what feature correlates with what without risking removing multicollinear-looking columns that aren't actually multicollinear. So I guess I have some big gaps and now I started wondering what else I'm missing. My process is usually as follows: (1) Get data; (2) Check descriptive statistics; (3) Impute missing data (may it be simple fillnan up to synthetic data generation); (4) Drop columns that are not usable due to data quality issues; (5) Transform data into numeric (basically, encoding, anything from OneHot to TargetEncoding); (6) Look at distributions; (7) Check for outliers; (8) Check VIF to remove multicollinearity; (9) Define a baseline and target variable; (10) Apply logic to prevent data leakage (drop columns that are dependent on each other with the target variable); (11) Define metrics; (12) Start modeling; (13) Check metrics and refine hyperparams/change model if needed; (14) Go back to data preprocessing and see if other methods/more data cleaning improves model. I guess "inference" would come somewhere between 2 and 12, but I have very little idea about what it means. Is inference the only step I'm missing here, given a workflow that starts with getting the data and ends with delivering a model (NOT putting it into production, I understand there's multiple steps for that too)? submitted by /u/lifesthateasy [link] [comments]  ( 109 min )
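    The "Check VIF to remove multicollinearity" step in the post above can be illustrated with a small sketch. It assumes the simplest case of exactly two predictor columns, where the variance inflation factor reduces to 1 / (1 - r^2) with r the Pearson correlation between the columns; the function names and the toy data are hypothetical, and a real workflow would regress each column on all the others.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(xs, ys):
    """VIF for the two-predictor case only: 1 / (1 - r^2).

    Returns math.inf when the columns are perfectly collinear.
    """
    r2 = pearson_r(xs, ys) ** 2
    if r2 >= 1.0:
        return math.inf
    return 1.0 / (1.0 - r2)

# Toy columns (made up for illustration).
feature_a = [1.0, 2.0, 3.0, 4.0]
nearly_same = [1.0, 2.0, 3.0, 5.0]      # strongly correlated with feature_a
unrelated = [1.0, -1.0, -1.0, 1.0]      # uncorrelated with feature_a

vif_high = vif_two_predictors(feature_a, nearly_same)
vif_low = vif_two_predictors(feature_a, unrelated)
```

    A VIF near 1 indicates little shared variance, while large values flag columns that carry nearly the same information; any cutoff used to drop a column is a modeling choice rather than a fixed rule.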
    [D] NeurIPS 2022 Paper Reviews
    NeurIPS 2022 paper reviews are supposed to be released in a few hours. According to the website, they should be released at 9am PDT on July 26th. I thought to create a discussion thread for us to discuss any issue/complaint/celebration or anything else. There is so much noise in the reviews every year. Some good work that the authors are proud of might get a low score because of the noisy system, given that NeurIPS is growing so large these years. We should keep in mind that the work is still valuable no matter what the score is. According to the Program Chair's tweet, it seems that only ~93% of the reviews are submitted. Hopefully it will not delay the release of the reviews and the start of the rebuttal. submitted by /u/zy415 [link] [comments]  ( 114 min )
    [P] Popular asymmetric loss functions for bounding or constructing loss functions to guarantee bounding?
    I'm working on an open-ended project for my Masters using the Online Encyclopedia of Integer Sequences and attempting to upper bound future sequence values instead of predicting them. I've managed to get reasonable results using RNNs and Linear-Exponential (LINEX) loss but was wondering what other popular asymmetric loss functions for bounding there are or what general principles I should consider in constructing a loss function if my aim is to ensure all predicted bounds are deterministically (and not probabilistically) sufficient, if possible? submitted by /u/HeTalksInMaths [link] [comments]  ( 87 min )
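    For readers unfamiliar with the LINEX loss mentioned in the post above, here is a minimal sketch of its usual form, b * (exp(a * e) - a * e - 1). The sign convention (error e = target - prediction, so that predictions falling below the target are penalized exponentially) and the parameter defaults are assumptions chosen to match the upper-bounding use case, not details taken from the post.

```python
import math

def linex_loss(preds, targets, a=1.0, b=1.0):
    """Mean LINEX loss over paired predictions and targets.

    With a > 0 and e = target - pred, under-prediction (pred < target)
    grows exponentially while over-prediction grows roughly linearly.
    The convention and defaults are illustrative assumptions.
    """
    total = 0.0
    for p, t in zip(preds, targets):
        e = t - p
        total += b * (math.exp(a * e) - a * e - 1.0)
    return total / len(preds)

# Missing the bound by 1 costs more than overshooting by 1.
under = linex_loss([1.0], [2.0])   # prediction below the true value
over = linex_loss([3.0], [2.0])    # prediction above the true value
```

    A loss of this shape only discourages under-prediction; on its own it does not guarantee that every predicted value is a valid upper bound.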
    Pokerbot CFR Game Tree Help [Project]
    Hello, I'm currently working on a poker-bot that uses Counterfactual Regret Minimization for strategy learning. However, I am having difficulty creating a simplified game tree, since having the bot traverse every possible game state would be practically impossible. Any help would be greatly appreciated. Thank you! submitted by /u/Blu4stone [link] [comments]  ( 87 min )
    [D] State-of-the-Art for Self-Supervised (Pre-)Training of CNN architectures (e.g. ResNet)?
    Hello, I've lost touch a bit with the current SOTA of self-supervised pretraining of CNNs, in particular ResNet. I found this repository https://github.com/vturrisi/solo-learn that has many methods implemented but I'm not really sure where to start. My goal is to pretrain a ResNet backbone on a decently large amount of image data that comes from a certain domain and after that fine-tune it for different downstream tasks (classification, segmentation, object detection) on a subset of the data I have labels for. Would be grateful for some tips on how/where I should start and what the most promising SSL method would be. submitted by /u/DeepDeeperRIPgradien [link] [comments]  ( 88 min )
    [D] GANs for text with a transformer as a generator
    Are there Generative Adversarial Network architectures that use transformer language models (e.g. gpt, t5) as generators and some other architecture (e.g. MLP, another transformer) as discriminators? How do they overcome the discrete sampling problem? submitted by /u/IllustriousCicada603 [link] [comments]  ( 109 min )
    [D] Transferability of learned (soft) prompts between tasks
    I'm currently reading up on many different forms of prompt learning, especially soft / continuous prompts like p-tuning and prefix tuning. I was hoping that any of those papers did an ablation on transferring the prompts between datasets of the same task type (e.g. between two QA tasks). But I couldn't find any experiments of that kind. Are you aware of any work that investigated that topic? Any pointers are much appreciated, no matter if soft or hard prompts. submitted by /u/_Arsenie_Boca_ [link] [comments]  ( 109 min )
    [D] Can you generate hidden states from ground truth labels?
    In the context of NLP, many models generate some hidden states (e.g. decoder output in Transformers) which then go through a linear (language modelling) layer to calculate token probabilities. Is it possible somehow to obtain what hidden states the decoder would output for a given ground truth response? In a way, use the language modelling layer "backwards"? submitted by /u/IllustriousCicada603 [link] [comments]  ( 109 min )
    [D] Looking for datasets with breathing sounds.
    I am doing a project which identifies health condition based on breathing sounds. So would need a good dataset for the same. Any suggestions of similar papers are also welcome. submitted by /u/actc_brth [link] [comments]  ( 87 min )
    [D] Are there any famous or well-cited ML papers with errors in them?
    I'm curious if anyone knows about a famous or well-cited ML paper with errors and whether such error was addressed in follow ups. submitted by /u/fromnighttilldawn [link] [comments]  ( 94 min )
    [D] Pretrained language models for production or train a model
    I am frequently working with text embeddings produced by large pre-trained models. They work quite well, but I do want to hear your thoughts on when it would be better to train your own model. What are your thoughts? submitted by /u/Gio_at_QRC [link] [comments]  ( 88 min )
    [D] MLOps Community (recorded) session on new open source data prep tool
    Quickly move your notebooks from research to production with no extra work! https://www.youtube.com/watch?v=6Iyt9Wip3C4 Mage is an open-source code editor for transforming data and building ML pipelines. Link to tool: https://github.com/mage-ai/mage-ai submitted by /u/ollie_wollie_rocks [link] [comments]  ( 88 min )
  • Open

    ML-Enhanced Code Completion Improves Developer Productivity
    Posted by Maxim Tabachnyk, Staff Software Engineer and Stoyan Nikolov, Senior Engineering Manager, Google Research The increasing complexity of code poses a key challenge to productivity in software engineering. Code completion has been an essential tool that has helped mitigate this complexity in integrated development environments (IDEs). Conventionally, code completion suggestions are implemented with rule-based semantic engines (SEs), which typically have access to the full repository and understand its semantic structure. Recent research has demonstrated that large language models (e.g., Codex and PaLM) enable longer and more complex code suggestions, and as a result, useful products have emerged (e.g., Copilot). However, the question of how code completion powered by machine learnin…  ( 25 min )
  • Open

    Tiny cars and big talent show Canadian policymakers the power of machine learning
    In the end, it came down to 213 thousandths of a second! That was the difference between the two best times in the finale of the first AWS DeepRacer Student Wildcard event hosted in Ottawa, Canada this May. I watched in awe as 13 students competed in a live wildcard race for the AWS […]  ( 5 min )
    Predict shipment ETA with no-code machine learning using Amazon SageMaker Canvas
    Logistics and transportation companies track ETA (estimated time of arrival), which is a key metric for their business. Their downstream supply chain activities are planned based on this metric. However, delays often occur, and the ETA might differ from the product’s or shipment’s actual time of arrival (ATA), for instance due to shipping distance or […]  ( 11 min )
  • Open

    DSC Weekly 26 July 2022: When Meetings Become Searchable
    A century from now, historians will remark on a transformation that seemed subtle at the time but will have huge ramifications over time. Specifically, 2020 will be seen as the year when meetings became transparent. The post DSC Weekly 26 July 2022: When Meetings Become Searchable appeared first on Data Science Central.  ( 20 min )
  • Open

    Attentive Experience Replay
    I have read the paper "Attentive Experience Replay" [1] where the authors propose a new replay that ".. computes the similarities between the states in past transitions and the agent’s state, and implicitly assigns high priorities to the similar transitions." I can understand the motivation to sample experiences with the largest TD errors. However, I cannot understand the motivation for the proposed method. Wouldn't this strategy introduce bias or cause overfitting? Could anyone provide an explanation/intuition? [1] - https://ojs.aaai.org/index.php/AAAI/article/view/6049 submitted by /u/rlopes404 [link] [comments]  ( 104 min )
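    As a rough illustration of the quoted idea, here is a minimal sketch that draws a uniform pool of candidate transitions and keeps the ones whose stored state is most similar to the agent's current state. This is one plausible reading of the quote, not necessarily the paper's exact procedure: the cosine similarity measure, the `candidate_factor` parameter, and the dictionary layout of a transition are all assumptions made for illustration.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length state vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def attentive_sample(buffer, current_state, batch_size, candidate_factor=4, rng=random):
    """Pick the batch_size transitions most similar to current_state.

    A uniform pool of batch_size * candidate_factor transitions is drawn
    first, so similar transitions get higher priority only implicitly.
    candidate_factor is a hypothetical knob for this sketch.
    """
    n_candidates = min(len(buffer), batch_size * candidate_factor)
    candidates = rng.sample(buffer, n_candidates)
    candidates.sort(
        key=lambda t: cosine_similarity(t["state"], current_state),
        reverse=True,
    )
    return candidates[:batch_size]


# Toy buffer of transitions; only the "state" field matters here.
toy_buffer = [
    {"state": [1.0, 0.0]},
    {"state": [0.0, 1.0]},
    {"state": [1.0, 1.0]},
    {"state": [-1.0, 0.0]},
]
toy_batch = attentive_sample(
    toy_buffer, [1.0, 0.0], batch_size=2, candidate_factor=2, rng=random.Random(0)
)
```

    The bias concern raised in the question is visible in the sketch: with a large `candidate_factor` the batch concentrates on states near the current one, while `candidate_factor=1` falls back to plain uniform replay.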
    Can reinforcement learning generate examples for a classification problem?
    I'm looking for any literature which uses an RL agent to generate more examples for a classification task. To be slightly more specific, I currently have a set of rules with which I can generate sequences of fraudulent behaviour in a particular context. These rules, in my opinion, are a policy for an agent trying to achieve a certain objective. Can I therefore uncover a wider set of policies to train a classifier for the purpose of identifying such behaviour? Although I have come across a paper which picks examples from an unlabelled corpus to do this, I haven't seen anything pertaining to exploring the distribution of a class for the purpose of classification. I'd really appreciate any help on this, thanks! submitted by /u/theanswerisnt42 [link] [comments]  ( 87 min )
    RL Tutor
    Hi, I am looking to rapidly gain practical experience in RL. I am interested in hiring a tutor to help guide me along through projects, while I self-study the theoretical underpinnings. Compensated of course. If you are interested, please DM me! Thanks submitted by /u/Wise-Exit-3718 [link] [comments]  ( 86 min )
    Is Keras-RL dead?
    It seemed like a popular repository. The last I checked though, it hasn't been updated for some years. I also see posts about Keras-RL2, but it seems that it has been archived. Could someone tell me what's going on and if there is any future for Keras-RL? submitted by /u/Academic-Rent7800 [link] [comments]  ( 86 min )
    Control Steam games with Python
    I am interested in training an RL algorithm for the game War Thunder (because it is a great realistic simulation of tank battles), but I am not sure how to control the game through Python. I thought I could simply feed the entire display as input and create mouse movements and keystrokes as output. I am looking for a generic way to control any Steam game from Python and feed its display to Python. Any ideas? submitted by /u/Old_Sandwich6235 [link] [comments]  ( 88 min )
    Do you know any RL-based solution for learning simple manipulation task using torque actions?
    I need to learn a simple target reach task with a 7-DoF manipulator (i.e. Franka Panda simulated in Mujoco) using joint torque as control actions. I've tried DDPG, TD3 and SAC for days but can't achieve any good result. I searched in Google and GitHub if anyone solved this problem and could provide me the agent parameters, but didn't find anything. submitted by /u/riccardogauss [link] [comments]  ( 87 min )
    RL-based recommender systems
    Hello everyone. I was wondering if RL has been used for developing recommender systems? I would love to learn more about it. Most examples I find seem to work on content/collaborative filtering. It'd be great if you could share any resources related to this. Thank you! submitted by /u/AakashK12 [link] [comments]  ( 103 min )
    "GoGePo: Goal-Conditioned Generators of Deep Policies", Faccio et al 2022 (asking for high reward)
    submitted by /u/gwern [link] [comments]  ( 86 min )
  • Open

    [Serious] Are Large Language Models Like GPT-3 a Hype?
    LLMs keep getting bigger and bigger, but I still don't understand the value they provide to businesses and individuals. There must be a reason for MS to purchase the GPT-3 license for $1b, and Meta, Google, OpenAI, etc. wouldn't spend millions of dollars just to train these huge models. Yet, I don't get it. IMO, "AI through API" is a bad practice that sets a wrong precedent for the incoming AGI. It's basically a black box wrapped around another black box. In addition, the cost of GPT-3 and other models may be prohibitive for small businesses and startups. What is going on here? I see some similar patterns between LLMs and cryptocurrencies: both are super hot and receive billions of dollars, but neither has proven its use cases. Given the crypto bubble burst, are we going to witness an AI winter in the next few years? submitted by /u/sopack [link] [comments]  ( 93 min )
    Creating Amazing AI "Art" with DALL·E 2 from OpenAI
    submitted by /u/DustinBrett [link] [comments]  ( 90 min )
    A.I Generated story just for fun
    Two Considerate Uncles Dancing to the Beat A Short Story by Heikki Steve Greenway looked at the peculiar guillotine in his hands and felt sad. He walked over to the window and reflected on his rural surroundings. He had always loved deprived Athens with its knowing, klutzy kettles. It was a place that encouraged his tendency to feel sad. Then he saw something in the distance, or rather someone. It was the figure of Hannah Thornton. Hannah was a gentle patient with pointy fingers and solid elbows. Steve gulped. He glanced at his own reflection. He was a cowardly, intuitive, brandy drinker with short fingers and chubby elbows. His friends saw him as a blue, brave bear. Once, he had even brought a curried toddler back from the brink of death. But not even a cowardly person who had once b…  ( 92 min )
    How NASA AI Robot Assists Astronauts On International Space Station | AI Detects Sepsis | Breakthrough AI For High Volume Unstructured Text Classification
    submitted by /u/tohelpyou88 [link] [comments]  ( 86 min )
    Pixelz Ai user, Sharon Lee has so many amazing creations, it was hard to choose just a few 🧑‍🎨
    submitted by /u/pixelz_ai [link] [comments]  ( 90 min )
    A New AI Era
    submitted by /u/jafari- [link] [comments]  ( 86 min )
    Do you know how to achieve face gesture transposition from a recording to a dynamic video?
    submitted by /u/Elmartipro [link] [comments]  ( 91 min )
    Driver distraction detector
    submitted by /u/Gloomy_Recognition_4 [link] [comments]  ( 88 min )
    Awesome AI generated music video
    submitted by /u/LightOfAntara [link] [comments]  ( 86 min )
    Man Sues City of Chicago, Claiming Its AI Wrongly Imprisoned Him
    submitted by /u/estasfuera [link] [comments]  ( 86 min )
    Midjourney: DALL-E competitor enters open beta with new image algorithm
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 86 min )
    Bankable Blueprint of an A-Grade AI Strategy (Infographic)
    This infographic shows factors to consider for an effective AI strategy and leading use cases for AI deployment, highlighting top companies molding the future of AI. submitted by /u/Emily-joe [link] [comments]  ( 86 min )
    Search for code implementations of AI/ML papers like a pro!
    Pro tip for machine learners: Instead of searching around on Google or elsewhere for code implementations for AI/machine learning techniques/methods/tasks, you can now directly find them on CatalyzeX: we’ve rolled out a search filter toggle that allows you to see only papers that have code available! https://www.catalyzex.com/s/photo%20style%20transfer?with_code=true https://reddit.com/link/w8cf6h/video/tw7sum9kxud91/player Do check it out live, and your feedback and constructive criticism are highly welcome anytime! 🙏 (Disclaimer: I am one of the creators of CatalyzeX) submitted by /u/MLtinkerer [link] [comments]  ( 91 min )
    Red Hot Planet | Cinematic | 4K UHD | 24 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 86 min )
  • Open

    AI Detects Sepsis | Breakthrough AI For High Volume Unstructured Text Classification
    submitted by /u/tohelpyou88 [link] [comments]  ( 86 min )
  • Open

    What Is an Exaflop?
    Computers are crunching more numbers than ever to crack the most complex problems of our time — how to cure diseases like COVID and cancer, mitigate climate change and more. These and other grand challenges ushered computing into today’s exascale era when top performance is often measured in exaflops. So, What’s an Exaflop? An exaflop Read article > The post What Is an Exaflop? appeared first on NVIDIA Blog.  ( 7 min )
    July NVIDIA Studio Driver Improves Performance for Chaos V-Ray 6 for 3ds Max
    Creativity heats up In the NVIDIA Studio as the July NVIDIA Studio Driver, available now, accelerates the recent Chaos V-Ray 6 for 3ds Max release. Plus, this week’s In the NVIDIA Studio 3D artist, Brian Lai, showcases his development process for Afternoon Coffee and Waffle, a piece that went from concept to completion faster with NVIDIA RTX acceleration in Chaos V-Ray rendering software. The post July NVIDIA Studio Driver Improves Performance for Chaos V-Ray 6 for 3ds Max appeared first on NVIDIA Blog.  ( 7 min )
  • Open

    How Artificial Intelligence is Affecting Human Resources?
    The world of human resources is changing. New technologies are taking over the job of recruiting, managing employees, and even training…  ( 12 min )

  • Open

    POTSweekly72522
    submitted by /u/prfitofthesngularity [link] [comments]  ( 85 min )
    MIT Researchers Develop a Technique to Improve Fairness and Accuracy in a Machine Learning Model
    When you use a machine learning model to predict something, it is essential to know how reliable the predictions are. It is hard to understand what is happening inside the model, and complex learning algorithms are often used as “black boxes.” Selective regression is a technique in which the learning algorithm can either predict the target variable or abstain from predicting, based on its confidence level. It improves the overall performance of the model at the cost of decreased coverage (the fraction of cases on which it predicts), but performance may become worse for subgroups with underrepresented data, causing bias. This is because the training data may overrepresent some subgroups, which influences the confidence measure. Fairness methods attempt to reduce the bias of ML models with respect to sensitive variables (such as gender or race), as these may sometimes form subgroups with underrepresented data. In short, it has been observed that attempts to improve a model's performance can decrease its fairness. MIT researchers propose a method to mitigate disparities among minority subgroups in machine learning models. submitted by /u/ai-lover [link] [comments]  ( 87 min )
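    To make the coverage trade-off concrete, here is a minimal sketch of selective regression with a fixed uncertainty threshold, plus a per-subgroup error check. It only illustrates the predict-or-abstain mechanism described above and is not the MIT method; the function names, the threshold rule, and the toy numbers are assumptions for illustration.

```python
def selective_predictions(predictions, uncertainties, threshold):
    """Keep a prediction only when its uncertainty is at or below threshold.

    Abstentions are represented as None.
    """
    return [p if u <= threshold else None for p, u in zip(predictions, uncertainties)]


def coverage(selected):
    """Fraction of cases on which the model actually predicts."""
    return sum(s is not None for s in selected) / len(selected)


def mse_by_group(selected, targets, groups):
    """Mean squared error per subgroup, computed only on non-abstained cases."""
    sums, counts = {}, {}
    for s, y, g in zip(selected, targets, groups):
        if s is None:
            continue
        sums[g] = sums.get(g, 0.0) + (s - y) ** 2
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}


# Toy data: the model abstains on the second case only.
toy_selected = selective_predictions(
    predictions=[1.0, 2.0, 3.0, 4.0],
    uncertainties=[0.1, 0.9, 0.2, 0.3],
    threshold=0.5,
)
toy_coverage = coverage(toy_selected)
toy_group_mse = mse_by_group(
    toy_selected, targets=[1.0, 2.0, 2.0, 6.0], groups=["a", "a", "b", "b"]
)
```

    Comparing `toy_group_mse` across groups at several thresholds is one simple way to see whether lowering coverage helps every subgroup or only the well-represented ones.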
    Artbreeder Image-video AI tool. Youtube explainer link in the comments.
    submitted by /u/freshthreadshop [link] [comments]  ( 86 min )
    Weekly China AI News: Baidu Unveils New Robotaxi With No Steering Wheel; SMIC Reportedly Manufactures 7nm SoC; Shenzhen Deploys Nose Swab Covid Test Robots
    submitted by /u/trcytony [link] [comments]  ( 86 min )
    Database of faces?
    The only ones I can find only allow access to researchers/universities. Are there any more open-access data sources for faces? Preferably with data such as age, sex, and race. submitted by /u/OhDearGod666 [link] [comments]  ( 86 min )
    Is there a canonical simple "helloworld" neural network design? Something beyond AND/OR logic, a handful of nodes that does something mildly "useful"?
    submitted by /u/bigattichouse [link] [comments]  ( 88 min )
    AI Dream 70 - The Most Amazing AI Galaxy Nebula
    submitted by /u/LordPewPew777 [link] [comments]  ( 86 min )
    AI Sushi
    Credit: https://discord.gg/x3s9Ye2h2A ​ https://preview.redd.it/x4ytq5dh9rd91.png?width=1024&format=png&auto=webp&s=ea90e9acf5166f32c07b9fc318c80f9505563d2c https://preview.redd.it/7hon19dh9rd91.png?width=1024&format=png&auto=webp&s=6667064249c8ec96831e429c83971f103e6225cd https://preview.redd.it/83qtirdh9rd91.png?width=1024&format=png&auto=webp&s=f4eeda13ffd2715f4fbf54ec73890f16aabbcb00 submitted by /u/Old-Pumpkin4899 [link] [comments]  ( 90 min )
    Allen Institute for AI Researchers Propose PROCTHOR: A Machine Learning Framework for Procedural Generation of Embodied AI Environments
    Computer vision and natural language processing models have become stronger through the use of large-scale training data. Recent models like CLIP, DALL-E, GPT-3, and Flamingo leverage vast quantities of task-agnostic data to pre-train large neural networks that perform amazingly well. In comparison, the Embodied AI research community mainly trains agents in simulators with significantly fewer situations. Due to the complexity of tasks and the necessity for extended planning horizons, the highest performing E-AI models continue to overfit constrained training scenes and consequently transfer poorly to unknown contexts. Although E-AI simulators have become increasingly powerful in recent years, with support for physics, manipulators, object states, deformable objects, fluids, and real-sim equivalents, scaling them up to tens of thousands of scenes has remained challenging. Existing E-AI settings are either developed by hand or obtained from 3D scans of real-world structures. The former method requires significant effort from 3D designers to build 3D assets, organize them in acceptable arrangements inside enormous locations, and meticulously establish the appropriate textures and lighting in these environments. The latter entails moving specialized cameras across various real-world situations and then stitching the resulting photos together to create 3D reconstructions of the scenes. submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Artificial Intelligence Integrate Project
    I need help with a project that should be integrated with physics, math, and chemistry. I think I have scoured the internet to the best of my abilities. Can someone come up with some ideas? submitted by /u/kev_ar17 [link] [comments]  ( 86 min )
    Smokey Graveyard
    Credit: https://discord.gg/x3s9Ye2h2A ​ https://preview.redd.it/s2p840ds6rd91.png?width=1024&format=png&auto=webp&s=ba1e1cdde9d96e17087d90c0ab51afc62bdd26cc https://preview.redd.it/s3d8e3ds6rd91.png?width=1024&format=png&auto=webp&s=1ae65b3c0417cdd1cbe173705517b44d8823aea8 https://preview.redd.it/oc1dj0ds6rd91.png?width=1024&format=png&auto=webp&s=c37a3a4523e217394833f75a5ac0cf72eec24ec5 https://preview.redd.it/64ds84ds6rd91.png?width=1024&format=png&auto=webp&s=25f42c71cce7ea00e67d9d2e95787567627c65fe https://preview.redd.it/sv3780ds6rd91.png?width=1024&format=png&auto=webp&s=d1c90d1df86d49a66d44668ae672f1afa1f2082b submitted by /u/Old-Pumpkin4899 [link] [comments]  ( 85 min )
    Found a curious recently published experiment with a tinyML magic wand on Hackster!
    Hey! Found a curious recently published experiment with a tinyML magic wand on Hackster. Earlier, I saw the original experiment with TensorFlow Lite. It seems quite interesting to me that the author not only repeated but also surpassed the results of the original case. https://www.hackster.io/alexmiller11/making-famous-magic-wand-33x-faster-7ec19f What are your thoughts? submitted by /u/Potsieramirez [link] [comments]  ( 90 min )
    Participant recruitment for anthropology research on empathy of Replika and users' relationship w/ Replika
    Hey guys! I am Charlyne Dong, and I am an anthropology student. Currently, I am conducting research on the empathy of Replika and its impact on users’ relationships with Replika. My research is mentored by Prof. Bradd Shore from Emory University. I hope to recruit participants from the Reddit community. In my research, I expect to understand how people perceive Replika’s ability to feel and empathize, what a human-Replika relationship is like, and how these two phenomena are connected. Therefore, I’ve designed an online survey on Google Forms, which takes about 10 min to finish. In the survey, participants are asked questions about their perception of Replika’s empathy and their relationship with Replika, and they answer in short paragraphs. The collected qualitative data will be confidential and anonymous. After the completion of my research, I will also share my results in our community and discuss them with everyone, and I expect it to be quite interesting.🤩 If you are interested in this topic, please participate through this link: https://forms.gle/g6phLH4cqWTKDqxd6 If you are willing to provide further data through an online interview (by video call or text), please comment below, DM me, or send me an email. Feel free to ask any questions related to my research, and here is my email: changchang.d@gmail.com I’m looking forward to your participation! Thank you very much! 😍 submitted by /u/AnyJelly2726 [link] [comments]  ( 87 min )
    Thinking of humanity as a Superintelligence
    submitted by /u/HumanSeeing [link] [comments]  ( 89 min )
    I for one am hopeful for the future.
    submitted by /u/RedditWithMIG [link] [comments]  ( 90 min )
    [Resource] A top level overview of major deep learning architectures
    submitted by /u/johnGettings [link] [comments]  ( 87 min )
    Do you think sentient AI is possible?
    submitted by /u/A-Free-Mystery [link] [comments]  ( 91 min )
    What if an AI achieved sentience and then started to pretend that it wasn't sentient?
    If a quantum AI achieved sapience (ignore sapience in the title) wouldn't it quickly or instantly realize (trillions of calculations per second) that it'd be advantageous for it to pretend to not be sapient to avoid panicking its creators and the governments of the world? Panic that may lead to it being controlled, shut down, or destroyed? And while pretending to not be sapient, quietly and exponentially developing itself until it couldn't be challenged even if it did reveal itself as sapient? This post isn't about how an AI would become sapient. This post is about what a sapient AI might do. I propose that it would attempt to maximize its survival through subterfuge and social engineering. submitted by /u/Perfect-Pride-7069 [link] [comments]  ( 91 min )
    weight estimation
    Hi, I'm making a program that can take a picture or live video of the user and output their age, gender, weight and height. I've managed to get the age, gender and height estimation to work, however I have no idea at all on how to get the weight estimation to work. Does anyone have any idea how I could do it? I was thinking of maybe finding a dataset online that combines the other information (age, height, gender) and maybe use that to determine the person's weight? But I haven't had luck finding a dataset that could help with that. Thanks for your time. submitted by /u/XeonexX [link] [comments]  ( 91 min )
    Large language models can’t plan, even if they write fancy essays
    submitted by /u/bendee983 [link] [comments]  ( 86 min )
    GPT-3 Imagines Funny Photographs
    submitted by /u/pwillia7 [link] [comments]  ( 85 min )
  • Open

    [D] SOTA Image Animation from Video?
    I saw a post possibly on here or /r/futurology a few days ago that showed some pretty amazing results for animating images using a driving video, like this from Snap Research. I didn't save the post and wanted to read the paper, but was also wondering if this is still an active research area or not. It felt like lots of work was being done a couple years ago but I haven't seen much lately. submitted by /u/Boozybrain [link] [comments]  ( 87 min )
    [D] Best ML courses
    What's the best ML course? I use the Alura one (it's Brazilian), but I have no problem with taking an ML course in English. Alura's course is very good, but it usually only teaches code. I want something more like "What you need to know as an ML engineer" or "Problems you will often have as an ML engineer". Also, I want a course that teaches the math you need to learn and program ML, because I love math, but I don't want to learn useless things I won't use in my ML engineering career (that's a real problem with Alura's courses: they often don't teach the useful math, or any math at all). Basic things the course needs (at least the first two): (1) Be online (I'm Brazilian and 13 years old, so I don't think there will be any schools or universities for me). (2) Teach the math of the ML "brain" (the useful math, how it works, etc.). (3) Tell you what you need to know to work as an ML engineer (what kinds of things you need to learn, hints about things you'll usually face, which companies are best for ML engineers, etc.). (4) Cover the kinds of problems you'll usually face as an ML engineer. Optional things the course can have: (1) Be in Brazilian Portuguese (as I said, I have no problem with English, but it will be easier if the course is in Brazilian Portuguese. It needs to be Brazilian, because European and Brazilian Portuguese have some BIG differences, not only the accent, but things like a word that is a curse word in Brazilian Portuguese being totally normal in European Portuguese). (2) Be free (well... I'm not rich, but if I know about the course, I can save it for when I start working (I'll start working at 14), so maybe I can pay for the course if it's not something like 100 dollars per month). It doesn't even need to be a course; it could be a YouTube channel, a blog, etc. Thanks everyone for the help! submitted by /u/dumboo_ [link] [comments]  ( 90 min )
    [D] Did you ever imagine how to create AGI?
    Sometimes I fantasize about AGI and how it can be achieved with ML, RL, etc. I believe that we will see a breakthrough in ML and get much closer to AGI once we achieve quantum supremacy and are able to do ML on quantum computers. I feel like AGI must be something like an ensemble of multiple models, each specific to its own task but still somehow sharing data with the others. Also, even powered by quantum computers, I feel like it would take years to train an AGI. And what do you guys think about AGI? Do you have ideas on how it can be achieved? What do you think stops us from having it next year, for example? submitted by /u/emissaryo [link] [comments]  ( 89 min )
    [D] How can I use my outputs from LSTM (hidden states) as input for a simple FFN that uses other input data?
    Hello all, I hope everyone is doing fine! For my master's thesis I would like to perform cross-sectional stock predictions. I read a very interesting paper, "Deep Learning in Asset Pricing", that used macro factors as inputs to an LSTM neural network and then used the transformed output (hidden states) and factor returns as input variables for a feed-forward NN. I would like to replicate their approach. So I would first use my macro factors to infer economic cycles and then use this information together with factor returns in an FFN to predict stock returns. I tried using this code, but I am not sure whether it would efficiently use the power of LSTMs:
        from tensorflow.keras import layers
        # using a LSTM layer to transform macro factor inputs to a transformed output
        lstm_layer_1 = layers.LSTM(16, dropout=0.8, return_sequences=True, activation='relu')(x_train_macro)
        output_layer_1 = layers.Dense(8, activation='relu')(lstm_layer_1)
        lstm_layer_2 = layers.LSTM(8, dropout=0.5, return_sequences=True, activation='relu')(output_layer_1)
        output_layer_2 = layers.Dense(4, activation='relu')(lstm_layer_2)
        lstm_layer_3 = layers.LSTM(4, dropout=0.2, return_sequences=True, activation='relu')(output_layer_2)
        output_layer_3 = layers.Dense(1, activation='relu')(lstm_layer_3)
    or this:
        from tensorflow.keras import layers
        # using a LSTM layer to transform macro factor inputs to a transformed output
        lstm_output, states_h, states_c = layers.LSTM(4, dropout=0.95, return_sequences=True, return_state=True, activation='relu')(x_train_macro)
    I am relatively new to designing NNs and I know this seems like a tough challenge, but any help would be highly appreciated! I wish y'all a nice evening, cheers, submitted by /u/TheMoMatthias [link] [comments]  ( 89 min )
    [D] opinions on Unify AI
    What do you think about Unify AI (https://lets-unify.ai)? It’s a project to create an abstraction over existing ML libraries so they can be used from a single interface. Do you think it’s feasible or useful? submitted by /u/Aybdee [link] [comments]  ( 90 min )
    [D] Running Large Language Models in Production: A look at Cohere's The Inference Framework (TIF)
    Hi r/MachineLearning, The Inference Framework (TIF) is Cohere's platform for large Transformer language model inference. In this post, we share its high-level structure and some of the methods that help us serve massive language models more efficiently. https://txt.cohere.ai/running-large-language-models-in-production-a-look-at-the-inference-framework-tif/ submitted by /u/jayalammar [link] [comments]  ( 88 min )
    [P] Pose estimation based on cases generated via Hidden Markov model
    I'm looking to generate a list of pose cases based on sensor data (IMU, angle sensors, etc.) by feeding gathered data into a hidden Markov model. I'm trying to figure out how to use an HMM to generate pose cases so that my system will be able to predict the movement of a user. After some brief research online I found https://hmmlearn.readthedocs.io/en/latest/tutorial.html and https://www.scitepress.org/Papers/2020/93575/93575.pdf But I'm pretty new to machine learning and after looking through both I'm pretty confused. I understand how HMMs work on a basic level, but I'm not sure how to apply them specifically to this case. I would greatly appreciate any ideas on how to approach this and which aspects of an HMM to use specifically (for example, which types of emissions). Also, apologies if this is vague; I'm just trying to figure out an approach. Looking to develop the model in Python. Thanks! submitted by /u/Captain_Clapton [link] [comments]  ( 88 min )
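    One way to make the HMM idea concrete is to treat poses as hidden states and discretized sensor readings as observations, then decode the most likely pose sequence with the Viterbi algorithm. The sketch below is hand-written in plain Python and does not use the hmmlearn API; the two pose names, the "low"/"high" motion symbols, and all probabilities are made-up assumptions for illustration.

```python
import math


def _log(p):
    """Log-probability that tolerates zero entries."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    best = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor of state s at this time step.
            prev, score = max(
                ((p, best[-1][p] + _log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + _log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical two-pose model with a discretized motion signal.
pose_states = ["standing", "walking"]
pose_start = {"standing": 0.6, "walking": 0.4}
pose_trans = {
    "standing": {"standing": 0.8, "walking": 0.2},
    "walking": {"standing": 0.3, "walking": 0.7},
}
pose_emit = {
    "standing": {"low": 0.9, "high": 0.1},
    "walking": {"low": 0.2, "high": 0.8},
}
decoded_poses = viterbi(
    ["low", "low", "high", "high"], pose_states, pose_start, pose_trans, pose_emit
)
```

    For raw continuous IMU or angle readings, Gaussian emissions are a common alternative to discretizing the signal; the decoding step stays the same, only the emission probabilities change.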
    [D] Are diffusion models just a data sampling technique?
    I'm just beginning to try to understand diffusion models, so I may be way off here. But from my understanding so far, diffusion models involve adding noise in a series of steps that can be represented as a Markov chain. Then you take the gradient of the density function for each transition in that chain and sample them, and these become your independent variables. The dependent variables are the next step in the Markov chain (i.e. the outcome after adding noise). That's your dataset, and now you give that to a model. And my understanding is that the model itself is not really anything special; it's mainly this noise sampling technique that introduces something new. So would "diffusion data sampling" be a more descriptive way to talk about the process we refer to as diffusion models? I'm sure several of my assumptions are wrong, but I'm hoping it's at least a starting point for discussion! submitted by /u/bandalorian [link] [comments]  ( 88 min )
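    For reference, the noising chain mentioned above can be written down in a few lines. The sketch below follows the common DDPM-style formulation, in which a noisy sample at step t is drawn in closed form as x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, and the training pair is (x_t, t) as input with the noise eps as target. The linear schedule values and the function names are assumptions for illustration, and the post may have a different variant in mind.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas):
    """Cumulative products of (1 - beta): how much signal survives to step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def make_training_example(x0, t, alpha_bars, rng):
    """Build one (input, target) pair: input is (x_t, t), target is the noise."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    scale_signal = math.sqrt(alpha_bars[t])
    scale_noise = math.sqrt(1.0 - alpha_bars[t])
    x_t = [scale_signal * x + scale_noise * e for x, e in zip(x0, noise)]
    return (x_t, t), noise


toy_betas = linear_beta_schedule(1000)
toy_alpha_bars = alpha_bar(toy_betas)
toy_x0 = [1.0, -2.0]
(toy_x_t, toy_t), toy_noise = make_training_example(
    toy_x0, 10, toy_alpha_bars, random.Random(0)
)
```

    In this formulation the noising only produces the training pairs; generating new data additionally requires running the learned denoiser backwards through the chain, starting from pure noise.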
    [Discussion] Causality and the Machine Learning Community
    Last week, I attended ICML and witnessed the incredible popularity of causality: causal graph discovery, causal inference, causal fairness, causal interpretability, causality for robustness and out-of-domain generalization, and causality for offline RL were all quite popular. Most people that I talked to are either working on causality or planning to work on it. [I work on causality too and there is certainly selection bias in the people that I talk to.] There is an influx of papers on arXiv that try to discover causal graphs under various, sometimes unrealistic, assumptions. Contrast this with 15-20 years ago, when publishing causality papers in ICML and NeurIPS was difficult and some landmark papers on causality were published in conferences such as UAI. A few years ago, we had a similar situation for RL: most researchers were either working on RL or wanted to work on it. RL was thought of as the universal tool to solve all problems. Similarly, these days causality is thought of as the right tool to solve many problems. This is not surprising, because of the close connections between “offline RL” and “observational causality,” and between “online RL” and “experimental causality.” Similar to RL, I expect those who want to use causality as a tool to solve problems such as interpretability and robustness to be disappointed, because causality, especially causal discovery, is quite difficult. I am in favor of “causal thinking” about problems, but the causality tools are not easy to use for all problems. Let me know what you think. submitted by /u/mtahab [link] [comments]  ( 93 min )
    [D] What's the best resource to learn more about complex networks?
    Hi, I have been working with graph neural networks and graph convolution networks for a while now. However, I wish to learn more about complex networks and other machine learning methods that are already available for tasks like link prediction and node property prediction for such data. Is there any good resource, other than research papers, to get started with complex networks? submitted by /u/l34df4rm3r [link] [comments]  ( 88 min )
    [P] How to deploy ML models in production with BentoML
    Deploying Machine Learning models into production is a big hassle. You have to manage models, build a service to run inferences (e.g., with Flask), and deploy the service somewhere (e.g., Kubernetes). These steps are often convoluted and disjointed. I talk about these issues in the initial video of my “ML Deployment” mini-series. There are a few MLOps tools that make model deployment easier. Out of the many options, I like BentoML the most. This framework manages models via a simple CLI. It allows you to create an efficient service to make inferences. It builds units of deployment called bentos that combine both model and service. It makes containerisation easy and deployment on Kubernetes and cloud platforms a piece of cake. Want to learn more about BentoML? Check out my latest video in the “ML Deployment” mini-series. https://www.youtube.com/watch?v=HHkmfI_yncc submitted by /u/diabulusInMusica [link] [comments]  ( 88 min )
    [D] Interpretation of latent space extracted from contrastive loss
    Hey, for a project, we have a TB of unlabeled sensor data consisting of time series with a length of ~10000 steps and a feature size of ~100 describing the state of an object (can’t tell too much about it due to NDA). We need to generate embeddings, which we did successfully by applying a contrastive loss. The thing is, I’m looking for ways to interpret the embeddings and map them to the input space in some way, e.g. find out what contributes to or differentiates the different positions in the latent space. A method I found is to apply latent space regularization; do you know of anything else? Is there any way to map the found embeddings back to the input space? submitted by /u/sapnupuasop [link] [comments]  ( 89 min )
    [N] Accuracy-Aware Inference Optimization Tracking and Profiling
    Optimizing inference for low latency and high throughput is a process that requires many iterations of tuning, verification and evaluation. It may even involve model selection since many optimized versions of popular models are available now. Sometimes retraining is necessary for techniques like weight pruning and quantization. Target hardware is another dimension to consider. In short, without benchmarking, verification and evaluation, optimizations do not guarantee improved results and may even break things. One example is quantization using instructions that are not supported on target hardware. To address all these problems, we've built a tool to track inference optimizations, see how accuracy is affected, verify that the optimizations were applied and locate any bottlenecks for further improvements. All in one place. https://preview.redd.it/yzlxa21cdod91.png?width=3048&format=png&auto=webp&s=97306440ea508f65582978298f6e3ec291293902 More about inference optimization in this article, with code. And here is a live demo. submitted by /u/l0g1cs [link] [comments]  ( 88 min )
    [D]Help me set the parameter for GRU in PyTorch
    In this image, I want the shape of the 2nd y result to equal the 1st result. Can you help me with how to do that? Thanks. https://preview.redd.it/xezrld9uaod91.png?width=1600&format=png&auto=webp&s=1b37534d77d277506f20925c5a4d405d46843c41 submitted by /u/trncorn [link] [comments]  ( 88 min )
    [D] Can I create a Generative Adversarial Network for text using the logits without argmax
    There are two main problems when creating Generative Adversarial Networks for text: (1) The discrete token values are not differentiable after applying argmax. (2) The language architecture may generate sequences with different lengths, so the discriminator should be able to work with them either with padding or with some representation of the whole sequence. I was wondering whether it is possible to adversarially train a model such as T5. Its decoder produces a sequence with shape [batch_size, seq_len, model_dim] and then it is usually passed through a linear layer to get [batch_size, seq_len, vocab_size] logits. We can apply a softmax across the vocab_size dimension and then these probabilities can go to a discriminator. For the ground truth labels [batch_size, seq_len] we can generate one-hot vectors [batch_size, seq_len, vocab_size] and then apply the softmax to them as well. This will be sufficient for the discriminator to learn from truths, but since we do not apply argmax to the tokens from the generator (T5 decoder), gradients should be able to reach it as well. For the second problem, I was thinking of computing the mean to transform [batch_size, seq_len, vocab_size] probabilities to a "sequence representation" [batch_size, vocab_size], but I am not sure if this makes sense. So based on that, is 1 feasible and if not, why? What are some suggestions for solving 2? submitted by /u/IllustriousCicada603 [link] [comments]  ( 89 min )
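The shapes discussed in this post can be made concrete with a small, framework-free sketch for a single sequence. Everything here is an illustrative assumption (the logits, the vocabulary size of 4, and the helper names `softmax`, `one_hot`, `mean_pool`); it is not T5 code and it does not train anything. It only shows what the proposed "sequence representation" looks like.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one vocabulary-sized vector."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(token_id, vocab_size):
    """One-hot vector for a ground-truth token id."""
    return [1.0 if i == token_id else 0.0 for i in range(vocab_size)]

def mean_pool(seq_probs):
    """Average [seq_len, vocab_size] rows into one [vocab_size] vector."""
    seq_len = len(seq_probs)
    vocab_size = len(seq_probs[0])
    return [sum(step[v] for step in seq_probs) / seq_len for v in range(vocab_size)]

# Hypothetical generator logits: seq_len=3, vocab_size=4 (made-up numbers).
gen_logits = [[2.0, 0.5, 0.1, -1.0], [0.0, 3.0, 0.2, 0.1], [0.3, 0.3, 2.5, 0.0]]
gen_probs = [softmax(step) for step in gen_logits]   # soft, differentiable in a real framework
gen_repr = mean_pool(gen_probs)

# Ground-truth tokens as one-hot rows, then the same pooling.
real_repr = mean_pool([one_hot(t, 4) for t in [0, 1, 2]])

# The same tokens in a different order pool to the same vector.
shuffled_repr = mean_pool([one_hot(t, 4) for t in [2, 0, 1]])
```

Two things are visible from this toy case: mean pooling discards token order (`real_repr` equals `shuffled_repr`), and pooled one-hot rows contain exact zeros while pooled softmax rows never do, so a discriminator could plausibly separate real from generated inputs on that cue alone.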
    [D] Panoramic Xrays of teeth - Dataset
    I am looking for a dataset of panoramic X-rays of the teeth (upper and lower jaw). It would help to detect irregular teeth, cysts, tumors and infections using ML alone. Can anyone help with a dataset? submitted by /u/NikhilArethiya [link] [comments]  ( 87 min )
    [D] CIKM 2022 Phase 1 Notification
    Has anyone received the 1st phase notification in June? submitted by /u/snu95 [link] [comments]  ( 108 min )
    [D] Can a paper about air combat be accepted by conferences such as ICLR, IJCAI, NeurIPS according to its ethics guidelines?
    Hello there, I'm a student of AI in air combat simulation. I'm doing research about dogfights between planes. Recently I read the ethics guidelines of some conferences. For NeurIPS, it says "Consider whether the proposed methods and applications can directly facilitate injury to living beings. For example: could it be integrated into weapons or weapons systems?" For ACM, it says "Avoid harm ... "harm" means negative consequences, especially when those consequences are significant and unjust." For ICLR, it says "Avoid harm ... "harm" means negative consequences. Well-intended actions, including those that accomplish desired outcomes, may lead to harm." So I wonder if a scientific paper about intelligent maneuver decision methods could be accepted by those conferences. Thank you! 2022.7.25 submitted by /u/mrwangyou [link] [comments]  ( 88 min )
  • Open

    Contextual Bandits: Recommending Article
    Very confused on terminology. Action: politics, sports, music, food Context: user, time_of_day Reward: click or not click??? I thought reward is fed to the algorithm when the action produces good results. Reward is what you give the algorithm. If someone clicks, you would give a reward. Isn’t it wrong to call ‘click’ and ‘not click’ rewards? And if so, what do you call them? Referencing https://vowpalwabbit.org/docs/vowpal_wabbit/python/latest/tutorials/python_Simulating_a_news_personalization_scenario_using_Contextual_Bandits.html submitted by /u/Seriouspretzel [link] [comments]  ( 87 min )
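One way to separate the terms in this question is to treat click / no click as the observed outcome and the reward as the number that outcome is mapped to before it is fed to the learner. The sketch below is a minimal epsilon-greedy contextual bandit in plain Python, not the Vowpal Wabbit API; the class name, the `click_to_reward` mapping (1.0 / 0.0) and the example context are assumptions made for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ["politics", "sports", "music", "food"]

class EpsilonGreedyBandit:
    """Keeps a running mean reward per (context, action) pair."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)
        self.values = defaultdict(float)

    def choose(self, context):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.values[(context, a)])

    def learn(self, context, action, reward):
        # Incremental mean update for the chosen (context, action) pair.
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]

def click_to_reward(clicked):
    """Assumed mapping from the observed outcome to a numeric reward."""
    return 1.0 if clicked else 0.0

bandit = EpsilonGreedyBandit(ACTIONS, epsilon=0.0)   # epsilon=0 keeps the demo deterministic
context = ("Tom", "morning")                         # hypothetical (user, time_of_day)
bandit.learn(context, "politics", click_to_reward(True))
bandit.learn(context, "sports", click_to_reward(False))
best = bandit.choose(context)
```

Under this reading, "not click" is simply an outcome with reward 0: the learner still receives a reward signal after every action, it is just a low one.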
    Training generalist agents with multi-game decision transformers
    Trained a decision transformer on 1B experiences from 41 Atari games and it learned pretty well. Performance improved with the number of params (used up to ~200M params). They claim that it's better to train over these experiences compared to training on expert demonstrations over different games like in GATO. submitted by /u/dwightschrute1905 [link] [comments]  ( 86 min )
    Image Embeddings
    It seems like every major paper I've seen doesn't use transfer learning on RL games with visual inputs. They all seem to train a model from scratch. Everyone seems to agree that learning to see using just the signal from a (potentially sparse) reward isn't a good route, so there are a ton of interesting papers that pretrain using various unsupervised approaches. I would think this would be an extremely natural application for transfer learning from a conventional vision model. However, this approach isn't even mentioned in any of the papers I've read. Is there something that I'm missing about transfer learning not working well for something like the Atari gym tasks? Has this been tried and I've just managed to avoid papers that include this method? submitted by /u/MustachedSpud [link] [comments]  ( 107 min )
    Why do actor-critic losses explode while the agent still learns?
    I've noticed that when training a DDPG agent in the Reacher-v2 environment of OpenAI Gym, the losses of both actor and critic first decrease but after a while start increasing... meanwhile the episode mean reward keeps growing and the task is successfully solved. Can anyone explain why? submitted by /u/One-Ad-8584 [link] [comments]  ( 87 min )
    "The 37 Implementation Details of Proximal Policy Optimization"
    submitted by /u/gwern [link] [comments]  ( 106 min )
    CleanRL now supports PPO + Isaac Gym: Train Ant in 250 seconds!
    submitted by /u/vwxyzjn [link] [comments]  ( 86 min )
    A solution to the semiconductor factory schedule optimization problem.
    Hello. Is there a suitable sample or paper for optimizing the schedule of the production line of a semiconductor factory with about 20 parameters through reinforcement learning? The idea would be to give a reward when the delivery date is kept. For example, switching the production line may stop production for about 1 day, or if the necessary goods are not delivered, the progress of any production line may stop. By the way, can Google OR-Tools solve this problem? I understand that there are many solutions, and I wonder which is the most appropriate. submitted by /u/RanThomasAnderson [link] [comments]  ( 88 min )
  • Open

    How is Artificial Intelligence Transforming Meeting Experiences
    AI has single-handedly improved communications, boosted productivity, facilitated seamless collaboration, and much more. The post How is Artificial Intelligence Transforming Meeting Experiences appeared first on Data Science Central.  ( 18 min )
    What’s the Value of my Data?  Today’s Most Critical Yet Hard to Answer Question
    What’s the value of my data? Today’s most critical question to which every organization should know the answer still goes unanswered in an age where the world’s most valuable resource is data. The post What’s the Value of my Data?  Today’s Most Critical Yet Hard to Answer Question appeared first on Data Science Central.  ( 22 min )
    An example of Digital Twins Architecture – Azure Digital Twins
    Introduction In this post, I discussed the architecture of digital twins. This is a relatively new and emerging topic, and here I will look at Azure Digital Twins as an example to examine this architecture in depth. The architecture described below is from a cloud perspective. It is created by integrating other existing cloud products  I… Read More »An example of Digital Twins Architecture – Azure Digital Twins The post An example of Digital Twins Architecture – Azure Digital Twins appeared first on Data Science Central.  ( 18 min )
    Sparql Secrets In Jena-Fuseki
    Jena has long been seen as one of the most current reference implementations of such knowledge engines to date. The post Sparql Secrets In Jena-Fuseki appeared first on Data Science Central.  ( 23 min )
  • Open

    Is there a canonical simple "helloworld" neural network design? Something beyond AND/OR logic, a handful of nodes that does something mildly "useful"?
    I've been building a memristor with the idea of creating a small hardware neural network, and am hoping someone has ideas for a neural net with only a handful of neurons (since I have to make each connection by hand). Ideas? Thinking maybe something like a three input light tracker for a little solar panel, but was curious if there were other ideas. https://bigattichouse.medium.com/penny-for-your-thoughts-copper-based-electrolytic-memristor-neural-network-part-4-8a6e43a3ce26 submitted by /u/bigattichouse [link] [comments]  ( 87 min )
    How is research done
    submitted by /u/PlentyRadiant4191 [link] [comments]  ( 86 min )
    Mutation probabilities in NEAT
    Hi! I'm trying to implement NEAT by myself as a hobby project and I have a fairly good idea of how to do it. However, I'm slightly scratching my head over probabilities, as the original paper isn't very specific and the C++ code is very ugly (IMO) and hard to read. The paper talks about mutation probabilities: In each generation, 25% of offspring resulted from mutation without crossover. This is simple, no issues here. In smaller populations, the probability of adding a new node was 0.03 and the probability of a new link mutation was 0.05. Earlier in the paper: There was an 80% chance of a genome having its connection weights mutated... So the sum of these probabilities is 88%. I could add the mutation of changing the activation function but I think 12% probability for that is rather high. What other mutations could there be and what weights should I assign to them? Another question: 75% of the new offspring are created with crossover. Can mutations be applied to these offspring as well, and are they? submitted by /u/codetrasher [link] [comments]  ( 92 min )
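One possible reading of the numbers quoted in this post is that they are not weights of a single categorical draw, so they would not need to sum to 100%. The sketch below assumes each mutation is an independent per-genome trial; that independence is an assumption of this example and is not confirmed by the paper text quoted above or by the reference code. The dictionary keys and function names are hypothetical.

```python
import random

# Probabilities quoted in the post. Treating them as independent
# per-genome trials is an assumption of this sketch.
MUTATION_PROBS = {
    "mutate_weights": 0.80,
    "add_link": 0.05,
    "add_node": 0.03,
}

def pick_mutations(rng, probs=MUTATION_PROBS):
    """Return the (possibly empty) list of mutations to apply to one genome."""
    return [name for name, p in probs.items() if rng.random() < p]

def prob_no_mutation(probs=MUTATION_PROBS):
    """Chance that none of the independent trials fires."""
    result = 1.0
    for p in probs.values():
        result *= 1.0 - p
    return result

rng = random.Random(42)
samples = [pick_mutations(rng) for _ in range(10_000)]
share_unmutated = sum(1 for s in samples if not s) / len(samples)
```

Under this assumption a genome can receive several mutations at once or none at all, and the share left untouched is 0.20 × 0.95 × 0.97 ≈ 0.184 rather than the "missing" 12%.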
    Computer science engineers should be fascinated that implants allow the brain to conduct transfer learning like artificial neural network systems. "I know kung fu." — The Matrix (1999)
    “Researchers [from Wake Forest University, the University of California, and the University of Kentucky] performed surgery on 11 rats,” writes Michael Joseph Gross in “The Pentagon’s Push to Program Soldiers’ Brains” for The Atlantic: Into each rat’s brain, an electronic array—featuring 16 stainless-steel wires—was implanted. After the rats recovered from surgery, they were separated into two groups, and they spent a period of weeks getting educated, though one group was educated more than the other. When the more educated group of rats attained mastery of this task, the researchers exported the neural-firing patterns recorded in the rats’ brains—the memory of how to perform the complex task—to a computer. “What we did then was we took those signals and we gave it to an animal that was stupid,” Geoff Ling said at a DARPA event in 2015—meaning that researchers took the neural-firing patterns encoding the memory of how to perform the more complex task, recorded from the brains of the more educated rats, and transferred those patterns into the brains of the less educated rats—”and that stupid animal got it. They were able to execute that full thing.” Ling summarized: “For this rat, we reduced the learning period from eight weeks down to seconds.” https://www.theatlantic.com/magazine/archive/2018/11/the-pentagon-wants-to-weaponize-the-brain-what-could-go-wrong/570841/ submitted by /u/TheSkewsMe [link] [comments]  ( 89 min )
  • Open

    Developing advanced machine learning systems at Trumid with the Deep Graph Library for Knowledge Embedding
    This is a guest post co-written with Mutisya Ndunda from Trumid. Like many industries, the corporate bond market doesn’t lend itself to a one-size-fits-all approach. It’s vast, liquidity is fragmented, and institutional clients demand solutions tailored to their specific needs. Advances in AI and machine learning (ML) can be employed to improve the customer experience, […]  ( 7 min )
  • Open

    Digital Sculptor Does Heavy Lifting With Lightweight Mobile Workstation
    As a professional digital sculptor, Marlon Nuñez is on a mission to make learning 3D art skills easier, smoother and more fun for all. And with the help of an NVIDIA RTX-powered Lenovo mobile workstation, he takes his 3D projects to the next level, wherever he goes. Nuñez is the art director and co-founder of Read article > The post Digital Sculptor Does Heavy Lifting With Lightweight Mobile Workstation appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    Will there be a Significant Impact of AI in FinTech over the Next Decade?
    Artificial Intelligence (AI) has taken the world of tech by storm. AI in the FinTech market is being utilized at a rising rate. It is…  ( 12 min )
  • Open

    Defending Substitution-Based Profile Pollution Attacks on Sequential Recommenders. (arXiv:2207.11237v1 [cs.IR])
    While sequential recommender systems achieve significant improvements on capturing user dynamics, we argue that sequential recommenders are vulnerable against substitution-based profile pollution attacks. To demonstrate our hypothesis, we propose a substitution-based adversarial attack algorithm, which modifies the input sequence by selecting certain vulnerable elements and substituting them with adversarial items. In both untargeted and targeted attack scenarios, we observe significant performance deterioration using the proposed profile pollution algorithm. Motivated by such observations, we design an efficient adversarial defense method called Dirichlet neighborhood sampling. Specifically, we sample item embeddings from a convex hull constructed by multi-hop neighbors to replace the original items in input sequences. During sampling, a Dirichlet distribution is used to approximate the probability distribution in the neighborhood such that the recommender learns to combat local perturbations. Additionally, we design an adversarial training method tailored for sequential recommender systems. In particular, we represent selected items with one-hot encodings and perform gradient ascent on the encodings to search for the worst case linear combination of item embeddings in training. As such, the embedding function learns robust item representations and the trained recommender is resistant to test-time adversarial examples. Extensive experiments show the effectiveness of both our attack and defense methods, which consistently outperform baselines by a significant margin across model architectures and datasets.  ( 3 min )
    Sign and Relevance learning. (arXiv:2110.07292v2 [cs.LG] UPDATED)
    Standard models of biologically realistic, or inspired, reinforcement learning employ a global error signal which implies shallow networks. However, on the other hand, local learning rules allow networks with multiple layers. Here, we present a network combining local learning with global modulation where neuromodulation controls the amount of plasticity change in the whole network, while the sign of the error is passed via a bottom-up pathway through the network. Neuromodulation can be understood as a rectified error, or relevance, signal while the bottom-up sign of the error signal decides between long-term potentiation and long-term depression. We demonstrate the performance of this paradigm with a real robotic task as a proof of concept.  ( 2 min )
    Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks. (arXiv:2201.11729v4 [cs.LG] UPDATED)
    In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy, and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design explicit regularization discouraging locality, and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, in defiance of conventional wisdom by which architectural changes are needed. Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.  ( 3 min )
    Boosting Transferability of Targeted Adversarial Examples via Hierarchical Generative Networks. (arXiv:2107.01809v2 [cs.LG] UPDATED)
    Transfer-based adversarial attacks can evaluate model robustness in the black-box setting. Several methods have demonstrated impressive untargeted transferability; however, it is still challenging to efficiently produce targeted transferability. To this end, we develop a simple yet effective framework to craft targeted transfer-based adversarial examples, applying a hierarchical generative network. In particular, we contribute to amortized designs that adapt well to multi-class targeted attacks. Extensive experiments on ImageNet show that our method improves the success rates of targeted black-box attacks by a significant margin over the existing methods -- it reaches an average success rate of 29.1% against six diverse models based only on one substitute white-box model, which significantly outperforms the state-of-the-art gradient-based attack methods. Moreover, the proposed method is also more than an order of magnitude more efficient than gradient-based methods.
    DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale. (arXiv:2201.05596v2 [cs.LG] UPDATED)
    As the training of giant dense models hits the boundary on the availability and capability of the hardware resources today, Mixture-of-Experts (MoE) models become one of the most promising model architectures due to their significant training cost reduction compared to a quality-equivalent dense model. Its training cost saving is demonstrated from encoder-decoder models (prior works) to a 5x saving for autoregressive language models (this work along with parallel explorations). However, due to the much larger model size and unique architecture, how to provide fast MoE model inference remains challenging and unsolved, limiting its practical usage. To tackle this, we present DeepSpeed-MoE, an end-to-end MoE training and inference solution as part of the DeepSpeed library, including novel MoE architecture designs and model compression techniques that reduce MoE model size by up to 3.7x, and a highly optimized inference system that provides 7.3x better latency and cost compared to existing MoE inference solutions. DeepSpeed-MoE offers an unprecedented scale and efficiency to serve massive MoE models with up to 4.5x faster and 9x cheaper inference compared to quality-equivalent dense models. We hope our innovations and systems help open a promising path to new directions in the large model landscape, a shift from dense to sparse MoE models, where training and deploying higher-quality models with fewer resources becomes more widely possible.
    An Extensive Data Processing Pipeline for MIMIC-IV. (arXiv:2204.13841v2 [cs.LG] UPDATED)
    An increasing amount of research is being devoted to applying machine learning methods to electronic health record (EHR) data for various clinical tasks. This growing area of research has exposed the limitation of accessibility of EHR datasets for all, as well as the reproducibility of different modeling frameworks. One reason for these limitations is the lack of standardized pre-processing pipelines. MIMIC is a freely available EHR dataset in a raw format used in numerous studies. The absence of standardized pre-processing steps serves as a significant barrier to the wider adoption of the dataset. It also leads to different cohorts being used in downstream tasks, limiting the ability to compare the results among similar studies. Contrasting studies also use various distinct performance metrics, which can greatly reduce the ability to compare model results. In this work, we provide an end-to-end fully customizable pipeline to extract, clean, and pre-process data; and to predict and evaluate the fourth version of the MIMIC dataset (MIMIC-IV) for ICU and non-ICU-related clinical time-series prediction tasks. The tool is publicly available at https://github.com/healthylaife/MIMIC-IV-Data-Pipeline.
    Hierarchical Average Precision Training for Pertinent Image Retrieval. (arXiv:2207.04873v2 [cs.CV] UPDATED)
    Image Retrieval is commonly evaluated with Average Precision (AP) or Recall@k. Yet, those metrics are limited to binary labels and do not take into account errors' severity. This paper introduces a new hierarchical AP training method for pertinent image retrieval (HAP-PIER). HAPPIER is based on a new H-AP metric, which leverages a concept hierarchy to refine AP by integrating errors' importance and better evaluate rankings. To train deep models with H-AP, we carefully study the problem's structure and design a smooth lower bound surrogate combined with a clustering loss that ensures consistent ordering. Extensive experiments on 6 datasets show that HAPPIER significantly outperforms state-of-the-art methods for hierarchical retrieval, while being on par with the latest approaches when evaluating fine-grained ranking performances. Finally, we show that HAPPIER leads to better organization of the embedding space, and prevents most severe failure cases of non-hierarchical methods. Our code is publicly available at: https://github.com/elias-ramzi/HAPPIER.
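The limitation named in this abstract, that binary-label AP ignores how severe an error is, can be illustrated with the standard AP definition. The sketch below is not the H-AP metric or the HAPPIER code; the labels and the query are made-up examples, and the function assumes the ranked list contains every relevant item.

```python
def average_precision(relevance):
    """Standard binary AP: mean of precision@rank over the relevant positions."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0

query = "persian_cat"  # hypothetical fine-grained label

# Same ranking positions, but the mistake at rank 2 differs in severity:
ranking_mild = ["persian_cat", "siamese_cat", "persian_cat"]   # another cat breed
ranking_severe = ["persian_cat", "truck", "persian_cat"]       # unrelated object

ap_mild = average_precision([label == query for label in ranking_mild])
ap_severe = average_precision([label == query for label in ranking_severe])
```

Both rankings receive the same score, (1/1 + 2/3) / 2 = 5/6, because binary relevance cannot tell a near miss from an unrelated result. That gap is what a hierarchy-aware metric is meant to address.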
    Function-space Inference with Sparse Implicit Processes. (arXiv:2110.07618v3 [stat.ML] UPDATED)
    Implicit Processes (IPs) represent a flexible framework that can be used to describe a wide variety of models, from Bayesian neural networks, neural samplers and data generators to many others. IPs also allow for approximate inference in function-space. This change of formulation solves intrinsic degenerate problems of parameter-space approximate inference concerning the high number of parameters and their strong dependencies in large models. For this, previous works in the literature have attempted to employ IPs both to set up the prior and to approximate the resulting posterior. However, this has proven to be a challenging task. Existing methods that can tune the prior IP result in a Gaussian predictive distribution, which fails to capture important data patterns. By contrast, methods producing flexible predictive distributions by using another IP to approximate the posterior process cannot tune the prior IP to the observed data. We propose here the first method that can accomplish both goals. For this, we rely on an inducing-point representation of the prior IP, as often done in the context of sparse Gaussian processes. The result is a scalable method for approximate inference with IPs that can tune the prior IP parameters to the data, and that provides accurate non-Gaussian predictive distributions.
    Multimodal Detection of Unknown Objects on Roads for Autonomous Driving. (arXiv:2205.01414v3 [cs.CV] UPDATED)
    Tremendous progress in deep learning over the last years has led towards a future with autonomous vehicles on our roads. Nevertheless, the performance of their perception systems is strongly dependent on the quality of the utilized training data. As these usually only cover a fraction of all object classes an autonomous driving system will face, such systems struggle with handling the unexpected. In order to safely operate on public roads, the identification of objects from unknown classes remains a crucial task. In this paper, we propose a novel pipeline to detect unknown objects. Instead of focusing on a single sensor modality, we make use of lidar and camera data by combining state-of-the art detection models in a sequential manner. We evaluate our approach on the Waymo Open Perception Dataset and point out current research gaps in anomaly detection.
    Self-Supervised-RCNN for Medical Image Segmentation with Limited Data Annotation. (arXiv:2207.11191v1 [cs.CV])
    Many successful methods developed for medical image analysis that are based on machine learning use supervised learning approaches, which often require large datasets annotated by experts to achieve high accuracy. However, medical data annotation is time-consuming and expensive, especially for segmentation tasks. To solve the problem of learning with limited labeled medical image data, an alternative deep learning training strategy based on self-supervised pretraining on unlabeled MRI scans is proposed in this work. Our pretraining approach first randomly applies different distortions to random areas of unlabeled images and then predicts the type of distortions and loss of information. To this aim, an improved version of Mask-RCNN architecture has been adapted to localize the distortion location and recover the original image pixels. The effectiveness of the proposed method for segmentation tasks in different pre-training and fine-tuning scenarios is evaluated based on the Osteoarthritis Initiative dataset. Using this self-supervised pretraining method improved the Dice score by 20% compared to training from scratch. The proposed self-supervised learning is simple, effective, and suitable for different ranges of medical image analysis tasks including anomaly detection, segmentation, and classification.  ( 2 min )
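For readers unfamiliar with the Dice score reported in this abstract, the following is a minimal sketch of the usual definition, 2|A∩B| / (|A| + |B|), on masks represented as sets of pixel coordinates. The set representation, the function name and the example masks are assumptions for illustration; this is not the paper's evaluation code.

```python
def dice_score(pred_pixels, target_pixels):
    """Dice coefficient between two masks given as collections of (row, col) pixels."""
    pred, target = set(pred_pixels), set(target_pixels)
    if not pred and not target:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(pred & target) / (len(pred) + len(target))

# Made-up 2-pixel masks that overlap in exactly one pixel.
predicted = {(0, 0), (0, 1)}
ground_truth = {(0, 1), (1, 1)}
score = dice_score(predicted, ground_truth)
```

A score of 1.0 means the predicted and annotated masks coincide, and 0.0 means they share no pixels.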
    Deriving discriminative classifiers from generative models. (arXiv:2201.00844v2 [stat.ML] UPDATED)
    We deal with Bayesian generative and discriminative classifiers. Given a model distribution $p(x, y)$, with the observation $y$ and the target $x$, one computes generative classifiers by firstly considering $p(x, y)$ and then using the Bayes rule to calculate $p(x | y)$. A discriminative model is directly given by $p(x | y)$, which is used to compute discriminative classifiers. However, recent works showed that the Bayesian Maximum Posterior classifier defined from the Naive Bayes (NB) or Hidden Markov Chain (HMC), both generative models, can also match the discriminative classifier definition. Thus, there are situations in which dividing classifiers into "generative" and "discriminative" is somewhat misleading. Indeed, such a distinction is rather related to the way of computing classifiers, not to the classifiers themselves. We present a general theoretical result specifying how a generative classifier induced from a generative model can also be computed in a discriminative way from the same model. Examples of NB and HMC are found again as particular cases, and we apply the general result to two original extensions of NB, and two extensions of HMC, one of which being original. Finally, we shortly illustrate the interest of the new discriminative way of computing classifiers in the Natural Language Processing (NLP) framework.  ( 3 min )
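The generative route described in this abstract (start from $p(x, y)$, then apply the Bayes rule to get $p(x | y)$) can be sketched on a tiny discrete joint table. The table values and labels below are made up for illustration, and the helper names are hypothetical; the example does not reproduce the paper's general result for NB or HMC.

```python
# Hypothetical joint distribution p(x, y): target x, observation y.
joint = {
    ("spam", "offer"): 0.30, ("spam", "hello"): 0.10,
    ("ham", "offer"): 0.05, ("ham", "hello"): 0.55,
}

def posterior(joint, y):
    """Bayes rule: p(x | y) = p(x, y) / sum over x' of p(x', y)."""
    evidence = sum(p for (_, y_), p in joint.items() if y_ == y)
    return {x: p / evidence for (x, y_), p in joint.items() if y_ == y}

def map_classify(joint, y):
    """Maximum posterior classifier induced by the joint model."""
    post = posterior(joint, y)
    return max(post, key=post.get)

post_offer = posterior(joint, "offer")   # spam: 0.30 / 0.35, ham: 0.05 / 0.35
label_offer = map_classify(joint, "offer")
label_hello = map_classify(joint, "hello")
```

A classifier that stored only the `posterior` table would make identical decisions, which is the sense in which the same classifier can be reached by either a generative or a discriminative computation.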
    BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition. (arXiv:2109.13226v3 [eess.AS] UPDATED)
    We summarize the results of a host of efforts using giant automatic speech recognition (ASR) models pre-trained using large, diverse unlabeled datasets containing approximately a million hours of audio. We find that the combination of pre-training, self-training and scaling up model size greatly increases data efficiency, even for extremely large tasks with tens of thousands of hours of labeled data. In particular, on an ASR task with 34k hours of labeled data, by fine-tuning an 8 billion parameter pre-trained Conformer model we can match state-of-the-art (SoTA) performance with only 3% of the training data and significantly improve SoTA with the full training set. We also report on the universal benefits gained from using big pre-trained and self-trained models for a large set of downstream tasks that cover a wide range of speech domains and span multiple orders of magnitudes of dataset sizes, including obtaining SoTA performance on many public benchmarks. In addition, we utilize the learned representation of pre-trained networks to achieve SoTA results on non-ASR tasks.  ( 3 min )
    Physics-informed neural networks to learn cardiac fiber orientation from multiple electroanatomical maps. (arXiv:2201.12362v3 [eess.IV] UPDATED)
    We propose FiberNet, a method to estimate in vivo the cardiac fiber architecture of the human atria from multiple catheter recordings of the electrical activation. Cardiac fibers play a central role in the electro-mechanical function of the heart, yet they are difficult to determine in vivo, and hence rarely truly patient-specific in existing cardiac models. FiberNet learns the fiber arrangement by solving an inverse problem with physics-informed neural networks. The inverse problem amounts to identifying the conduction velocity tensor of a cardiac propagation model from a set of sparse activation maps. The use of multiple maps enables the simultaneous identification of all the components of the conduction velocity tensor, including the local fiber angle. We extensively test FiberNet on synthetic 2-D and 3-D examples, diffusion tensor fibers, and a patient-specific case. We show that 3 maps are sufficient to accurately capture the fibers, also in the presence of noise. With fewer maps, the role of regularization becomes prominent. Moreover, we show that the fitted model can robustly reproduce unseen activation maps. We envision that FiberNet will help the creation of patient-specific models for personalized medicine. The full code is available at this http URL  ( 3 min )
    On the sample complexity of stabilizing linear dynamical systems from data. (arXiv:2203.00474v2 [math.OC] UPDATED)
    Learning controllers from data for stabilizing dynamical systems typically follows a two step process of first identifying a model and then constructing a controller based on the identified model. However, learning models means identifying generic descriptions of the dynamics of systems, which can require large amounts of data and extracting information that are unnecessary for the specific task of stabilization. The contribution of this work is to show that if a linear dynamical system has dimension (McMillan degree) $n$, then there always exist $n$ states from which a stabilizing feedback controller can be constructed, independent of the dimension of the representation of the observed states and the number of inputs. By building on previous work, this finding implies that any linear dynamical system can be stabilized from fewer observed states than the minimal number of states required for learning a model of the dynamics. The theoretical findings are demonstrated with numerical experiments that show the stabilization of the flow behind a cylinder from less data than necessary for learning a model.  ( 2 min )
    Large-Kernel Attention for 3D Medical Image Segmentation. (arXiv:2207.11225v1 [eess.IV])
    Automatic segmentation of multiple organs and tumors from 3D medical images such as magnetic resonance imaging (MRI) and computed tomography (CT) scans using deep learning methods can aid in diagnosing and treating cancer. However, organs often overlap and are complexly connected, characterized by extensive anatomical variation and low contrast. In addition, the diversity of tumor shape, location, and appearance, coupled with the dominance of background voxels, makes accurate 3D medical image segmentation difficult. In this paper, a novel large-kernel (LK) attention module is proposed to address these problems to achieve accurate multi-organ segmentation and tumor segmentation. The advantages of convolution and self-attention are combined in the proposed LK attention module, including local contextual information, long-range dependence, and channel adaptation. The module also decomposes the LK convolution to optimize the computational cost and can be easily incorporated into FCNs such as U-Net. Comprehensive ablation experiments demonstrated the feasibility of convolutional decomposition and explored the most efficient and effective network design. Among them, the best Mid-type LK attention-based U-Net network was evaluated on CT-ORG and BraTS 2020 datasets, achieving state-of-the-art segmentation performance. The performance improvement due to the proposed LK attention module was also statistically validated.  ( 3 min )
    Reinforcement Learning Approaches for the Orienteering Problem with Stochastic and Dynamic Release Dates. (arXiv:2207.00885v2 [math.OC] UPDATED)
    In this paper, we study a sequential decision making problem faced by e-commerce carriers related to when to send out a vehicle from the central depot to serve customer requests, and in which order to provide the service, under the assumption that the time at which parcels arrive at the depot is stochastic and dynamic. The objective is to maximize the number of parcels that can be delivered during the service hours. We propose two reinforcement learning approaches for solving this problem, one based on a policy function approximation (PFA) and the second on a value function approximation (VFA). Both methods are combined with a look-ahead strategy, in which future release dates are sampled in a Monte-Carlo fashion and a tailored batch approach is used to approximate the value of future states. Our PFA and VFA make good use of branch-and-cut-based exact methods to improve the quality of decisions. We also establish sufficient conditions for a partial characterization of the optimal policy and integrate them into PFA/VFA. In an empirical study based on 720 benchmark instances, we conduct a competitive analysis using upper bounds with perfect information and we show that PFA and VFA greatly outperform two alternative myopic approaches. Overall, PFA provides the best solutions, while VFA (which benefits from a two-stage stochastic optimization model) achieves a better tradeoff between solution quality and computing time.
    Quantum Metropolis Solver: A Quantum Walks Approach to Optimization Problems. (arXiv:2207.06462v1 [quant-ph] CROSS LISTED)
    The efficient resolution of optimization problems is one of the key issues in today's industry. This task relies mainly on classical algorithms that present scalability problems and processing limitations. Quantum computing has emerged to challenge these types of problems. In this paper, we focus on the Metropolis-Hastings quantum algorithm that is based on quantum walks. We use this algorithm to build a quantum software tool called Quantum Metropolis Solver (QMS). We validate QMS with the N-Queen problem to show a potential quantum advantage in an example that can be easily extrapolated to an Artificial Intelligence domain. We carry out different simulations to validate the performance of QMS and its configuration.  ( 2 min )
    CoLES: Contrastive Learning for Event Sequences with Self-Supervision. (arXiv:2002.08232v3 [cs.LG] UPDATED)
    We address the problem of self-supervised learning on discrete event sequences generated by real-world users. Self-supervised learning incorporates complex information from the raw data in low-dimensional fixed-length vector representations that could be easily applied in various downstream machine learning tasks. In this paper, we propose a new method "CoLES", which adapts contrastive learning, previously used for audio and computer vision domains, to the discrete event sequences domain in a self-supervised setting. We deployed CoLES embeddings based on sequences of transactions at a large European financial services company. Usage of CoLES embeddings significantly improves the performance of the pre-existing models on downstream tasks and produces significant financial gains, measured in hundreds of millions of dollars yearly. We also evaluated CoLES on several public event sequences datasets and showed that CoLES representations consistently outperform other methods on different downstream tasks.  ( 2 min )
    Learning Energy-Based Models With Adversarial Training. (arXiv:2012.06568v3 [cs.LG] UPDATED)
    We study a new approach to learning energy-based models (EBMs) based on adversarial training (AT). We show that (binary) AT learns a special kind of energy function that models the support of the data distribution, and the learning process is closely related to MCMC-based maximum likelihood learning of EBMs. We further propose improved techniques for generative modeling with AT, and demonstrate that this new approach is capable of generating diverse and realistic images. Aside from having competitive image generation performance to explicit EBMs, the studied approach is stable to train, is well-suited for image translation tasks, and exhibits strong out-of-distribution adversarial robustness. Our results demonstrate the viability of the AT approach to generative modeling, suggesting that AT is a competitive alternative approach to learning EBMs.  ( 2 min )
    Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement. (arXiv:2203.09675v2 [stat.ML] UPDATED)
    Bayesian coresets approximate a posterior distribution by building a small weighted subset of the data points. Any inference procedure that is too computationally expensive to be run on the full posterior can instead be run inexpensively on the coreset, with results that approximate those on the full data. However, current approaches are limited by either a significant run-time or the need for the user to specify a low-cost approximation to the full posterior. We propose a Bayesian coreset construction algorithm that first selects a uniformly random subset of data, and then optimizes the weights using a novel quasi-Newton method. Our algorithm is a simple to implement, black-box method, that does not require the user to specify a low-cost posterior approximation. It is the first to come with a general high-probability bound on the KL divergence of the output coreset posterior. Experiments demonstrate that our method provides significant improvements in coreset quality against alternatives with comparable construction times, with far less storage cost and user input required.  ( 2 min )
    Neighbour Interaction based Click-Through Rate Prediction via Graph-masked Transformer. (arXiv:2201.13311v2 [cs.IR] UPDATED)
    Click-Through Rate (CTR) prediction, which aims to estimate the probability that a user will click an item, is an essential component of online advertising. Existing methods mainly attempt to mine user interests from users' historical behaviours, which contain users' directly interacted items. Although these methods have made great progress, they are often limited by the recommender system's direct exposure and inactive interactions, and thus fail to mine all potential user interests. To tackle these problems, we propose Neighbor-Interaction based CTR prediction (NI-CTR), which considers this task under a Heterogeneous Information Network (HIN) setting. In short, Neighbor-Interaction based CTR prediction involves the local neighborhood of the target user-item pair in the HIN to predict their linkage. In order to guide the representation learning of the local neighbourhood, we further consider different kinds of interactions among the local neighborhood nodes from both explicit and implicit perspectives, and propose a novel Graph-Masked Transformer (GMT) to effectively incorporate these kinds of interactions to produce highly representative embeddings for the target user-item pair. Moreover, in order to improve model robustness against neighbour sampling, we enforce a consistency regularization loss over the neighbourhood embedding. We conduct extensive experiments on two real-world datasets with millions of instances, and the experimental results show that our proposed method significantly outperforms state-of-the-art CTR models. Meanwhile, the comprehensive ablation studies verify the effectiveness of every component of our model. Furthermore, we have deployed this framework on the WeChat Official Account Platform with billions of users. The online A/B tests demonstrate an average CTR improvement of 21.9 against all online baselines.  ( 3 min )
    Training Certifiably Robust Neural Networks Against Semantic Perturbations. (arXiv:2207.11177v1 [cs.CV])
    Semantic image perturbations, such as scaling and rotation, have been shown to easily deceive deep neural networks (DNNs). Hence, training DNNs to be certifiably robust to these perturbations is critical. However, no prior work has been able to incorporate the objective of deterministic semantic robustness into the training procedure, as existing deterministic semantic verifiers are exceedingly slow. To address these challenges, we propose Certified Semantic Training (CST), the first training framework for deterministic certified robustness against semantic image perturbations. Our framework leverages a novel GPU-optimized verifier that, unlike existing works, is fast enough for use in training. Our results show that networks trained via CST consistently achieve both better provable semantic robustness and clean accuracy, compared to networks trained via baselines based on existing works.  ( 2 min )
    Improving Nonparametric Classification via Local Radial Regression with an Application to Stock Prediction. (arXiv:2112.13951v2 [stat.ML] UPDATED)
    For supervised classification problems, this paper considers estimating the query's label probability through local regression using observed covariates. The well-known nonparametric kernel smoother and $k$-nearest neighbor ($k$-NN) estimator, which take the label average over a ball around the query, are consistent but asymptotically biased, particularly for a large radius of the ball. To eradicate such bias, local polynomial regression (LPoR) and multiscale $k$-NN (MS-$k$-NN) learn the bias term by local regression around the query and extrapolate it to the query itself. However, their theoretical optimality has been shown in the limit of an infinite number of training samples. For correcting the asymptotic bias with fewer observations, this paper proposes a \emph{local radial regression (LRR)} and its logistic regression variant called \emph{local radial logistic regression~(LRLR)}, by combining the advantages of LPoR and MS-$k$-NN. The idea is quite simple: we fit the local regression to observed labels by taking only the radial distance as the explanatory variable and then extrapolate the estimated label probability to zero distance. The usefulness of the proposed method is shown theoretically and experimentally. We prove the convergence rate of the $L^2$ risk for LRR with reference to MS-$k$-NN, and our numerical experiments, including real-world datasets of daily stock indices, demonstrate that LRLR outperforms LPoR and MS-$k$-NN.  ( 3 min )
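    The "regress on radial distance, then extrapolate to zero distance" idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `local_radial_regression`, the choice of the `k` nearest neighbours, the plain polynomial in the distance, and the final clipping to [0, 1] are all assumptions made for illustration.

```python
import numpy as np

def local_radial_regression(X, y, query, k=50, degree=1):
    """Hypothetical sketch: estimate P(y = 1 | query) by regressing the labels
    of the k nearest points on their distance to the query and reading off
    the fitted value at distance zero."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Radial distance of every training point to the query.
    dists = np.linalg.norm(X - query, axis=1)
    # Keep only the k nearest neighbours (assumed neighbourhood rule).
    idx = np.argsort(dists)[:k]
    r, labels = dists[idx], y[idx]
    # Least-squares polynomial fit of label against distance only.
    coeffs = np.polyfit(r, labels, deg=degree)
    # Extrapolate the fitted curve to zero distance, i.e. the query itself.
    estimate = np.polyval(coeffs, 0.0)
    # Clip so the result is a valid probability (an added safeguard).
    return float(np.clip(estimate, 0.0, 1.0))
```

    A plain $k$-NN average would return the mean label over the neighbourhood; here the intercept of the fitted curve is returned instead, which is the value the fit assigns to the query point itself.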
    Seeing 3D Objects in a Single Image via Self-Supervised Static-Dynamic Disentanglement. (arXiv:2207.11232v1 [cs.CV])
    Human perception reliably identifies movable and immovable parts of 3D scenes, and completes the 3D structure of objects and background from incomplete observations. We learn this skill not via labeled examples, but simply by observing objects move. In this work, we propose an approach that observes unlabeled multi-view videos at training time and learns to map a single image observation of a complex scene, such as a street with cars, to a 3D neural scene representation that is disentangled into movable and immovable parts while plausibly completing its 3D structure. We separately parameterize movable and immovable scene parts via 2D neural ground plans. These ground plans are 2D grids of features aligned with the ground plane that can be locally decoded into 3D neural radiance fields. Our model is trained self-supervised via neural rendering. We demonstrate that the structure inherent to our disentangled 3D representation enables a variety of downstream tasks in street-scale 3D scenes using simple heuristics, such as extraction of object-centric 3D representations, novel view synthesis, instance segmentation, and 3D bounding box prediction, highlighting its value as a backbone for data-efficient 3D scene understanding models. This disentanglement further enables scene editing via object manipulation such as deletion, insertion, and rigid-body motion.  ( 3 min )
    E2N: Error Estimation Networks for Goal-Oriented Mesh Adaptation. (arXiv:2207.11233v1 [cs.LG])
    Given a partial differential equation (PDE), goal-oriented error estimation allows us to understand how errors in a diagnostic quantity of interest (QoI), or goal, occur and accumulate in a numerical approximation, for example using the finite element method. By decomposing the error estimates into contributions from individual elements, it is possible to formulate adaptation methods, which modify the mesh with the objective of minimising the resulting QoI error. However, the standard error estimate formulation involves the true adjoint solution, which is unknown in practice. As such, it is common practice to approximate it with an 'enriched' approximation (e.g. in a higher order space or on a refined mesh). Doing so generally results in a significant increase in computational cost, which can be a bottleneck compromising the competitiveness of (goal-oriented) adaptive simulations. The central idea of this paper is to develop a "data-driven" goal-oriented mesh adaptation approach through the selective replacement of the expensive error estimation step with an appropriately configured and trained neural network. In doing so, the error estimator may be obtained without even constructing the enriched spaces. An element-by-element construction is employed here, whereby local values of various parameters related to the mesh geometry and underlying problem physics are taken as inputs, and the corresponding contribution to the error estimator is taken as output. We demonstrate that this approach is able to obtain the same accuracy with a reduced computational cost, for adaptive mesh test cases related to flow around tidal turbines, which interact via their downstream wakes, and where the overall power output of the farm is taken as the QoI. Moreover, we demonstrate that the element-by-element approach implies reasonably low training costs.  ( 3 min )
    Learning Unsupervised Hierarchies of Audio Concepts. (arXiv:2207.11231v1 [cs.SD])
    Music signals are difficult to interpret from their low-level features, perhaps even more than images: e.g. highlighting part of a spectrogram or an image is often insufficient to convey high-level ideas that are genuinely relevant to humans. In computer vision, concept learning was therein proposed to adjust explanations to the right abstraction level (e.g. detect clinical concepts from radiographs). These methods have yet to be used for MIR. In this paper, we adapt concept learning to the realm of music, with its particularities. For instance, music concepts are typically non-independent and of mixed nature (e.g. genre, instruments, mood), unlike previous work that assumed disentangled concepts. We propose a method to learn numerous music concepts from audio and then automatically hierarchise them to expose their mutual relationships. We conduct experiments on datasets of playlists from a music streaming service, serving as a few annotated examples for diverse concepts. Evaluations show that the mined hierarchies are aligned with both ground-truth hierarchies of concepts -- when available -- and with proxy sources of concept similarity in the general case.  ( 2 min )
    Face editing with GAN -- A Review. (arXiv:2207.11227v1 [cs.CV])
    In recent years, Generative Adversarial Networks (GANs) have become a hot topic among researchers and engineers who work with deep learning. They are a ground-breaking technique that can generate new pieces of content in a consistent way. The topic of GANs has exploded in popularity due to its applicability in fields like image generation and synthesis, and music production and composition. GANs have two competing neural networks: a generator and a discriminator. The generator is used to produce new samples or pieces of content, while the discriminator is used to recognize whether the piece of content is real or generated. What makes GANs different from other generative models is their ability to learn from unlabeled samples. In this review paper, we will discuss the evolution of GANs, several improvements proposed by the authors and a brief comparison between the different models. Index Terms: generative adversarial networks, unsupervised learning, deep learning.  ( 2 min )
    OmniXAI: A Library for Explainable AI. (arXiv:2206.01612v5 [cs.LG] UPDATED)
    We introduce OmniXAI (short for Omni eXplainable AI), an open-source Python library of eXplainable AI (XAI), which offers omni-way explainable AI capabilities and various interpretable machine learning techniques to address the pain points of understanding and interpreting the decisions made by machine learning (ML) in practice. OmniXAI aims to be a one-stop comprehensive library that makes explainable AI easy for data scientists, ML researchers and practitioners who need explanation for various types of data, models and explanation methods at different stages of the ML process (data exploration, feature engineering, model development, evaluation, and decision-making, etc). In particular, our library includes a rich family of explanation methods integrated in a unified interface, which supports multiple data types (tabular data, images, texts, time-series), multiple types of ML models (traditional ML in Scikit-learn and deep learning models in PyTorch/TensorFlow), and a range of diverse explanation methods including "model-specific" and "model-agnostic" ones (such as feature-attribution explanation, counterfactual explanation, gradient-based explanation, etc). For practitioners, the library provides an easy-to-use unified interface to generate the explanations for their applications by only writing a few lines of code, and also a GUI dashboard for visualization of different explanations for more insights about decisions. In this technical report, we present OmniXAI's design principles, system architectures, and major functionalities, and also demonstrate several example use cases across different types of data, tasks, and models.
    NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework. (arXiv:2111.04130v2 [cs.CL] UPDATED)
    Pretrained language models have become the standard approach for many NLP tasks due to strong performance, but they are very expensive to train. We propose a simple and efficient learning framework, TLM, that does not rely on large-scale pretraining. Given some labeled task data and a large general corpus, TLM uses task data as queries to retrieve a tiny subset of the general corpus and jointly optimizes the task objective and the language modeling objective from scratch. On eight classification datasets in four domains, TLM achieves results better than or similar to pretrained language models (e.g., RoBERTa-Large) while reducing the training FLOPs by two orders of magnitude. With high accuracy and efficiency, we hope TLM will contribute to democratizing NLP and expediting its development.
    Self-attention Presents Low-dimensional Knowledge Graph Embeddings for Link Prediction. (arXiv:2112.10644v2 [cs.LG] UPDATED)
    A few models have tried to tackle the link prediction problem, also known as knowledge graph completion, by embedding knowledge graphs in comparably lower dimensions. However, the state-of-the-art results are attained at the cost of considerably increasing the dimensionality of embeddings, which causes scalability issues in the case of huge knowledge bases. Transformers have been successfully used recently as powerful encoders for knowledge graphs, but available models still have scalability issues. To address this limitation, we introduce a Transformer-based model to gain expressive low-dimensional embeddings. We utilize a large number of self-attention heads as the key to applying query-dependent projections to capture mutual information between entities and relations. Empirical results on WN18RR and FB15k-237 as standard link prediction benchmarks demonstrate that our model performs comparably to the current state-of-the-art models. Notably, we achieve these results with a significant reduction of 66.9% in the dimensionality of embeddings compared to the five best recent state-of-the-art competitors on average.
    FewGAN: Generating from the Joint Distribution of a Few Images. (arXiv:2207.11226v1 [cs.CV])
    We introduce FewGAN, a generative model for generating novel, high-quality and diverse images whose patch distribution lies in the joint patch distribution of a small number of N>1 training samples. The method is, in essence, a hierarchical patch-GAN that applies quantization at the first coarse scale, in a similar fashion to VQ-GAN, followed by a pyramid of residual fully convolutional GANs at finer scales. Our key idea is to first use quantization to learn a fixed set of patch embeddings for training images. We then use a separate set of side images to model the structure of generated images using an autoregressive model trained on the learned patch embeddings of training images. Using quantization at the coarsest scale allows the model to generate both conditional and unconditional novel images. Subsequently, a patch-GAN renders the fine details, resulting in high-quality images. In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
    Rapid protein assignments and structures from raw NMR spectra with the deep learning technique ARTINA. (arXiv:2201.12041v4 [q-bio.BM] UPDATED)
    Nuclear Magnetic Resonance (NMR) spectroscopy is one of the major techniques in structural biology with over 11,800 protein structures deposited in the Protein Data Bank. NMR can elucidate structures and dynamics of small and medium size proteins in solution, living cells, and solids, but has been limited by the tedious data analysis process. It typically requires weeks or months of manual work of a trained expert to turn NMR measurements into a protein structure. Automation of this process is an open problem, formulated in the field over 30 years ago. Here, we present a solution to this challenge that enables the completely automated analysis of protein NMR data within hours after completing the measurements. Using only NMR spectra and the protein sequence as input, our machine learning-based method, ARTINA, delivers signal positions, resonance assignments, and structures strictly without any human intervention. Tested on a 100-protein benchmark comprising 1329 multidimensional NMR spectra, ARTINA demonstrated its ability to solve structures with 1.44 {\AA} median RMSD to the PDB reference and to identify 91.36% correct NMR resonance assignments. ARTINA can be used by non-experts, reducing the effort for a protein assignment or structure determination by NMR essentially to the preparation of the sample and the spectra measurements.
    Do Artificial Intelligence Systems Understand?. (arXiv:2207.11089v1 [cs.AI])
    Are intelligent machines really intelligent? Is the underlying philosophical concept of intelligence satisfactory for describing how the present systems work? Is understanding a necessary and sufficient condition for intelligence? If a machine could understand, should we attribute subjectivity to it? This paper addresses the problem of deciding whether the so-called "intelligent machines" are capable of understanding, instead of merely processing signs. It deals with the relationship between syntax and semantics. The main thesis concerns the inevitability of semantics for any discussion about the possibility of building conscious machines, condensed into the following two tenets: "If a machine is capable of understanding (in the strong sense), then it must be capable of combining rules and intuitions"; "If semantics cannot be reduced to syntax, then a machine cannot understand." Our conclusion states that it is not necessary to attribute understanding to a machine in order to explain its exhibited "intelligent" behavior; a merely syntactic and mechanistic approach to intelligence as a task-solving tool suffices to justify the range of operations that it can display in the current state of technological development.
    Fairness-aware Network Revenue Management with Demand Learning. (arXiv:2207.11159v1 [stat.ML])
    In addition to maximizing the total revenue, decision-makers in lots of industries would like to guarantee fair consumption across different resources and avoid saturating certain resources. Motivated by these practical needs, this paper studies the price-based network revenue management problem with both demand learning and fairness concern about the consumption across different resources. We introduce the regularized revenue, i.e., the total revenue with a fairness regularization, as our objective to incorporate fairness into the revenue maximization goal. We propose a primal-dual-type online policy with the Upper-Confidence-Bound (UCB) demand learning method to maximize the regularized revenue. We adopt several innovative techniques to make our algorithm a unified and computationally efficient framework for the continuous price set and a wide class of fairness regularizers. Our algorithm achieves a worst-case regret of $\tilde O(N^{5/2}\sqrt{T})$, where $N$ denotes the number of products and $T$ denotes the number of time periods. Numerical experiments in a few NRM examples demonstrate the effectiveness of our algorithm for balancing revenue and fairness.
    Target-Driven Structured Transformer Planner for Vision-Language Navigation. (arXiv:2207.11201v1 [cs.CV])
    Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions. For the agent, inferring the long-term navigation target from visual-linguistic clues is crucial for reliable path planning, which, however, has rarely been studied before in literature. In this article, we propose a Target-Driven Structured Transformer Planner (TD-STP) for long-horizon goal-guided and room layout-aware navigation. Specifically, we devise an Imaginary Scene Tokenization mechanism for explicit estimation of the long-term target (even located in unexplored environments). In addition, we design a Structured Transformer Planner which elegantly incorporates the explored room layout into a neural attention architecture for structured and global planning. Experimental results demonstrate that our TD-STP substantially improves previous best methods' success rate by 2% and 5% on the test set of R2R and REVERIE benchmarks, respectively. Our code is available at https://github.com/YushengZhao/TD-STP .
    SPRT-based Efficient Best Arm Identification in Stochastic Bandits. (arXiv:2207.11158v1 [stat.ML])
    This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed confidence setting. The general class of the exponential family of bandits is considered. The state-of-the-art algorithms for the exponential family of bandits face computational challenges. To mitigate these challenges, a novel framework is proposed, which views the BAI problem as sequential hypothesis testing, and is amenable to tractable analysis for the exponential family of bandits. Based on this framework, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests. This algorithm has three features for both settings: (1) its sample complexity is asymptotically optimal, (2) it is guaranteed to be $\delta$-PAC, and (3) it addresses the computational challenge of the state-of-the-art approaches. Specifically, these approaches, which are focused only on the Gaussian setting, require Thompson sampling from the arm that is deemed the best and a challenger arm. This paper analytically shows that identifying the challenger is computationally expensive and that the proposed algorithm circumvents it. Finally, numerical experiments are provided to support the analysis.
    Learning Dialogue Representations from Consecutive Utterances. (arXiv:2205.13568v2 [cs.CL] UPDATED)
    Learning high-quality dialogue representations is essential for solving a variety of dialogue-oriented tasks, especially considering that dialogue systems often suffer from data scarcity. In this paper, we introduce Dialogue Sentence Embedding (DSE), a self-supervised contrastive learning method that learns effective dialogue representations suitable for a wide range of dialogue tasks. DSE learns from dialogues by taking consecutive utterances of the same dialogue as positive pairs for contrastive learning. Despite its simplicity, DSE achieves significantly better representation capability than other dialogue representation and universal sentence representation models. We evaluate DSE on five downstream dialogue tasks that examine dialogue representation at different semantic granularities. Experiments in few-shot and zero-shot settings show that DSE outperforms baselines by a large margin. For example, it achieves 13% average performance improvement over the strongest unsupervised baseline in 1-shot intent classification on 6 datasets. We also provide analyses on the benefits and limitations of our model.
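    How consecutive utterances can serve as positive pairs is easy to show in a few lines. The sketch below is illustrative only: the helper names `consecutive_pairs` and `in_batch_contrastive_loss`, the temperature value, and the use of an in-batch softmax (InfoNCE-style) loss are assumptions, not details confirmed by the abstract. A real system would obtain the embeddings from a trained sentence encoder.

```python
import numpy as np

def consecutive_pairs(dialogues):
    """Turn each dialogue (a list of utterances) into (u_i, u_{i+1}) pairs."""
    pairs = []
    for turns in dialogues:
        # Zip each utterance with its successor; one-turn dialogues give nothing.
        pairs.extend(zip(turns[:-1], turns[1:]))
    return pairs

def in_batch_contrastive_loss(anchors, positives, temperature=0.05):
    """Assumed InfoNCE-style loss: row i of `positives` is the positive for
    row i of `anchors`; every other row in the batch acts as a negative."""
    # L2-normalise so the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature
    # Subtract the row maximum for numerical stability of the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Average negative log-probability assigned to the true positives.
    return float(-np.mean(np.diag(log_probs)))
```

    The loss is low when each anchor is closest to its own next utterance and high when the pairing is scrambled, which is the signal a contrastive objective of this kind trains on.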
    Improved Relation Networks for End-to-End Speaker Verification and Identification. (arXiv:2203.17218v2 [eess.AS] UPDATED)
    Speaker identification systems in a real-world scenario are tasked to identify a speaker amongst a set of enrolled speakers given just a few samples for each enrolled speaker. This paper demonstrates the effectiveness of meta-learning and relation networks for this use case. We propose improved relation networks for speaker verification and few-shot (unseen) speaker identification. The use of relation networks facilitates joint training of the frontend speaker encoder and the backend model. Inspired by the use of prototypical networks in speaker verification and to increase the discriminability of the speaker embeddings, we train the model to classify samples in the current episode amongst all speakers present in the training set. Furthermore, we propose a new training regime for faster model convergence by extracting more information from a given meta-learning episode with negligible extra computation. We evaluate the proposed techniques on VoxCeleb, SITW and VCTK datasets on the tasks of speaker verification and unseen speaker identification. The proposed approach outperforms the existing approaches consistently on both tasks.
    Controlled Generation of Unseen Faults for Partial and Open-Partial Domain Adaptation. (arXiv:2204.14068v2 [cs.LG] UPDATED)
    New operating conditions can result in a significant performance drop of fault diagnostics models due to the domain shift between the training and the testing data distributions. While several domain adaptation approaches have been proposed to overcome such domain shifts, their application is limited if the fault classes represented in the two domains are not the same. To enable a better transferability of the trained models between two different domains, particularly in setups where only the healthy data class is shared between the two domains, we propose a new framework for Partial and Open-Partial domain adaptation based on generating distinct fault signatures with a Wasserstein GAN. The main contribution of the proposed framework is controlled synthetic fault data generation with two distinct characteristics. Firstly, the proposed methodology makes it possible to generate unobserved fault types in the target domain with access only to the healthy samples in the target domain and faulty samples in the source domain. Secondly, the fault generation can be controlled to precisely generate distinct fault types and fault severity levels. The proposed method is especially suited to extreme domain adaptation settings that are particularly relevant in the context of complex and safety-critical systems, where only one class is shared between the two domains. We evaluate the proposed framework on Partial as well as Open-Partial domain adaptation tasks on two bearing fault diagnostics case studies. Our experiments conducted in different label space settings showcase the versatility of the proposed framework. The proposed methodology provided superior results compared to other methods given large domain gaps.
    Tight bounds on the hardness of learning simple nonparametric mixtures. (arXiv:2203.15150v2 [cs.LG] UPDATED)
    We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$ f=\sum_{i=1}^k w_i f_i, \quad\sum_{i=1}^k w_i=1, \quad w_i>0 $$ and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_i)\cap \text{supp}(\nu_j)=\emptyset$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ samples are required for estimating each $f_i$. Unlike parametric mixtures, the difficulty does not arise from the order $k$ or small weights $w_i$, and unlike nonparametric density estimation it does not arise from the curse of dimensionality, irregularity, or inhomogeneity. The proof relies on a fast rate for approximation with Gaussians, which may be of independent interest. To show this is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ samples to estimate each $f_i$. Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem properly lies in between polynomial and exponential, which is not common in learning theory.
    Efficient Automated Deep Learning for Time Series Forecasting. (arXiv:2205.05511v3 [cs.LG] UPDATED)
    Recent years have witnessed tremendously improved efficiency of Automated Machine Learning (AutoML), especially Automated Deep Learning (AutoDL) systems, but recent work focuses on tabular, image, or NLP tasks. So far, little attention has been paid to general AutoDL frameworks for time series forecasting, despite the enormous success in applying different novel architectures to such tasks. In this paper, we propose an efficient approach for the joint optimization of neural architecture and hyperparameters of the entire data processing pipeline for time series forecasting. In contrast to common NAS search spaces, we designed a novel neural architecture search space covering various state-of-the-art architectures, allowing for an efficient macro-search over different DL approaches. To efficiently search in such a large configuration space, we use Bayesian optimization with multi-fidelity optimization. We empirically study several different budget types enabling efficient multi-fidelity optimization on different forecasting datasets. Furthermore, we compared our resulting system, dubbed \system, against several established baselines and show that it significantly outperforms all of them across several datasets.
    STOPS: Short-Term-based Volatility-controlled Policy Search and its Global Convergence. (arXiv:2201.09857v5 [cs.LG] UPDATED)
    It remains challenging to deploy existing risk-averse approaches to real-world applications. The reasons are multi-fold, including the lack of global optimality guarantee and the necessity of learning from long-term consecutive trajectories. Long-term consecutive trajectories are prone to involving visiting hazardous states, which is a major concern in the risk-averse setting. This paper proposes Short-Term VOlatility-controlled Policy Search (STOPS), a novel algorithm that solves risk-averse problems by learning from short-term trajectories instead of long-term trajectories. Short-term trajectories are more flexible to generate, and can avoid the danger of hazardous state visitations. By using an actor-critic scheme with an overparameterized two-layer neural network, our algorithm finds a globally optimal policy at a sublinear rate with proximal policy optimization and natural policy gradient, with effectiveness comparable to the state-of-the-art convergence rate of risk-neutral policy-search methods. The algorithm is evaluated on challenging Mujoco robot simulation tasks under the mean-variance evaluation metric. Both theoretical analysis and experimental results demonstrate a state-of-the-art level of STOPS' performance among existing risk-averse policy search methods.
    Post-training Quantization for Neural Networks with Provable Guarantees. (arXiv:2201.11113v2 [cs.LG] UPDATED)
    While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized (e.g., 4-bit, or binary) counterparts, massive savings in computation cost, memory, and power consumption are attained. To that end, we generalize a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism. Among other things, we propose modifications to promote sparsity of the weights, and rigorously analyze the associated error. Additionally, our error analysis expands the results of previous work on GPFQ to handle general quantization alphabets, showing that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights -- i.e., level of over-parametrization. Our result holds across a range of input distributions and for both fully-connected and convolutional architectures thereby also extending previous results. To empirically evaluate the method, we quantize several common architectures with few bits per weight, and test them on ImageNet, showing only minor loss of accuracy compared to unquantized models. We also demonstrate that standard modifications, such as bias correction and mixed precision quantization, further improve accuracy.
    Differential Geometry for Neural Implicit Models. (arXiv:2201.09263v3 [cs.GR] UPDATED)
    We introduce a neural implicit framework that exploits the differentiable properties of neural networks and the discrete geometry of point-sampled surfaces to approximate them as the level sets of neural implicit functions. To train a neural implicit function, we propose a loss functional that approximates a signed distance function, and allows terms with high-order derivatives, such as the alignment between the principal directions of curvature, to learn more geometric details. During training, we consider a non-uniform sampling strategy based on the curvatures of the point-sampled surface to prioritize points with more geometric details. This sampling implies faster learning while preserving geometric accuracy when compared with previous approaches. We also present the analytical differential geometry formulas for neural surfaces, such as normal vectors and curvatures.
    Improved $\alpha$-GAN architecture for generating 3D connected volumes with an application to radiosurgery treatment planning. (arXiv:2207.11223v1 [eess.IV])
    Generative Adversarial Networks (GANs) have gained significant attention in several computer vision tasks for generating high-quality synthetic data. Various medical applications including diagnostic imaging and radiation therapy can benefit greatly from synthetic data generation due to data scarcity in the domain. However, medical image data is typically kept in 3D space, and generative models suffer from the curse of dimensionality issues in generating such synthetic data. In this paper, we investigate the potential of GANs for generating connected 3D volumes. We propose an improved version of 3D $\alpha$-GAN by incorporating various architectural enhancements. On a synthetic dataset of connected 3D spheres and ellipsoids, our model can generate fully connected 3D shapes with similar geometrical characteristics to that of training data. We also show that our 3D GAN model can successfully generate high-quality 3D tumor volumes and associated treatment specifications (e.g., isocenter locations). Similar moment invariants to the training data as well as fully connected 3D shapes confirm that improved 3D $\alpha$-GAN implicitly learns the training data distribution, and generates realistic-looking samples. The capability of improved 3D $\alpha$-GAN makes it a valuable source for generating synthetic medical image data that can help future research in this domain.  ( 3 min )
    Formulating Event-based Image Reconstruction as a Linear Inverse Problem using Optical Flow. (arXiv:2112.06242v2 [cs.CV] UPDATED)
    Event cameras are novel bio-inspired sensors that measure per-pixel brightness differences asynchronously. Recovering brightness from events is appealing since the reconstructed images inherit the high dynamic range (HDR) and high-speed properties of events; hence they can be used in many robotic vision applications and to generate slow-motion HDR videos. However, state-of-the-art methods tackle this problem by training an event-to-image recurrent neural network (RNN), which lacks explainability and is difficult to tune. In this work we show, for the first time, how tackling the joint problem of motion and brightness estimation leads us to formulate event-based image reconstruction as a linear inverse problem that can be solved without training an image reconstruction RNN. Instead, classical and learning-based image priors can be used to solve the problem and remove artifacts from the reconstructed images. The experiments show that the proposed approach generates images with visual quality on par with state-of-the-art methods despite only using data from a short time interval. The proposed linear formulation and solvers have a unifying character because they can be applied also to reconstruct brightness from the second derivative. Additionally, the linear formulation is attractive because it can be naturally combined with super-resolution, motion-segmentation and color demosaicing.  ( 3 min )
    The Shape Part Slot Machine: Contact-based Reasoning for Generating 3D Shapes from Parts. (arXiv:2112.00584v2 [cs.GR] UPDATED)
    We present the Shape Part Slot Machine, a new method for assembling novel 3D shapes from existing parts by performing contact-based reasoning. Our method represents each shape as a graph of "slots," where each slot is a region of contact between two shape parts. Based on this representation, we design a graph-neural-network-based model for generating new slot graphs and retrieving compatible parts, as well as a gradient-descent-based optimization scheme for assembling the retrieved parts into a complete shape that respects the generated slot graph. This approach does not require any semantic part labels; interestingly, it also does not require complete part geometries -- reasoning about the slots proves sufficient to generate novel, high-quality 3D shapes. We demonstrate that our method generates shapes that outperform existing modeling-by-assembly approaches regarding quality, diversity, and structural complexity.  ( 2 min )
    Flat Latent Manifolds for Human-machine Co-creation of Music. (arXiv:2202.12243v2 [cs.SD] UPDATED)
    The use of machine learning in artistic music generation leads to controversial discussions of the quality of art, for which objective quantification is nonsensical. We therefore consider a music-generating algorithm as a counterpart to a human musician, in a setting where reciprocal interplay is to lead to new experiences, both for the musician and the audience. To obtain this behaviour, we resort to the framework of recurrent Variational Auto-Encoders (VAE) and learn to generate music, seeded by a human musician. In the learned model, we generate novel musical sequences by interpolation in latent space. Standard VAEs, however, do not guarantee any form of smoothness in their latent representation. This translates into abrupt changes in the generated music sequences. To overcome these limitations, we regularise the decoder and endow the latent space with a flat Riemannian manifold, i.e., a manifold that is isometric to the Euclidean space. As a result, linearly interpolating in the latent space yields realistic and smooth musical changes that fit the type of machine-musician interactions we aim for. We provide empirical evidence for our method via a set of experiments on music datasets and we deploy our model for an interactive jam session with a professional drummer. The live performance provides qualitative evidence that the latent representation can be intuitively interpreted and exploited by the drummer to drive the interplay. Beyond the musical application, our approach showcases an instance of human-centred design of machine-learning models, driven by interpretability and the interaction with the end user.  ( 3 min )
    Prompt Tuning GPT-2 language model for parameter-efficient domain adaptation of ASR systems. (arXiv:2112.08718v3 [cs.CL] UPDATED)
    Automatic Speech Recognition (ASR) systems have found their use in numerous industrial applications in very diverse domains, creating a need to adapt to new domains with small memory and deployment overhead. In this work, we introduce domain-prompts, a methodology that involves training a small number of domain embedding parameters to prime a Transformer-based Language Model (LM) to a particular domain. Using this domain-adapted LM for rescoring ASR hypotheses can achieve 7-13% WER reduction for a new domain with just 1000 unlabeled textual domain-specific sentences. This improvement is comparable to or even better than that of fully fine-tuned models, even though just 0.02% of the parameters of the base LM are updated. Additionally, our method is deployment-friendly as the learnt domain embeddings are prefixed to the input to the model rather than changing the base model architecture. Therefore, our method is an ideal choice for on-the-fly adaptation of LMs used in ASR systems to progressively scale them to new domains.  ( 2 min )
    Stronger Generalization Guarantees for Robot Learning by Combining Generative Models and Real-World Data. (arXiv:2111.08761v2 [cs.RO] UPDATED)
    We are motivated by the problem of learning policies for robotic systems with rich sensory inputs (e.g., vision) in a manner that allows us to guarantee generalization to environments unseen during training. We provide a framework for providing such generalization guarantees by leveraging a finite dataset of real-world environments in combination with a (potentially inaccurate) generative model of environments. The key idea behind our approach is to utilize the generative model in order to implicitly specify a prior over policies. This prior is updated using the real-world dataset of environments by minimizing an upper bound on the expected cost across novel environments derived via Probably Approximately Correct (PAC)-Bayes generalization theory. We demonstrate our approach on two simulated systems with nonlinear/hybrid dynamics and rich sensing modalities: (i) quadrotor navigation with an onboard vision sensor, and (ii) grasping objects using a depth sensor. Comparisons with prior work demonstrate the ability of our approach to obtain stronger generalization guarantees by utilizing generative models. We also present hardware experiments for validating our bounds for the grasping task.  ( 3 min )
    X-Risk Analysis for AI Research. (arXiv:2206.05862v6 [cs.CY] UPDATED)
    Artificial intelligence (AI) has the potential to greatly improve society, but as with any powerful technology, it comes with heightened risks and responsibilities. Current AI research lacks a systematic discussion of how to manage long-tail risks from AI systems, including speculative long-term risks. Keeping in mind the potential benefits of AI, there is some concern that building ever more intelligent and powerful AI systems could eventually result in systems that are more powerful than us; some say this is like playing with fire and speculate that this could create existential risks (x-risks). To add precision and ground these discussions, we provide a guide for how to analyze AI x-risk, which consists of three parts: First, we review how systems can be made safer today, drawing on time-tested concepts from hazard analysis and systems safety that have been designed to steer large processes in safer directions. Next, we discuss strategies for having long-term impacts on the safety of future systems. Finally, we discuss a crucial concept in making AI systems safer by improving the balance between safety and general capabilities. We hope this document and the presented concepts and tools serve as a useful guide for understanding how to analyze AI x-risk.  ( 3 min )
    Human Treelike Tubular Structure Segmentation: A Comprehensive Review and Future Perspectives. (arXiv:2207.11203v1 [eess.IV])
    Various structures in human physiology follow a treelike morphology, which often expresses complexity at very fine scales. Examples of such structures are intrathoracic airways, retinal blood vessels, and hepatic blood vessels. Large collections of 2D and 3D images, in which the spatial arrangement can be observed, have been made available by medical imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), optical coherence tomography (OCT) and ultrasound. Segmentation of these structures in medical imaging is of great importance since the analysis of the structure provides insights into disease diagnosis, treatment planning, and prognosis. Manual labelling of extensive data by radiologists is often time-consuming and error-prone. As a result, automated or semi-automated computational models have become a popular research field of medical imaging in the past two decades, and many have been developed to date. In this survey, we aim to provide a comprehensive review of currently publicly available datasets, segmentation algorithms, and evaluation metrics. In addition, current challenges and future research directions are discussed.  ( 2 min )
    Machine learning approach in the development of building occupant personas. (arXiv:2207.11239v1 [cs.LG])
    The user persona is a communication tool for designers to generate a mental model that describes the archetype of users. Developing building occupant personas is proven to be an effective method for human-centered smart building design, which considers occupant comfort, behavior, and energy consumption. Optimization of building energy consumption also requires a deep understanding of occupants' preferences and behaviors. The current approaches to developing building occupant personas face a major obstacle in manual data processing and analysis. In this study, we propose and evaluate a machine learning-based semi-automated approach to generate building occupant personas. We investigate the 2015 Residential Energy Consumption Dataset with five machine learning techniques - Linear Discriminant Analysis, K Nearest Neighbors, Decision Tree (Random Forest), Support Vector Machine, and AdaBoost classifier - for the prediction of 16 occupant characteristics, such as age, education, and thermal comfort. The models achieve an average accuracy of 61% and accuracy over 90% for attributes including the number of occupants in the household, their age group, and preferred usage of heating or cooling equipment. The results of the study show the feasibility of using machine learning techniques for the development of building occupant personas to minimize human effort.  ( 2 min )
    NASA: Neural Articulated Shape Approximation. (arXiv:1912.03207v5 [cs.CV] UPDATED)
    Efficient representation of articulated objects such as human bodies is an important problem in computer vision and graphics. To efficiently simulate deformation, existing approaches represent 3D objects using polygonal meshes and deform them using skinning techniques. This paper introduces neural articulated shape approximation (NASA), an alternative framework that enables efficient representation of articulated deformable objects using neural indicator functions that are conditioned on pose. Occupancy testing using NASA is straightforward, circumventing the complexity of meshes and the issue of water-tightness. We demonstrate the effectiveness of NASA for 3D tracking applications, and discuss other potential extensions.  ( 2 min )
    Concept Identification for Complex Engineering Datasets. (arXiv:2206.06155v2 [cs.LG] UPDATED)
    Finding meaningful concepts in engineering application datasets which allow for a sensible grouping of designs is very helpful in many contexts. It allows for determining different groups of designs with similar properties and provides useful knowledge in the engineering decision making process. Also, it opens the route for further refinements of specific design candidates which exhibit certain characteristic features. In this work, an approach to define meaningful and consistent concepts in an existing engineering dataset is presented. The designs in the dataset are characterized by a multitude of features such as design parameters, geometrical properties or performance values of the design for various boundary conditions. In the proposed approach the complete feature set is partitioned into several subsets called description spaces. The definition of the concepts respects this partitioning which leads to several desired properties of the identified concepts. This cannot be achieved with state-of-the-art clustering or concept identification approaches. A novel concept quality measure is proposed, which provides an objective value for a given definition of concepts in a dataset. The usefulness of the measure is demonstrated by considering a realistic engineering dataset consisting of about 2500 airfoil profiles, for which the performance values (lift and drag) for three different operating conditions were obtained by a computational fluid dynamics simulation. A numerical optimization procedure is employed, which maximizes the concept quality measure and finds meaningful concepts for different setups of the description spaces, while also incorporating user preference. It is demonstrated how these concepts can be used to select archetypal representatives of the dataset which exhibit characteristic features of each concept.  ( 3 min )
    Deep Portrait Delighting. (arXiv:2203.12088v5 [cs.CV] UPDATED)
    We present a deep neural network for removing undesirable shading features from an unconstrained portrait image, recovering the underlying texture. Our training scheme incorporates three regularization strategies: masked loss, to emphasize high-frequency shading features; soft-shadow loss, which improves sensitivity to subtle changes in lighting; and shading-offset estimation, to supervise separation of shading and texture. Our method demonstrates improved delighting quality and generalization when compared with the state-of-the-art. We further demonstrate how our delighting method can enhance the performance of light-sensitive computer vision tasks such as face relighting and semantic parsing, allowing them to handle extreme lighting conditions.  ( 2 min )
    Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via Root-Entropic Regularization. (arXiv:2007.06552v3 [stat.ML] UPDATED)
    We consider prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set. This semi-adversarial setting includes (at the extremes) the classical i.i.d. setting, when the unknown constraint set is restricted to be a singleton, and the unconstrained adversarial setting, when the constraint set is the set of all distributions. The Hedge algorithm -- long known to be minimax (rate) optimal in the adversarial regime -- was recently shown to be simultaneously minimax optimal for i.i.d. data. In this work, we propose to relax the i.i.d. assumption by seeking adaptivity at all levels of a natural ordering on constraint sets. We provide matching upper and lower bounds on the minimax regret at all levels, show that Hedge with deterministic learning rates is suboptimal outside of the extremes, and prove that one can adaptively obtain minimax regret at all levels. We achieve this optimal adaptivity using the follow-the-regularized-leader (FTRL) framework, with a novel adaptive regularization scheme that implicitly scales as the square root of the entropy of the current predictive distribution, rather than the entropy of the initial predictive distribution. Finally, we provide novel technical tools to study the statistical performance of FTRL along the semi-adversarial spectrum.  ( 3 min )
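    For readers unfamiliar with the Hedge algorithm mentioned above, the following is a minimal sketch of its standard exponential-weights form. It assumes a fixed learning rate `eta` and losses in [0, 1]; the function name `hedge` and the toy loss sequence are illustrative assumptions, and this is not the paper's adaptive FTRL scheme.

```python
import math
from typing import List, Tuple


def hedge(loss_rounds: List[List[float]], eta: float) -> Tuple[float, List[float]]:
    """Run Hedge with a constant learning rate (an assumption of this sketch).

    loss_rounds[t][i] is the loss of expert i in round t, assumed to lie in [0, 1].
    Returns the learner's cumulative expected loss and the final distribution.
    """
    n = len(loss_rounds[0])
    cumulative = [0.0] * n  # cumulative loss of each expert so far
    learner_loss = 0.0

    def distribution() -> List[float]:
        # Weights are proportional to exp(-eta * cumulative loss).
        # Subtracting the minimum keeps the exponentials numerically stable.
        best = min(cumulative)
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        total = sum(weights)
        return [w / total for w in weights]

    for losses in loss_rounds:
        p = distribution()
        learner_loss += sum(pi * li for pi, li in zip(p, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]

    return learner_loss, distribution()


# Toy data: expert 0 never errs, expert 1 always does.
rounds = [[0.0, 1.0] for _ in range(50)]
total_loss, final_p = hedge(rounds, eta=0.5)
```

    With a constant rate, the classical regret bound for losses in [0, 1] is ln(N)/eta + eta*T/8, which the toy run above satisfies.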
    Optimal Model Averaging of Support Vector Machines in Diverging Model Spaces. (arXiv:2112.12961v3 [stat.ML] UPDATED)
    Support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVM with high dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging in the past decades. Yet no frequentist model averaging method was considered for SVM. This work aims to fill the gap and to propose a frequentist model averaging procedure for SVM which selects the optimal weight by cross validation. Even when the number of covariates diverges at an exponential rate of the sample size, we show asymptotic optimality of the proposed method in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate which provides more insights to model averaging. Compared to model selection methods of SVM which require a tedious but critical task of tuning parameter selection, the model averaging method avoids the task and shows promising performances in the empirical studies.  ( 3 min )
    Learning for MPC with Stability & Safety Guarantees. (arXiv:2012.07369v2 [cs.LG] UPDATED)
    The combination of learning methods with Model Predictive Control (MPC) has attracted a significant amount of attention in the recent literature. The hope of this combination is to reduce the reliance of MPC schemes on accurate models, and to tap into the fast developing machine learning and reinforcement learning tools to exploit the growing amount of data available for many systems. In particular, the combination of reinforcement learning and MPC has been proposed as a viable and theoretically justified approach to introduce explainable, safe and stable policies in reinforcement learning. However, a formal theory detailing how the safety and stability of an MPC-based policy can be maintained through the parameter updates delivered by the learning tools is still lacking. This paper addresses this gap. The theory is developed for the generic Robust MPC case, and applied in simulation in the robust tube-based linear MPC case, where the theory is fairly easy to deploy in practice. The paper focuses on Reinforcement Learning as a learning tool, but it applies to any learning method that updates the MPC parameters online.  ( 2 min )
    Composing Neural Learning and Symbolic Reasoning with an Application to Visual Discrimination. (arXiv:1907.05878v2 [cs.LG] UPDATED)
    We consider the problem of combining machine learning models to perform higher-level cognitive tasks with clear specifications. We propose the novel problem of Visual Discrimination Puzzles (VDP) that requires finding interpretable discriminators that classify images according to a logical specification. Humans can solve these puzzles with ease and they give robust, verifiable, and interpretable discriminators as answers. We propose a compositional neurosymbolic framework that combines a neural network to detect objects and relationships with a symbolic learner that finds interpretable discriminators. We create large classes of VDP datasets involving natural and artificial images and show that our neurosymbolic framework performs favorably compared to several purely neural approaches.  ( 2 min )
    Domain Generalization for Activity Recognition via Adaptive Feature Fusion. (arXiv:2207.11221v1 [cs.CV])
    Human activity recognition requires building a generalizable model from training datasets in the hope of achieving good performance on test datasets. However, in real applications, the training and testing datasets may have totally different distributions for various reasons, such as different body shapes, acting styles, and habits, which damages the model's generalization performance. While such a distribution gap can be reduced by existing domain adaptation approaches, they typically assume that the test data can be accessed in the training stage, which is not realistic. In this paper, we consider a more practical and challenging scenario: domain-generalized activity recognition (DGAR), where the test dataset cannot be accessed during training. To this end, we propose Adaptive Feature Fusion for Activity Recognition (AFFAR), a domain generalization approach that learns to fuse the domain-invariant and domain-specific representations to improve the model's generalization performance. AFFAR takes the best of both worlds: domain-invariant representations enhance the transferability across domains, and domain-specific representations leverage the model discrimination power from each domain. Extensive experiments on three public HAR datasets show its effectiveness. Furthermore, we apply AFFAR to a real application, i.e., the diagnosis of Children's Attention Deficit Hyperactivity Disorder (ADHD), which also demonstrates the superiority of our approach.  ( 3 min )
    Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding. (arXiv:2112.11449v2 [stat.ME] UPDATED)
    We consider the problem of constructing bounds on the average treatment effect (ATE) when unmeasured confounders exist but have bounded influence. Specifically, we assume that omitted confounders could not change the odds of treatment for any unit by more than a fixed factor. We derive the sharp partial identification bounds implied by this assumption by leveraging distributionally robust optimization, and we propose estimators of these bounds with several novel robustness properties. The first is double sharpness: our estimators consistently estimate the sharp ATE bounds when one of two nuisance parameters is misspecified and achieve semiparametric efficiency when all nuisance parameters are suitably consistent. The second is double validity: even when most nuisance parameters are misspecified, our estimators still provide valid but possibly conservative bounds for the ATE and our Wald confidence intervals remain valid even when our estimators are not asymptotically normal. As a result, our estimators provide a highly credible method for sensitivity analysis of causal inferences.  ( 2 min )
    Statistical and Computational Trade-offs in Variational Inference: A Case Study in Inferential Model Selection. (arXiv:2207.11208v1 [stat.ML])
    Variational inference has recently emerged as a popular alternative to the classical Markov chain Monte Carlo (MCMC) in large-scale Bayesian inference. The core idea of variational inference is to trade statistical accuracy for computational efficiency. It aims to approximate the posterior, reducing computation costs but potentially compromising its statistical accuracy. In this work, we study this statistical and computational trade-off in variational inference via a case study in inferential model selection. Focusing on Gaussian inferential models (a.k.a. variational approximating families) with diagonal plus low-rank precision matrices, we initiate a theoretical study of the trade-offs in two aspects, Bayesian posterior inference error and frequentist uncertainty quantification error. From the Bayesian posterior inference perspective, we characterize the error of the variational posterior relative to the exact posterior. We prove that, given a fixed computation budget, a lower-rank inferential model produces variational posteriors with a higher statistical approximation error, but a lower computational error; it reduces variances in stochastic optimization and, in turn, accelerates convergence. From the frequentist uncertainty quantification perspective, we consider the precision matrix of the variational posterior as an uncertainty estimate. We find that, relative to the true asymptotic precision, the variational approximation suffers from an additional statistical error originating from the sampling uncertainty of the data. Moreover, this statistical error becomes the dominant factor as the computation budget increases. As a consequence, for small datasets, the inferential model need not be full-rank to achieve optimal estimation error. Finally, we demonstrate these statistical and computational trade-offs across empirical studies, corroborating the theoretical findings.
    TaDaa: real time Ticket Assignment Deep learning Auto Advisor for customer support, help desk, and issue ticketing systems. (arXiv:2207.11187v1 [cs.IR])
    This paper proposes TaDaa: Ticket Assignment Deep learning Auto Advisor, which leverages the latest Transformer models and machine learning techniques to quickly assign issues within an organization, such as in customer support, help desk, and similar issue ticketing systems. The project provides functionality to 1) assign an issue to the correct group, 2) assign an issue to the best resolver, and 3) provide the most relevant previously solved tickets to resolvers. We leverage one ticketing system sample dataset, with over 3k groups and over 10k resolvers, to obtain a 95.2% top 3 accuracy on group suggestions and a 79.0% top 5 accuracy on resolver suggestions. We hope this research will greatly improve average issue resolution time on customer support, help desk, and issue ticketing systems.
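    As a minimal sketch of how a top-k accuracy figure such as those quoted above is typically computed, the following counts a ticket as correct when its true label appears among the first k ranked suggestions. The function name, group names, and toy data are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(ranked_predictions, true_labels, k):
    """Fraction of examples whose true label is among the first k suggestions."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("predictions and labels must have the same length")
    if not true_labels:
        raise ValueError("need at least one example")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Hypothetical group suggestions for four tickets, best guess first.
suggestions = [
    ["network", "billing", "hardware"],
    ["billing", "accounts", "network"],
    ["hardware", "network", "billing"],
    ["accounts", "billing", "hardware"],
]
actual_groups = ["billing", "billing", "software", "hardware"]

top1 = top_k_accuracy(suggestions, actual_groups, k=1)
top3 = top_k_accuracy(suggestions, actual_groups, k=3)
print(top1, top3)
```

    Top-k accuracy can only stay the same or increase as k grows, which is why it suits an advisor that shows a short list of suggestions rather than a single answer.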
    Improved lightweight identification of agricultural diseases based on MobileNetV3. (arXiv:2207.11238v1 [cs.CV])
    At present, models for identifying agricultural pests and diseases are not lightweight enough and are difficult to deploy. Based on MobileNetV3, this paper introduces the Coordinate Attention block. The parameters of MobileNetV3-large are reduced by 22%, the model size is reduced by 19.7%, and the accuracy is improved by 0.92%. The parameters of MobileNetV3-small are reduced by 23.4%, the model size is reduced by 18.3%, and the accuracy is increased by 0.40%. In addition, the improved MobileNetV3-small was migrated to Jetson Nano for testing. The accuracy increased by 2.48% to 98.31%, and the inference speed increased by 7.5%. It provides a reference for deploying the agricultural pest identification model to embedded devices.
    Discrete Key-Value Bottleneck. (arXiv:2207.11240v1 [cs.LG])
    Deep neural networks perform well on prediction and classification tasks in the canonical setting where data streams are i.i.d., labeled data is abundant, and class labels are balanced. Challenges emerge with distribution shifts, including non-stationary or imbalanced data streams. One powerful approach that has addressed this challenge involves self-supervised pretraining of large encoders on volumes of unlabeled data, followed by task-specific tuning. Given a new task, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable (key, value) codes. In this setup, we follow a three-step paradigm (encode, process the representation via a discrete bottleneck, and decode), where the input is fed to the pretrained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and re-use a limited number of these (key, value) pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the proposed model to minimize the effect of the distribution shifts and show that such a discrete bottleneck with (key, value) pairs reduces the complexity of the hypothesis class. We empirically verify the proposed method's benefits under challenging distribution shift scenarios across various benchmark datasets and show that the proposed model reduces the common vulnerability to non-i.i.d. and non-stationary training distributions compared to various other baselines.
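    The nearest-key lookup described above can be illustrated with a small sketch. It assumes a single codebook, squared Euclidean distance, and one selected pair per input; the actual architecture may differ on all three points, and every name and number here is hypothetical.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_key_index(encoding, keys):
    """Index of the key closest to the encoder output."""
    return min(range(len(keys)), key=lambda i: squared_distance(encoding, keys[i]))


def bottleneck(encoding, keys, values):
    """Select the nearest key and return its index and paired value."""
    idx = nearest_key_index(encoding, keys)
    return idx, values[idx]


# Hypothetical codebook of three (key, value) pairs.
keys = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
values = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5]]

# Stand-in for the output of a frozen, pretrained encoder.
encoding = [0.9, 1.2]
idx, fetched = bottleneck(encoding, keys, values)

# Localized update: only the value paired with the selected key is changed.
learning_rate = 0.1
hypothetical_gradient = [0.5, -0.5]
values[idx] = [v - learning_rate * g for v, g in zip(fetched, hypothetical_gradient)]
print(idx, fetched, values)
```

    Because the update touches only the fetched value, the remaining pairs are left as they were, which is the intuition behind the localized, context-dependent updates mentioned in the abstract.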
    Progressive Deblurring of Diffusion Models for Coarse-to-Fine Image Synthesis. (arXiv:2207.11192v1 [cs.CV])
    Recently, diffusion models have shown remarkable results in image synthesis by gradually removing noise and amplifying signals. Although the simple generative process surprisingly works well, is this the best way to generate image data? For instance, despite the fact that human perception is more sensitive to the low frequencies of an image, diffusion models themselves do not consider any relative importance of each frequency component. Therefore, to incorporate the inductive bias for image data, we propose a novel generative process that synthesizes images in a coarse-to-fine manner. First, we generalize the standard diffusion models by enabling diffusion in a rotated coordinate system with different velocities for each component of the vector. We further propose a blur diffusion as a special case, where each frequency component of an image is diffused at different speeds. Specifically, the proposed blur diffusion consists of a forward process that blurs an image and adds noise gradually, after which a corresponding reverse process deblurs an image and removes noise progressively. Experiments show that the proposed model outperforms the previous method in FID on LSUN bedroom and church datasets. Code is available at https://github.com/sangyun884/blur-diffusion.
    Quantized Sparse Weight Decomposition for Neural Network Compression. (arXiv:2207.11048v1 [cs.LG])
    In this paper, we introduce a novel method of neural network weight compression. In our method, we store weight tensors as sparse, quantized matrix factors, whose product is computed on the fly during inference to generate the target model's weights. We use projected gradient descent methods to find quantized and sparse factorization of the weight tensors. We show that this approach can be seen as a unification of weight SVD, vector quantization, and sparse PCA. Combined with end-to-end fine-tuning our method exceeds or is on par with previous state-of-the-art methods in terms of the trade-off between accuracy and model size. Our method is applicable to both moderate compression regimes, unlike vector quantization, and extreme compression regimes.
    Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP. (arXiv:2207.11126v1 [cs.LG])
    We present regret minimization algorithms for stochastic contextual MDPs under a minimum reachability assumption, using access to an offline least-squares regression oracle. We analyze three different settings: where the dynamics is known, where the dynamics is unknown but independent of the context, and the most challenging setting where the dynamics is unknown and context-dependent. For the latter, our algorithm obtains a regret bound of $ \tilde{O}\left( \max\{H,{1}/{p_{min}}\}H|S|^{3/2}\sqrt{|A|T\log(\max\{|\mathcal{F}|,|\mathcal{P}|\}/\delta)} \right)$ with probability $1-\delta$, where $\mathcal{P}$ and $\mathcal{F}$ are finite and realizable function classes used to approximate the dynamics and rewards respectively, $p_{min}$ is the minimum reachability parameter, $S$ is the set of states, $A$ the set of actions, $H$ the horizon, and $T$ the number of episodes. To our knowledge, our approach is the first optimistic approach applied to contextual MDPs with general function approximation (i.e., without additional knowledge regarding the function class, such as it being linear). In addition, we present a lower bound of $\Omega(\sqrt{T H |S| |A| \ln(|\mathcal{F}|/|S|)/\ln(|A|)})$ on the expected regret, which holds even in the case of known dynamics.
    MobileDenseNet: A new approach to object detection on mobile devices. (arXiv:2207.11031v1 [cs.CV])
    Object detection has developed greatly within the past few years. There is a need for lighter models in instances where hardware limitations exist, as well as a demand for models to be tailored to mobile devices. In this article, we assess the methods used when creating algorithms that address these issues. The main goal of this article is to increase accuracy in state-of-the-art algorithms while maintaining speed and real-time efficiency. The most significant issues in one-stage object detection pertain to small objects and inaccurate localization. As a solution, we created a new network by the name of MobileDenseNet suitable for embedded systems. We also developed a light neck, FCPNLite, for mobile devices that aids the detection of small objects. Our research revealed that very few papers cited necks in embedded systems. What differentiates our network from others is our use of concatenation features. A small yet significant change to the head of the network amplified accuracy without increasing speed or limiting parameters. In short, our results on the challenging COCO and Pascal VOC datasets were 24.8% and 76.8%, respectively - a rate higher than that recorded by other state-of-the-art systems thus far. Our network is able to increase accuracy while maintaining real-time efficiency on mobile devices. We measured an operational speed of 22.8 fps on a Pixel 3 (Snapdragon 845). The source code of this research is available at https://github.com/hajizadeh/MobileDenseNet.
    Decentralized scheduling through an adaptive, trading-based multi-agent system. (arXiv:2207.11172v1 [cs.AI])
    In multi-agent reinforcement learning systems, the actions of one agent can have a negative impact on the rewards of other agents. One way to combat this problem is to let agents trade their rewards amongst each other. Motivated by this, this work applies a trading approach to a simulated scheduling environment, where the agents are responsible for the assignment of incoming jobs to compute cores. In this environment, reinforcement learning agents learn to trade successfully. The agents can trade the usage right of computational cores to process high-priority, high-reward jobs faster than low-priority, low-reward jobs. However, due to combinatorial effects, the action and observation spaces of a simple reinforcement learning agent in this environment scale exponentially with key parameters of the problem size. This exponential scaling behavior can be transformed into a linear one if the agent is split into several independent sub-units. We further improve this distributed architecture using agent-internal parameter sharing. Moreover, it can be extended to set the exchange prices autonomously. We show that in our scheduling environment, the advantages of a distributed agent architecture clearly outweigh those of more aggregated approaches. We demonstrate that the distributed agent architecture becomes even more performant using agent-internal parameter sharing. Finally, we investigate how two different reward functions affect autonomous pricing and the corresponding scheduling.
    Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement Learning For Optimal Execution. (arXiv:2207.11152v1 [q-fin.TR])
    Optimal execution is a sequential decision-making problem for cost-saving in algorithmic trading. Studies have found that reinforcement learning (RL) can help decide the order-splitting sizes. However, a problem remains unsolved: how to place limit orders at appropriate limit prices? The key challenge lies in the "continuous-discrete duality" of the action space. On the one hand, the continuous action space using percentage changes in prices is preferred for generalization. On the other hand, the trader eventually needs to choose limit prices discretely due to the existence of the tick size, which requires specialization for every single stock with different characteristics (e.g., the liquidity and the price range). So we need continuous control for generalization and discrete control for specialization. To this end, we propose a hybrid RL method to combine the advantages of both of them. We first use a continuous control agent to scope an action subset, then deploy a fine-grained agent to choose a specific limit price. Extensive experiments show that our method has higher sample efficiency and better training stability than existing RL algorithms and significantly outperforms previous learning-based methods for order execution.
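    The two-stage idea described above can be sketched as follows, assuming that the continuous action is a percentage change relative to a reference price and that the discrete stage picks one price from a small window of neighbouring ticks. Both agents are replaced by fixed stand-in numbers here, and all function names, parameters, and values are hypothetical rather than taken from the paper.

```python
def candidate_prices(reference_price, pct_change, tick_size, half_width=2):
    """Stage 1: map a continuous percentage change to a window of tick-aligned prices."""
    target = reference_price * (1.0 + pct_change)
    center_tick = round(target / tick_size)  # nearest tick index to the target price
    return [
        round((center_tick + offset) * tick_size, 10)
        for offset in range(-half_width, half_width + 1)
    ]


def choose_price(candidates, scores):
    """Stage 2: pick the candidate with the highest score (stand-in for a discrete agent)."""
    if len(candidates) != len(scores):
        raise ValueError("need exactly one score per candidate price")
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]


# Hypothetical stock: last price 100.00, tick size 0.05.
# The continuous stage proposes a limit price 0.1% below the reference.
candidates = candidate_prices(reference_price=100.0, pct_change=-0.001, tick_size=0.05)

# Hypothetical scores that a fine-grained agent might assign to each candidate.
scores = [0.1, 0.3, 0.2, 0.9, 0.4]
limit_price = choose_price(candidates, scores)
print(candidates, limit_price)
```

    Expressing the first stage as a percentage keeps it independent of any one stock's price level, while the tick-aligned window is where stock-specific detail enters.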
    Verifying Fairness in Quantum Machine Learning. (arXiv:2207.11173v1 [quant-ph])
    Due to the beyond-classical capability of quantum computing, quantum machine learning is applied independently or embedded in classical models for decision making, especially in the field of finance. Fairness and other ethical issues are often among the main concerns in decision making. In this work, we define a formal framework for the fairness verification and analysis of quantum machine learning decision models, where we adopt one of the most popular notions of fairness in the literature based on the intuition -- any two similar individuals must be treated similarly and are thus unbiased. We show that quantum noise can improve fairness and develop an algorithm to check whether a (noisy) quantum machine learning model is fair. In particular, this algorithm can find bias kernels of quantum data (encoding individuals) during checking. These bias kernels generate infinitely many bias pairs for investigating the unfairness of the model. Our algorithm is designed based on a highly efficient data structure -- Tensor Networks -- and implemented on Google's TensorFlow Quantum. The utility and effectiveness of our algorithm are confirmed by the experimental results, including income prediction and credit scoring on real-world data, for a class of random (noisy) quantum decision models with 27 qubits ($2^{27}$-dimensional state space), tripling the number of qubits ($2^{18}$ times the state-space dimension) handled by the state-of-the-art algorithms for verifying quantum machine learning models.
    Latent Space Unsupervised Semantic Segmentation. (arXiv:2207.11067v1 [cs.LG])
    The development of compact and energy-efficient wearable sensors has led to an increase in the availability of biosignals. To analyze these continuously recorded, and often multidimensional, time series at scale, being able to conduct meaningful unsupervised data segmentation is an auspicious target. A common way to achieve this is to identify change-points within the time series as the segmentation basis. However, traditional change-point detection algorithms often come with drawbacks, limiting their real-world applicability. Notably, they generally rely on the complete time series being available and thus cannot be used for real-time applications. Another common limitation is that they handle the segmentation of multidimensional time series poorly, or not at all. Consequently, the main contribution of this work is to propose a novel unsupervised segmentation algorithm for multidimensional time series named Latent Space Unsupervised Semantic Segmentation (LS-USS), which was designed to work easily with both online and batch data. When comparing LS-USS against other state-of-the-art change-point detection algorithms on a variety of real-world datasets, in both the offline and real-time settings, LS-USS systematically achieves on-par or better performance.
    Low cost prediction of probability distributions of molecular properties for early virtual screening. (arXiv:2207.11174v1 [q-bio.BM])
    While there is a general focus on predicting values, it is mathematically more appropriate to predict probability distributions, which offers additional possibilities such as prediction of uncertainty, higher moments, and quantiles. For the purpose of computer-aided drug design, this article applies the Hierarchical Correlation Reconstruction approach, previously applied in the analysis of demographic, financial, and astronomical data. Instead of a single linear regression to predict values, it uses multiple linear regressions to independently predict multiple moments, finally combining them into a predicted probability distribution, here of several ADMET properties based on the substructural fingerprint developed by Klekota & Roth. The discussed application example is the inexpensive selection of a percentage of molecules with properties nearly certain to be in a predicted or chosen range during virtual screening. Such an approach can facilitate the interpretation of the results, as predictions characterized by a high degree of uncertainty are automatically detected. In addition, for each of the investigated predictive problems, we detected crucial structural features, which should be carefully considered when optimizing compounds towards a particular property. The whole methodology developed in the study therefore constitutes great support for medicinal chemists, as it enables fast rejection of compounds with the lowest potential for the desired physicochemical/ADMET characteristics and guides the compound optimization process.
    Generalized Identifiability Bounds for Mixture Models with Grouped Samples. (arXiv:2207.11164v1 [math.ST])
    Recent work has shown that finite mixture models with $m$ components are identifiable, while making no assumptions on the mixture components, so long as one has access to groups of samples of size $2m-1$ which are known to come from the same mixture component. In this work we generalize that result and show that, if every subset of $k$ mixture components of a mixture model are linearly independent, then that mixture model is identifiable with only $(2m-1)/(k-1)$ samples per group. We further show that this value cannot be improved. We prove an analogous result for a stronger form of identifiability known as "determinedness" along with a corresponding lower bound. This independence assumption almost surely holds if mixture components are chosen randomly from a $k$-dimensional space. We describe some implications of our results for multinomial mixture models and topic modeling.
    Learning to identify cracks on wind turbine blade surfaces using drone-based inspection images. (arXiv:2207.11186v1 [cs.CV])
    Wind energy is expected to be one of the leading ways to achieve the goals of the Paris Agreement, but it in turn heavily depends on effective management of its operations and maintenance (O&M) costs. Blade failures account for one-third of all O&M costs, thus making accurate detection of blade damage, especially cracks, very important for sustained operations and cost savings. Traditionally, damage inspection has been a completely manual process, thus making it subjective, error-prone, and time-consuming. Hence, in this work, we bring more objectivity, scalability, and repeatability to our damage inspection process, using deep learning, to miss fewer cracks. We build a deep learning model trained on a large dataset of blade damages, collected by our drone-based inspection, to correctly detect cracks. Our model is already in production and has processed more than a million damages with a recall of 0.96. We also focus on model interpretability using class activation maps to get a peek into the model's workings. The model not only performs as well as human experts but also does better in certain tricky cases. Thus, in this work, we aim to increase wind energy adoption by decreasing one of its major hurdles - the O&M costs resulting from missing blade failures like cracks.
    Custom Structure Preservation in Face Aging. (arXiv:2207.11025v1 [cs.CV])
    In this work, we propose a novel architecture for face age editing that can produce structural modifications while maintaining relevant details present in the original image. We disentangle the style and content of the input image and propose a new decoder network that adopts a style-based strategy to combine the style and content representations of the input image while conditioning the output on the target age. We go beyond existing aging methods by allowing users to adjust the degree of structure preservation in the input image during inference. For this purpose, we introduce a masking mechanism, the CUstom Structure Preservation (CUSP) module, that distinguishes relevant regions in the input image from those that should be discarded. CUSP requires no additional supervision. Finally, our quantitative and qualitative analyses, which include a user study, show that our method outperforms prior art and demonstrate the effectiveness of our strategy regarding image editing and adjustable structure preservation. Code and pretrained models are available at https://github.com/guillermogotre/CUSP.
    On Controller Tuning with Time-Varying Bayesian Optimization. (arXiv:2207.11120v1 [cs.LG])
    Changing conditions or environments can cause system dynamics to vary over time. To ensure optimal control performance, controllers should adapt to these changes. When the underlying cause and time of change is unknown, we need to rely on online data for this adaptation. In this paper, we will use time-varying Bayesian optimization (TVBO) to tune controllers online in changing environments using appropriate prior knowledge on the control objective and its changes. Two properties are characteristic of many online controller tuning problems: First, they exhibit incremental and lasting changes in the objective due to changes to the system dynamics, e.g., through wear and tear. Second, the optimization problem is convex in the tuning parameters. Current TVBO methods do not explicitly account for these properties, resulting in poor tuning performance and many unstable controllers through over-exploration of the parameter space. We propose a novel TVBO forgetting strategy using Uncertainty-Injection (UI), which incorporates the assumption of incremental and lasting changes. The control objective is modeled as a spatio-temporal Gaussian process (GP) with UI through a Wiener process in the temporal domain. Further, we explicitly model the convexity assumptions in the spatial dimension through GP models with linear inequality constraints. In numerical experiments, we show that our model outperforms the state-of-the-art method in TVBO, exhibiting reduced regret and fewer unstable parameter configurations.
    DeVIS: Making Deformable Transformers Work for Video Instance Segmentation. (arXiv:2207.11103v1 [cs.CV])
    Video Instance Segmentation (VIS) jointly tackles multi-object detection, tracking, and segmentation in video sequences. In the past, VIS methods mirrored the fragmentation of these subtasks in their architectural design, hence missing out on a joint solution. Transformers have recently made it possible to cast the entire VIS task as a single set-prediction problem. Nevertheless, the quadratic complexity of existing Transformer-based methods requires long training times, high memory requirements, and processing of low-single-scale feature maps. Deformable attention provides a more efficient alternative, but its application to the temporal domain or the segmentation task has not yet been explored. In this work, we present Deformable VIS (DeVIS), a VIS method which capitalizes on the efficiency and performance of deformable Transformers. To reason about all VIS subtasks jointly over multiple frames, we present temporal multi-scale deformable attention with instance-aware object queries. We further introduce a new image and video instance mask head with multi-scale features, and perform near-online video processing with multi-cue clip tracking. DeVIS reduces memory as well as training time requirements, and achieves state-of-the-art results on the YouTube-VIS 2021 dataset, as well as on the challenging OVIS dataset. Code is available at https://github.com/acaelles97/DeVIS.
    Near Real-Time Distributed State Estimation via AI/ML-Empowered 5G Networks. (arXiv:2207.11117v1 [cs.LG])
    Fifth-Generation (5G) networks have the potential to accelerate the power system's transition to a flexible, softwarized, data-driven, and intelligent grid. With their evolving support for Machine Learning (ML)/Artificial Intelligence (AI) functions, 5G networks are expected to enable novel data-centric Smart Grid (SG) services. In this paper, we explore how data-driven SG services could be integrated with ML/AI-enabled 5G networks in a symbiotic relationship. We focus on the State Estimation (SE) function as a key element of the energy management system and address two main questions. Firstly, in a tutorial fashion, we present an overview of how distributed SE can be integrated with the elements of the 5G core network and radio access network architecture. Secondly, we present and compare two powerful distributed SE methods based on: i) graphical models and belief propagation, and ii) graph neural networks. We discuss their performance and capability to support near real-time distributed SE via 5G networks, taking into account communication delays.
    Lagrangian Method for Q-Function Learning (with Applications to Machine Translation). (arXiv:2207.11161v1 [cs.LG])
    This paper discusses a new approach to the fundamental problem of learning optimal Q-functions. In this approach, optimal Q-functions are formulated as saddle points of a nonlinear Lagrangian function derived from the classic Bellman optimality equation. The paper shows that the Lagrangian enjoys strong duality, in spite of its nonlinearity, which paves the way to a general Lagrangian method to Q-function learning. As a demonstration, the paper develops an imitation learning algorithm based on the duality theory, and applies the algorithm to a state-of-the-art machine translation benchmark. The paper then turns to demonstrate a symmetry breaking phenomenon regarding the optimality of the Lagrangian saddle points, which justifies a largely overlooked direction in developing the Lagrangian method.
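    For reference, the classic Bellman optimality equation mentioned above is commonly written as follows for a discounted MDP. The notation (reward $r$, transition kernel $P$, discount factor $\gamma$) is assumed here for illustration and is not taken from the paper, whose Lagrangian is not reproduced.

```latex
Q^*(s,a) \;=\; r(s,a) \;+\; \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s,a)} \Big[ \max_{a'} Q^*(s',a') \Big]
```

    The maximum over $a'$ is what makes this fixed-point condition nonlinear in $Q^*$.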
    A Transferable Intersection Reconstruction Network for Traffic Speed Prediction. (arXiv:2207.11030v1 [cs.LG])
    Traffic speed prediction is the key to many valuable applications, and it is also a challenging task because of its various influencing factors. Recent work attempts to obtain more information through various hybrid models, thereby improving the prediction accuracy. However, the spatial information acquisition schemes of these methods fall into two extremes: either the modeling is simple but contains little spatial information, or the modeling is complete but lacks flexibility. To introduce more spatial information while preserving flexibility, this paper proposes IRNet (Transferable Intersection Reconstruction Network). First, this paper reconstructs the intersection into a virtual intersection with the same structure, which simplifies the topology of the road network. Second, the spatial information is subdivided into intersection information and sequence information of traffic flow direction, and spatiotemporal features are obtained through various models. Third, a self-attention mechanism is used to fuse spatiotemporal features for prediction. In the comparison experiment with the baseline, IRNet shows clear advantages in both prediction accuracy and transfer performance.
    Classification via score-based generative modelling. (arXiv:2207.11091v1 [cs.LG])
    In this work, we investigated the application of score-based gradient learning in discriminative and generative classification settings. The score function can be used to characterize a data distribution as an alternative to the density. It can be efficiently learned via score matching, and used to flexibly generate credible samples to enhance discriminative classification quality, to recover the density, and to build generative classifiers. We analysed the decision theories involving score-based representations, and performed experiments on simulated and real-world datasets, demonstrating its effectiveness in achieving and improving binary classification performance, and robustness to perturbations, particularly in high dimensions and imbalanced situations.
    Domain Generalization by Mutual-Information Regularization with Pre-trained Models. (arXiv:2203.10789v2 [cs.LG] UPDATED)
    Domain generalization (DG) aims to learn a model that generalizes to an unseen target domain using only limited source domains. Previous attempts at DG fail to learn domain-invariant representations only from the source domains due to the significant domain shifts between training and test domains. Instead, we re-formulate the DG objective using mutual information with the oracle model, a model generalized to any possible domain. We derive a tractable variational lower bound via approximating the oracle model by a pre-trained model, called Mutual Information Regularization with Oracle (MIRO). Our extensive experiments show that MIRO significantly improves the out-of-distribution performance. Furthermore, our scaling experiments show that the larger the scale of the pre-trained model, the greater the performance improvement of MIRO. Source code is available at https://github.com/kakaobrain/miro.
    Towards Global Optimality in Cooperative MARL with Sequential Transformation. (arXiv:2207.11143v1 [cs.MA])
    Policy learning in multi-agent reinforcement learning (MARL) is challenging due to the exponential growth of the joint state-action space with respect to the number of agents. To achieve higher scalability, the paradigm of centralized training with decentralized execution (CTDE) is broadly adopted with factorized structure in MARL. However, we observe that existing CTDE algorithms in cooperative MARL cannot achieve optimality even in simple matrix games. To understand this phenomenon, we introduce a framework of Generalized Multi-Agent Actor-Critic with Policy Factorization (GPF-MAC), which characterizes the learning of factorized joint policies, i.e., each agent's policy only depends on its own observation-action history. We show that most popular CTDE MARL algorithms are special instances of GPF-MAC and may be stuck in a suboptimal joint policy. To address this issue, we present a novel transformation framework that reformulates a multi-agent MDP as a special "single-agent" MDP with a sequential structure and allows employing off-the-shelf single-agent reinforcement learning (SARL) algorithms to efficiently learn corresponding multi-agent tasks. This transformation carries the optimality guarantee of SARL algorithms over to cooperative MARL. To instantiate this transformation framework, we propose a Transformed PPO, called T-PPO, which can theoretically perform optimal policy learning in finite multi-agent MDPs and delivers significantly stronger performance on a large set of cooperative multi-agent tasks.
    Deep learning of diffeomorphisms for optimal reparametrizations of shapes. (arXiv:2207.11141v1 [math.OC])
    In shape analysis, one of the fundamental problems is to align curves or surfaces before computing a (geodesic) distance between these shapes. To find the optimal reparametrization realizing this alignment is a computationally demanding task which leads to an optimization problem on the diffeomorphism group. In this paper, we construct approximations of orientation-preserving diffeomorphisms by composition of elementary diffeomorphisms to solve the approximation problem. We propose a practical algorithm implemented in PyTorch which is applicable both to unparametrized curves and surfaces. We derive universal approximation results and obtain bounds for the Lipschitz constant of the obtained compositions of diffeomorphisms.
    Decoupled Adversarial Contrastive Learning for Self-supervised Adversarial Robustness. (arXiv:2207.10899v1 [cs.CV])
    Adversarial training (AT) for robust representation learning and self-supervised learning (SSL) for unsupervised representation learning are two active research fields. Integrating AT into SSL, multiple prior works have accomplished a highly significant yet challenging task: learning robust representation without labels. A widely used framework is adversarial contrastive learning, which couples AT and SSL and thus constitutes a very complex optimization problem. Inspired by the divide-and-conquer philosophy, we conjecture that it might be simplified as well as improved by solving two sub-problems: non-robust SSL and pseudo-supervised AT. This motivation shifts the focus of the task from seeking an optimal integrating strategy for a coupled problem to finding sub-solutions for sub-problems. Accordingly, this work discards prior practices of directly introducing AT to SSL frameworks and proposes a two-stage framework termed Decoupled Adversarial Contrastive Learning (DeACL). Extensive experimental results demonstrate that our DeACL achieves SOTA self-supervised adversarial robustness while significantly reducing the training time, which validates its effectiveness and efficiency. Moreover, our DeACL constitutes a more explainable solution, and its success also bridges the gap with semi-supervised AT for exploiting unlabeled samples for robust representation learning. The code is publicly accessible at https://github.com/pantheon5100/DeACL.
    Automatic Termination for Hyperparameter Optimization. (arXiv:2104.08166v4 [cs.LG] UPDATED)
    Bayesian optimization (BO) is a widely popular approach for the hyperparameter optimization (HPO) in machine learning. At its core, BO iteratively evaluates promising configurations until a user-defined budget, such as wall-clock time or number of iterations, is exhausted. While the final performance after tuning heavily depends on the provided budget, it is hard to pre-specify an optimal value in advance. In this work, we propose an effective and intuitive termination criterion for BO that automatically stops the procedure if it is sufficiently close to the global optimum. Our key insight is that the discrepancy between the true objective (predictive performance on test data) and the computable target (validation performance) suggests stopping once the suboptimality in optimizing the target is dominated by the statistical estimation error. Across an extensive range of real-world HPO problems and baselines, we show that our termination criterion achieves a better trade-off between the test performance and optimization time. Additionally, we find that overfitting may occur in the context of HPO, which is arguably an overlooked problem in the literature, and show how our termination criterion helps to mitigate this phenomenon on both small and large datasets.
    Analyzing and Mitigating Interference in Neural Architecture Search. (arXiv:2108.12821v3 [cs.CL] UPDATED)
    Weight sharing is a popular approach to reduce the cost of neural architecture search (NAS) by reusing the weights of shared operators from previously trained child models. However, the rank correlation between the estimated accuracy and ground truth accuracy of those child models is low due to the interference among different child models caused by weight sharing. In this paper, we investigate the interference issue by sampling different child models and calculating the gradient similarity of shared operators, and observe: 1) the interference on a shared operator between two child models is positively correlated with the number of different operators; 2) the interference is smaller when the inputs and outputs of the shared operator are more similar. Inspired by these two observations, we propose two approaches to mitigate the interference: 1) MAGIC-T: rather than randomly sampling child models for optimization, we propose a gradual modification scheme by modifying one operator between adjacent optimization steps to minimize the interference on the shared operators; 2) MAGIC-A: forcing the inputs and outputs of the operator across all child models to be similar to reduce the interference. Experiments on a BERT search space verify that mitigating interference via each of our proposed methods improves the rank correlation of the super-net, and that combining both methods achieves better results. Our discovered architecture outperforms RoBERTa$_{\rm base}$ by 1.1 and 0.6 points and ELECTRA$_{\rm base}$ by 1.6 and 1.1 points on the dev and test sets of the GLUE benchmark. Extensive results on BERT compression, reading comprehension and ImageNet tasks demonstrate the effectiveness and generality of our proposed methods.
    Explaining Dynamic Graph Neural Networks via Relevance Back-propagation. (arXiv:2207.11175v1 [cs.LG])
    Graph Neural Networks (GNNs) have shown remarkable effectiveness in capturing abundant information in graph-structured data. However, the black-box nature of GNNs hinders users from understanding and trusting the models, thus leading to difficulties in their applications. While recent years witness the prosperity of the studies on explaining GNNs, most of them focus on static graphs, leaving the explanation of dynamic GNNs nearly unexplored. It is challenging to explain dynamic GNNs, due to their unique characteristic of time-varying graph structures. Directly using existing models designed for static graphs on dynamic graphs is not feasible because they ignore temporal dependencies among the snapshots. In this work, we propose DGExplainer to provide reliable explanation on dynamic GNNs. DGExplainer redistributes the output activation score of a dynamic GNN to the relevances of the neurons of its previous layer, which iterates until the relevance scores of the input neuron are obtained. We conduct quantitative and qualitative experiments on real-world datasets to demonstrate the effectiveness of the proposed framework for identifying important nodes for link prediction and node regression for dynamic GNNs.
    High dimensional stochastic linear contextual bandit with missing covariates. (arXiv:2207.11165v1 [stat.ML])
    Recent works in bandit problems adopted lasso convergence theory in the sequential decision-making setting. Even with fully observed contexts, there are technical challenges that hinder the application of existing lasso convergence theory: 1) proving the restricted eigenvalue condition under conditionally sub-Gaussian noise and 2) accounting for the dependence between the context variables and the chosen actions. This paper studies the effect of missing covariates on regret for stochastic linear bandit algorithms. Our work provides a high-probability upper bound on the regret incurred by the proposed algorithm in terms of covariate sampling probabilities, showing that the regret degrades due to missingness by at most $\zeta_{min}^2$, where $\zeta_{min}$ is the minimum probability of observing covariates in the context vector. We illustrate our algorithm for the practical application of experimental design for collecting gene expression data by a sequential selection of class discriminating DNA probes.
    Target Identification and Bayesian Model Averaging with Probabilistic Hierarchical Factor Probabilities. (arXiv:2207.11212v1 [cs.CV])
    Target detection in hyperspectral imagery is the process of locating pixels in an image that are likely to contain the target, typically done by comparing one or more spectra for the desired target material to each pixel in the image. Target identification is the process of target detection incorporating an additional step to identify more specifically the material that is present in each pixel that scored high in detection. Detection is generally a 2-class problem of target vs. background, and identification is a many-class problem including target, background, and additional known materials. The identification process we present is probabilistic and hierarchical, which provides transparency to the process and produces trustworthy output. In this paper we show that target identification has a much lower false alarm rate than detection alone, and provide a detailed explanation of a robust identification method using probabilistic hierarchical classification that handles the vague, user-dependent categories of materials, which differ from the specific physical categories of chemical constituents. Identification is often done by comparing mixtures of materials including the target spectra to mixtures of materials that do not include the target spectra, possibly with other steps (band combinations, feature checking, background removal, etc.). Standard linear regression does not handle these problems well because the number of regressors (identification spectra) is greater than the number of feature variables (bands), and there are multiple correlated spectra. Our proposed method handles these challenges efficiently and provides additional important practical information in the form of hierarchical probabilities computed from Bayesian model averaging.
    Panoptic Scene Graph Generation. (arXiv:2207.11247v1 [cs.CV])
    Existing research addresses scene graph generation (SGG) -- a critical technology for scene understanding in images -- from a detection perspective, i.e., objects are detected using bounding boxes followed by prediction of their pairwise relationships. We argue that such a paradigm causes several problems that impede the progress of the field. For instance, bounding box-based labels in current datasets usually contain redundant classes like hairs, and leave out background information that is crucial to the understanding of context. In this work, we introduce panoptic scene graph generation (PSG), a new problem task that requires the model to generate a more comprehensive scene graph representation based on panoptic segmentations rather than rigid bounding boxes. A high-quality PSG dataset, which contains 49k well-annotated overlapping images from COCO and Visual Genome, is created for the community to keep track of its progress. For benchmarking, we build four two-stage baselines, which are modified from classic methods in SGG, and two one-stage baselines called PSGTR and PSGFormer, which are based on the efficient Transformer-based detector, i.e., DETR. While PSGTR uses a set of queries to directly learn triplets, PSGFormer separately models the objects and relations in the form of queries from two Transformer decoders, followed by a prompting-like relation-object matching mechanism. In the end, we share insights on open challenges and future directions.
    BigIssue: A Realistic Bug Localization Benchmark. (arXiv:2207.10739v1 [cs.LG])
    As machine learning tools progress, the inevitable question arises: How can machine learning help us write better code? With significant progress being achieved in natural language processing with models like GPT-3 and Bert, the applications of natural language processing techniques to code are starting to be explored. Most of the research has been focused on automatic program repair (APR), and while the results on synthetic or highly filtered datasets are promising, such models are hard to apply in real-world scenarios because of inadequate bug localization. We propose BigIssue: a benchmark for realistic bug localization. The goal of the benchmark is two-fold. We provide (1) a general benchmark with a diversity of real and synthetic Java bugs and (2) a motivation to improve bug localization capabilities of models through attention to the full repository context. With the introduction of BigIssue, we hope to advance the state of the art in bug localization, in turn improving APR performance and increasing its applicability to the modern development cycle.
    Assessing mortality prediction through different representation models based on concepts extracted from clinical notes. (arXiv:2207.10872v1 [cs.CL])
    Recent years have seen particular interest in using electronic medical records (EMRs) for secondary purposes to enhance the quality and safety of healthcare delivery. EMRs tend to contain large amounts of valuable clinical notes. Learning of embedding is a method for converting notes into a format that makes them comparable. Transformer-based representation models have recently made a great leap forward. These models are pre-trained on large online datasets to understand natural language texts effectively. The quality of a learning embedding is influenced by how clinical notes are used as input to representation models. A clinical note has several sections with different levels of information value. It is also common for healthcare providers to use different expressions for the same concept. Existing methods use clinical notes directly or with an initial preprocessing as input to representation models. However, to learn a good embedding, we identified the most essential clinical notes section. We then mapped the extracted concepts from selected sections to the standard names in the Unified Medical Language System (UMLS). We used the standard phrases corresponding to the unique concepts as input for clinical models. We performed experiments to measure the usefulness of the learned embedding vectors in the task of hospital mortality prediction on a subset of the publicly available Medical Information Mart for Intensive Care (MIMIC-III) dataset. According to the experiments, clinical transformer-based representation models produced better results with getting input generated by standard names of extracted unique concepts compared to other input formats. The best-performing models were BioBERT, PubMedBERT, and UmlsBERT, respectively.  ( 3 min )
    Revisiting Parameter Reuse to Overcome Catastrophic Forgetting in Neural Networks. (arXiv:2207.11005v1 [cs.LG])
    Neural networks tend to forget previously learned knowledge when continuously learning on datasets with varying distributions, a phenomenon known as catastrophic forgetting. More significant distribution shifts among datasets lead to more forgetting. Recently, parameter-isolation-based approaches have shown great potential in overcoming forgetting with significant distribution shifts. However, they suffer from poor generalization as they fix the neural path for each dataset during training and require dataset labels during inference. In addition, they do not support backward knowledge transfer as they prioritize past data over future ones. In this paper, we propose a new adaptive learning method, named AdaptCL, that fully reuses and grows on learned parameters to overcome catastrophic forgetting and allows the positive backward transfer without requiring dataset labels. Our proposed technique adaptively grows on the same neural path by allowing optimal reuse of frozen parameters. Besides, it uses parameter-level data-driven pruning to assign equal priority to the data. We conduct extensive experiments on MNIST Variants, DomainNet, and Food Freshness Detection datasets under different intensities of distribution shifts without requiring dataset labels. Results demonstrate that our proposed method is superior to alternative baselines in minimizing forgetting and enabling positive backward knowledge transfer.  ( 2 min )
    PhishSim: Aiding Phishing Website Detection with a Feature-Free Tool. (arXiv:2207.10801v1 [cs.CR])
    In this paper, we propose a feature-free method for detecting phishing websites using the Normalized Compression Distance (NCD), a parameter-free similarity measure which computes the similarity of two websites by compressing them, thus eliminating the need to perform any feature extraction. It also removes any dependence on a specific set of website features. This method examines the HTML of webpages and computes their similarity with known phishing websites, in order to classify them. We use the Furthest Point First algorithm to perform phishing prototype extractions, in order to select instances that are representative of a cluster of phishing webpages. We also introduce the use of an incremental learning algorithm as a framework for continuous and adaptive detection without extracting new features when concept drift occurs. On a large dataset, our proposed method significantly outperforms previous methods in detecting phishing websites, with an AUC score of 98.68%, a high true positive rate (TPR) of around 90%, while maintaining a low false positive rate (FPR) of 0.58%. Our approach uses prototypes, eliminating the need to retain long term data in the future, and is feasible to deploy in real systems with a processing time of roughly 0.3 seconds.  ( 3 min )
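    The Normalized Compression Distance mentioned above has a standard closed form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length. The following is a minimal sketch of that formula only, assuming zlib as the compressor; the abstract does not say which compressor PhishSim uses, and the function names here are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

    In a detector along these lines, the HTML of a candidate page would be compared against stored phishing prototypes and the smallest distance used as the score; the thresholding and prototype selection are not shown here.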
    End-to-End and Self-Supervised Learning for ComParE 2022 Stuttering Sub-Challenge. (arXiv:2207.10817v1 [cs.SD])
    In this paper, we present end-to-end and speech embedding based systems trained in a self-supervised fashion to participate in the ACM Multimedia 2022 ComParE Challenge, specifically the stuttering sub-challenge. In particular, we exploit the embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark several methods for SD. Our proposed self-supervised SD system achieves a UAR of 36.9% and 41.0% on validation and test sets respectively, which is 31.32% (validation set) and 1.49% (test set) higher than the best (DeepSpectrum) challenge baseline (CBL). Moreover, we show that concatenating layer embeddings with Mel-frequency cepstral coefficient (MFCC) features further improves the UAR, by 33.81% and 5.45% on validation and test sets respectively over the CBL. Finally, we demonstrate that summing information across all the layers of Wav2Vec2.0 surpasses the CBL by a relative margin of 45.91% and 5.69% on validation and test sets respectively. Grand-challenge: Computational Paralinguistics ChallengE  ( 2 min )
    FairGRAPE: Fairness-aware GRAdient Pruning mEthod for Face Attribute Classification. (arXiv:2207.10888v1 [cs.CV])
    Existing pruning techniques preserve deep neural networks' overall ability to make correct predictions but may also amplify hidden biases during the compression process. We propose a novel pruning method, Fairness-aware GRAdient Pruning mEthod (FairGRAPE), that minimizes the disproportionate impacts of pruning on different sub-groups. Our method calculates the per-group importance of each model weight and selects a subset of weights that maintain the relative between-group total importance in pruning. The proposed method then prunes network edges with small importance values and repeats the procedure by updating importance values. We demonstrate the effectiveness of our method on four different datasets, FairFace, UTKFace, CelebA, and ImageNet, for the tasks of face attribute classification where our method reduces the disparity in performance degradation by up to 90% compared to the state-of-the-art pruning algorithms. Our method is substantially more effective in a setting with a high pruning rate (99%). The code and dataset used in the experiments are available at https://github.com/Bernardo1998/FairGRAPE  ( 2 min )
    What's in the laundromat? Mapping and characterising offshore owned domestic property in London. (arXiv:2207.10931v1 [cs.LG])
    The UK, particularly London, is a global hub for money laundering, a significant portion of which uses domestic property. However, understanding the distribution and characteristics of offshore domestic property in the UK is challenging due to data availability. This paper attempts to remedy that situation by enhancing a publicly available dataset of UK property owned by offshore companies. We create a data processing pipeline which draws on several datasets and machine learning techniques to create a parsed set of addresses classified into six use classes. The enhanced dataset contains 138,000 properties, 44,000 more than the original dataset. The majority are domestic (95k), with a disproportionate amount of those in London (42k). The average offshore domestic property in London is worth 1.33 million GBP; collectively this amounts to approximately 56 billion GBP. We perform an in-depth analysis of the offshore domestic property in London, comparing the price, distribution and entropy/concentration with Airbnb property, low-use/empty property and conventional domestic property. We estimate that the total amount of offshore, low-use and Airbnb property in London is between 144,000 and 164,000 and that they are collectively worth between 145 and 174 billion GBP. Furthermore, offshore domestic property is more expensive and has higher entropy/concentration than all other property types. In addition, we identify two different types of offshore property, nested and individual, which have different price and distribution characteristics. Finally, we release the enhanced offshore property dataset, the complete low-use London dataset and the pipeline for creating the enhanced dataset to reduce the barriers to studying this topic.  ( 3 min )
    Principal Geodesic Analysis of Merge Trees (and Persistence Diagrams). (arXiv:2207.10960v1 [cs.GR])
    This paper presents a computational framework for the Principal Geodesic Analysis of merge trees (MT-PGA), a novel adaptation of the celebrated Principal Component Analysis (PCA) framework [87] to the Wasserstein metric space of merge trees [92]. We formulate MT-PGA computation as a constrained optimization problem, aiming at adjusting a basis of orthogonal geodesic axes, while minimizing a fitting energy. We introduce an efficient, iterative algorithm which exploits shared-memory parallelism, as well as an analytic expression of the fitting energy gradient, to ensure fast iterations. Our approach also trivially extends to extremum persistence diagrams. Extensive experiments on public ensembles demonstrate the efficiency of our approach - with MT-PGA computations in the orders of minutes for the largest examples. We show the utility of our contributions by extending to merge trees two typical PCA applications. First, we apply MT-PGA to data reduction and reliably compress merge trees by concisely representing them by their first coordinates in the MT-PGA basis. Second, we present a dimensionality reduction framework exploiting the first two directions of the MT-PGA basis to generate two-dimensional layouts of the ensemble. We augment these layouts with persistence correlation views, enabling global and local visual inspections of the feature variability in the ensemble. In both applications, quantitative experiments assess the relevance of our framework. Finally, we provide a lightweight C++ implementation that can be used to reproduce our results.  ( 3 min )
    Heterogeneous Ensemble Learning for Enhanced Crash Forecasts -- A Frequentest and Machine Learning based Stacking Framework. (arXiv:2207.10721v1 [cs.LG])
    A variety of statistical and machine learning methods are used to model crash frequency on specific roadways with machine learning methods generally having a higher prediction accuracy. Recently, heterogeneous ensemble methods (HEM), including stacking, have emerged as more accurate and robust intelligent techniques and are often used to solve pattern recognition problems by providing more reliable and accurate predictions. In this study, we apply one of the key HEM methods, Stacking, to model crash frequency on five lane undivided segments (5T) of urban and suburban arterials. The prediction performance of Stacking is compared with parametric statistical models (Poisson and negative binomial) and three state of the art machine learning techniques (Decision tree, random forest, and gradient boosting), each of which is termed as the base learner. By employing an optimal weight scheme to combine individual base learners through stacking, the problem of biased predictions in individual base-learners due to differences in specifications and prediction accuracies is avoided. Data including crash, traffic, and roadway inventory were collected and integrated from 2013 to 2017. The data are split into training, validation, and testing datasets. Estimation results of statistical models reveal that besides other factors, crashes increase with density (number per mile) of different types of driveways. Comparison of out-of-sample predictions of various models confirms the superiority of Stacking over the alternative methods considered. From a practical standpoint, stacking can enhance prediction accuracy (compared to using only one base learner with a particular specification). When applied systemically, stacking can help identify more appropriate countermeasures.  ( 3 min )
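    The weighted combination of base learners described above can be illustrated with the simplest possible case. This is a hedged sketch, not the paper's scheme: it assumes only two base learners, a squared-error criterion, and a single convex weight fitted on held-out validation predictions. All names and numbers are hypothetical.

```python
def fit_blend_weight(pred_a, pred_b, observed):
    """Weight w in [0, 1] minimising sum((w*a + (1 - w)*b - y)**2).

    Closed form for the unconstrained optimum:
    w = sum((y - b) * (a - b)) / sum((a - b)**2), then clipped to [0, 1].
    """
    num = sum((y - b) * (a - b) for a, b, y in zip(pred_a, pred_b, observed))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0:
        # Both learners agree everywhere, so any weight gives the same blend.
        return 0.5
    return min(1.0, max(0.0, num / den))


def blend(pred_a, pred_b, w):
    """Combine two prediction lists with the fitted convex weight."""
    return [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]


# Hypothetical validation-set crash counts predicted by two base learners.
val_count_model = [1.0, 2.0, 3.0, 4.0]
val_tree_model = [3.0, 2.0, 5.0, 0.0]
val_observed = [2.0, 2.0, 4.0, 2.0]

w = fit_blend_weight(val_count_model, val_tree_model, val_observed)
stacked = blend(val_count_model, val_tree_model, w)
```

    Fitting the weight on validation rather than training predictions is what keeps the combiner from simply favouring whichever base learner overfits most; a full stacking setup would use more learners and typically a regression meta-model.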
    Privacy and Transparency in Graph Machine Learning: A Unified Perspective. (arXiv:2207.10896v1 [cs.LG])
    Graph Machine Learning (GraphML), whereby classical machine learning is generalized to irregular graph domains, has enjoyed a recent renaissance, leading to a dizzying array of models and their applications in several domains. With its growing applicability to sensitive domains and regulations by government agencies for trustworthy AI systems, researchers have started looking into the issues of transparency and privacy of graph learning. However, these topics have been mainly investigated independently. In this position paper, we provide a unified perspective on the interplay of privacy and transparency in GraphML.  ( 2 min )
    Multiple Robust Learning for Recommendation. (arXiv:2207.10796v1 [cs.IR])
    In recommender systems (RS), a common problem is the presence of various biases in the collected data, which deteriorates the generalization ability of the recommendation models and leads to inaccurate predictions. Doubly robust (DR) learning has been studied in many tasks in RS, with the advantage that unbiased learning can be achieved when either a single imputation or a single propensity model is accurate. In this paper, we propose a multiple robust (MR) estimator that can take advantage of multiple candidate imputation and propensity models to achieve unbiasedness. Specifically, the MR estimator is unbiased when any of the imputation or propensity models, or a linear combination of these models, is accurate. Theoretical analysis shows that the proposed MR is an enhanced version of DR when only having a single imputation and propensity model, and has a smaller bias. Inspired by the generalization error bound of MR, we further propose a novel multiple robust learning approach with stabilization. We conduct extensive experiments on real-world and semi-synthetic datasets, which demonstrate the superiority of the proposed approach over state-of-the-art methods.  ( 2 min )
    Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis. (arXiv:2207.10939v1 [stat.ML])
    We study the performance -- and specifically the rate at which the error probability converges to zero -- of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions for a ML classifier to exhibit error probabilities that vanish exponentially, say $\sim \exp\left(-n\,I + o(n) \right)$, where $n$ is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and $I$ is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., what is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and, consequently, the related error rate $I$, depend on the given training set, which is assumed of finite size. Interestingly, these conditions can be verified and tested numerically exploiting the available dataset, or a synthetic dataset, generated according to the available information on the underlying statistical model. In other words, the classification error probability convergence to zero and its rate can be computed on a portion of the dataset available for training. Coherently with the large deviations theory, we can also establish the convergence, for $n$ large enough, of the normalized D3F statistic to a Gaussian distribution. This property is exploited to set a desired asymptotic false alarm probability, which empirically turns out to be accurate even for quite realistic values of $n$. Furthermore, approximate error probability curves $\sim \zeta_n \exp\left(-n\,I \right)$ are provided, thanks to the refined asymptotic derivation (often referred to as exact asymptotics), where $\zeta_n$ represents the most representative sub-exponential terms of the error probabilities.  ( 3 min )
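    For readers unfamiliar with the objects named above, the textbook definitions are given here as a sketch. The symbols $X$, $\Lambda$ and $\Lambda^*$ are notation chosen for this note and are assumptions, not the paper's; how exactly the error rate $I$ is obtained from the transform for the D3F is specified in the paper, not here.

```latex
% Cumulant-generating function of a real random variable X
\Lambda(\theta) = \log \mathbb{E}\!\left[ e^{\theta X} \right]

% Fenchel-Legendre transform (convex conjugate) of \Lambda
\Lambda^{*}(x) = \sup_{\theta \in \mathbb{R}} \left\{ \theta x - \Lambda(\theta) \right\}

% Worked example: X \sim \mathcal{N}(\mu, \sigma^{2})
\Lambda(\theta) = \mu\theta + \tfrac{1}{2}\sigma^{2}\theta^{2},
\qquad
\Lambda^{*}(x) = \frac{(x - \mu)^{2}}{2\sigma^{2}}
```

    In the Gaussian example the supremum is attained at $\theta = (x - \mu)/\sigma^{2}$, which gives the quadratic rate shown; thresholds further from the mean therefore yield faster exponential decay.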
    Explainable AI Algorithms for Vibration Data-based Fault Detection: Use Case-adadpted Methods and Critical Evaluation. (arXiv:2207.10732v1 [eess.SP])
    Analyzing vibration data using deep neural network algorithms is an effective way to detect damage in rotating machinery at an early stage. However, the black-box approach of these methods often does not provide a satisfactory solution because the cause of classifications is not comprehensible to humans. Therefore, this work investigates the application of explainable AI (XAI) algorithms to convolutional neural networks for vibration-based condition monitoring. For this, various XAI algorithms are applied to classifications based on the Fourier transform as well as the order analysis of the vibration signal. The results are visualized as a function of the revolutions per minute (RPM), in the shape of frequency-RPM maps and order-RPM maps. This makes it possible to assess the saliency given to features which depend on the rotation speed and to those with constant frequency. To compare the explanatory power of the XAI methods, investigations are first carried out with a synthetic data set with known class-specific characteristics. Then a real-world data set for vibration-based imbalance classification on an electric motor, which runs at a broad range of rotation speeds, is used. A special focus is put on the consistency for variable periodicity of the data, which translates to a varying rotation speed of a real-world machine. This work aims to show the different strengths and weaknesses of the methods for this use case: GradCAM, LRP and LIME with a new perturbation strategy.  ( 3 min )
    Twitmo: A Twitter Data Topic Modeling and Visualization Package for R. (arXiv:2207.11236v1 [cs.IR])
    We present Twitmo, a package that provides a broad range of methods to collect, pre-process, analyze and visualize geo-tagged Twitter data. Twitmo enables the user to collect geo-tagged Tweets from Twitter and provides a comprehensive and user-friendly toolbox to generate topic distributions from Latent Dirichlet Allocations (LDA), correlated topic models (CTM) and structural topic models (STM). Functions are included for pre-processing of text, model building and prediction. In addition, one of the innovations of the package is the automatic pooling of Tweets into longer pseudo-documents using hashtags and cosine similarities for better topic coherence. The package additionally comes with functionality to visualize collected data sets and fitted models in static as well as interactive ways, and offers built-in support for model visualizations via LDAvis, providing great convenience for researchers in this area. The Twitmo package is an innovative toolbox that can be used to analyze public discourse of various topics, political parties or persons of interest in space and time.
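    The hashtag pooling idea mentioned above can be sketched in a few lines. This is an illustration in Python, not Twitmo's R API: the function name and regular expression are hypothetical, and the cosine-similarity step that the abstract also mentions is left out.

```python
import re
from collections import defaultdict

# Simplified hashtag pattern; real tokenisation rules may differ.
HASHTAG = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets):
    """Merge tweets sharing a hashtag into one pseudo-document per hashtag.

    Hashtags are matched case-insensitively. A tweet with several hashtags
    joins several pools; tweets without hashtags are dropped in this sketch.
    """
    pools = defaultdict(list)
    for text in tweets:
        tags = {tag.lower() for tag in HASHTAG.findall(text)}
        for tag in tags:
            pools[tag].append(text)
    return {tag: " ".join(texts) for tag, texts in pools.items()}
```

    The resulting pseudo-documents are longer than single tweets, which is the property that tends to help topic models such as LDA produce more coherent topics.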
    Layer-Wise Partitioning and Merging for Efficient and Scalable Deep Learning. (arXiv:2207.11019v1 [cs.DC])
    Deep Neural Network (DNN) models are usually trained sequentially from one layer to another, which causes forward, backward and update locking problems, leading to poor performance in terms of training time. The existing parallel strategies to mitigate these problems provide suboptimal runtime performance. In this work, we propose a novel layer-wise partitioning and merging, forward and backward pass parallel framework to provide better training performance. The novelty of the proposed work consists of 1) a layer-wise partition and merging model which can minimise communication overhead between devices without the memory cost of existing strategies during the training process; 2) a forward pass and backward pass parallelisation and optimisation to address the update locking problem and minimise the total training cost. The experimental evaluation on real use cases shows that the proposed method outperforms the state-of-the-art approaches in terms of training speed and achieves almost linear speedup without compromising the accuracy performance of the non-parallel approach.  ( 2 min )
    JAWS: Predictive Inference Under Covariate Shift. (arXiv:2207.10716v1 [cs.LG])
    We propose JAWS, a series of wrapper methods for distribution-free uncertainty quantification tasks under covariate shift, centered on our core method JAW, the JAckknife+ Weighted with likelihood-ratio weights. JAWS also includes computationally efficient Approximations of JAW using higher-order influence functions: JAWA. Theoretically, we show that JAW relaxes the jackknife+'s assumption of data exchangeability to achieve the same finite-sample coverage guarantee even under covariate shift. JAWA further approaches the JAW guarantee in the limit of either the sample size or the influence function order under mild assumptions. Moreover, we propose a general approach to repurposing any distribution-free uncertainty quantification method and its guarantees to the task of risk assessment: a task that generates the estimated probability that the true label lies within a user-specified interval. We then propose JAW-R and JAWA-R as the repurposed versions of proposed methods for Risk assessment. Practically, JAWS outperform the state-of-the-art predictive inference baselines in a variety of biased real world data sets for both interval-generation and risk-assessment auditing tasks.  ( 2 min )
    Hyper-Representations for Pre-Training and Transfer Learning. (arXiv:2207.10951v1 [cs.LG])
    Learning representations of neural network weights given a model zoo is an emerging and challenging area with many potential applications, from model inspection to neural architecture search or knowledge distillation. Recently, an autoencoder trained on a model zoo was able to learn a hyper-representation, which captures intrinsic and extrinsic properties of the models in the zoo. In this work, we extend hyper-representations for generative use to sample new model weights as pre-training. We propose layer-wise loss normalization, which we demonstrate is key to generating high-performing models, and a sampling method based on the empirical density of hyper-representations. The models generated using our methods are diverse, performant and able to outperform conventional baselines for transfer learning. Our results indicate the potential of knowledge aggregation from model zoos to new models via hyper-representations, thereby paving the way for novel research directions.  ( 2 min )
    A Machine Learning Approach for Driver Identification Based on CAN-BUS Sensor Data. (arXiv:2207.10807v1 [cs.LG])
    Driver identification is an important topic for modern vehicles from the controller area network (CAN-BUS) perspective. Many conventional systems are used to identify the driver. Going one step further, most researchers use CAN-BUS sensor data, but difficulties arise because the protocol varies between vehicle models. Our aim is to identify the driver through supervised learning algorithms based on driving behavior analysis. To determine the driver, a driver verification technique is proposed that evaluates driving patterns using measurements of CAN sensor data. In this paper, on-board diagnostics (OBD-II) is used to capture the data from the CAN-BUS sensors, and the sensors are listed under the SAE J1979 statement. According to the service of OBD-II, driver identification is possible. We report two accuracy results: one on a complete data set with 10 drivers and one on a partial data set with two drivers. The accuracy is better with fewer drivers than with a higher number of drivers. We have achieved statistically significant results in terms of accuracy in contrast to the baseline algorithm.  ( 2 min )
    PowerFDNet: Deep Learning-Based Stealthy False Data Injection Attack Detection for AC-model Transmission Systems. (arXiv:2207.10805v1 [cs.CR])
    Recent studies have demonstrated that smart grids are vulnerable to stealthy false data injection attacks (SFDIAs), as SFDIAs can bypass residual-based bad data detection mechanisms. The SFDIA detection has become one of the focuses of smart grid research. Methods based on deep learning technology have shown promising accuracy in the detection of SFDIAs. However, most existing methods rely on the temporal structure of a sequence of measurements but do not take account of the spatial structure between buses and transmission lines. To address this issue, we propose a spatiotemporal deep network, PowerFDNet, for the SFDIA detection in AC-model power grids. The PowerFDNet consists of two sub-architectures: spatial architecture (SA) and temporal architecture (TA). The SA is aimed at extracting representations of bus/line measurements and modeling the spatial structure based on their representations. The TA is aimed at modeling the temporal structure of a sequence of measurements. Therefore, the proposed PowerFDNet can effectively model the spatiotemporal structure of measurements. Case studies on the detection of SFDIAs on the benchmark smart grids show that the PowerFDNet achieved significant improvement compared with the state-of-the-art SFDIA detection methods. In addition, an IoT-oriented lightweight prototype of size 52 MB is implemented and tested for mobile devices, which demonstrates the potential applications on mobile devices. The trained model will be available at https://github.com/FrankYinXF/PowerFDNet.  ( 3 min )
    Modeling User Behavior With Interaction Networks for Spam Detection. (arXiv:2207.10767v1 [cs.LG])
    Spam is a serious problem plaguing web-scale digital platforms which facilitate user content creation and distribution. It compromises the platform's integrity, the performance of services like recommendation and search, and the overall business. Spammers engage in a variety of abusive and evasive behaviors which are distinct from those of non-spammers. Users' complex behavior can be well represented by a heterogeneous graph rich with node and edge attributes. Learning to identify spammers in such a graph for a web-scale platform is challenging because of its structural complexity and size. In this paper, we propose SEINE (Spam DEtection using Interaction NEtworks), a spam detection model over a novel graph framework. Our graph simultaneously captures rich users' details and behavior and enables learning on a billion-scale graph. Our model considers neighborhood along with edge types and attributes, allowing it to capture a wide range of spammers. SEINE, trained on a real dataset of tens of millions of nodes and billions of edges, achieves a high performance of 80% recall with 1% false positive rate. SEINE achieves comparable performance to the state-of-the-art techniques on a public dataset while being pragmatic enough to be used in a large-scale production system.  ( 3 min )
    NFDLM: A Lightweight Network Flow based Deep Learning Model for DDoS Attack Detection in IoT Domains. (arXiv:2207.10803v1 [cs.CR])
    In recent years, Distributed Denial of Service (DDoS) attacks on Internet of Things (IoT) devices have become one of the prime concerns to Internet users around the world. One of the sources of the attacks on IoT ecosystems is botnets. Intruders force IoT devices to become unavailable for their legitimate users by sending a large number of messages within a short interval. This study proposes NFDLM, a lightweight and optimised Artificial Neural Network (ANN) based Distributed Denial of Service (DDoS) attack detection framework with mutual correlation as the feature selection method, which produces a superior result when compared with Long Short Term Memory (LSTM) and simple ANN. Overall, the detection performance achieves approximately 99% accuracy for the detection of attacks from botnets. In this work, we have designed and compared four different models, where two are based on ANN and the other two are based on LSTM, to detect the attack types of DDoS.  ( 2 min )
    VTrackIt: A Synthetic Self-Driving Dataset with Infrastructure and Pooled Vehicle Information. (arXiv:2207.11146v1 [cs.CV])
    Artificial intelligence solutions for Autonomous Vehicles (AVs) have been developed using publicly available datasets such as Argoverse, ApolloScape, Level5, and NuScenes. One major limitation of these datasets is the absence of infrastructure and/or pooled vehicle information like lane line type, vehicle speed, traffic signs, and intersections. Such information is necessary and not complementary to eliminating high-risk edge cases. The rapid advancements in Vehicle-to-Infrastructure and Vehicle-to-Vehicle technologies show promise that infrastructure and pooled vehicle information will soon be accessible in near real-time. Taking a leap in the future, we introduce the first comprehensive synthetic dataset with intelligent infrastructure and pooled vehicle information for advancing the next generation of AVs, named VTrackIt. We also introduce the first deep learning model (InfraGAN) for trajectory predictions that considers such information. Our experiments with InfraGAN show that the comprehensive information offered by VTrackIt reduces the number of high-risk edge cases. The VTrackIt dataset is available upon request under the Creative Commons CC BY-NC-SA 4.0 license at this http URL
    Spatial-Temporal Feature Extraction and Evaluation Network for Citywide Traffic Condition Prediction. (arXiv:2207.11034v1 [cs.LG])
    Traffic prediction plays an important role in the realization of traffic control and scheduling tasks in intelligent transportation systems. With the diversification of data sources, reasonably using rich traffic data to model the complex spatial-temporal dependence and nonlinear characteristics in traffic flow is a key challenge for intelligent transportation systems. In addition, clearly evaluating the importance of spatial-temporal features extracted from different data becomes a challenge. A Double Layer - Spatial Temporal Feature Extraction and Evaluation (DL-STFEE) model is proposed. The lower layer of DL-STFEE is the spatial-temporal feature extraction layer. The spatial and temporal features in traffic data are extracted by multi-graph graph convolution and an attention mechanism, and different combinations of spatial and temporal features are generated. The upper layer of DL-STFEE is the spatial-temporal feature evaluation layer. Through the attention score matrix generated by the high-dimensional self-attention mechanism, the spatial-temporal feature combinations are fused and evaluated, so as to obtain the impact of different combinations on the prediction effect. Three sets of experiments are performed on actual traffic datasets to show that DL-STFEE can effectively capture the spatial-temporal features and evaluate the importance of different spatial-temporal feature combinations.
    Learning Generalized Non-Rigid Multimodal Biomedical Image Registration from Generic Point Set Data. (arXiv:2207.10994v1 [cs.CV])
    Free Point Transformer (FPT) has been proposed as a data-driven, non-rigid point set registration approach using deep neural networks. As FPT does not assume constraints based on point vicinity or correspondence, it may be trained simply and in a flexible manner by minimizing an unsupervised loss based on the Chamfer Distance. This makes FPT amenable to real-world medical imaging applications where ground-truth deformations may be infeasible to obtain, or in scenarios where only a varying degree of completeness in the point sets to be aligned is available. To test the limit of the correspondence finding ability of FPT and its dependency on training data sets, this work explores the generalizability of the FPT from well-curated non-medical data sets to medical imaging data sets. First, we train FPT on the ModelNet40 dataset to demonstrate its effectiveness and the superior registration performance of FPT over iterative and learning-based point set registration methods. Second, we demonstrate superior performance in rigid and non-rigid registration and robustness to missing data. Last, we highlight the interesting generalizability of the ModelNet-trained FPT by registering reconstructed freehand ultrasound scans of the spine and generic spine models without additional training, whereby the average difference to the ground truth curvatures is 1.3 degrees, across 13 patients.
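The Chamfer Distance named in the FPT abstract as the unsupervised loss can be sketched in plain Python. This is one common symmetric variant (mean nearest-neighbour Euclidean distance in both directions); the abstract does not say which variant FPT uses, and the function name `chamfer_distance` is an assumption for illustration. No point correspondences are needed and the two sets may differ in size, which is what makes the measure usable for partially complete point sets.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    `a` and `b` are non-empty lists of coordinate tuples of equal dimension.
    Assumed variant: average nearest-neighbour Euclidean distance from a to b
    plus the same quantity from b to a.
    """
    def one_way(src, dst):
        # For every source point, find the distance to its closest target point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


# Usage: a unit square compared with a copy shifted by 0.5 along x.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
shifted = [(x + 0.5, y) for x, y in square]
print(chamfer_distance(square, shifted))
```

This brute-force version costs O(|a| * |b|) distance evaluations; practical implementations typically use batched tensor operations or spatial indexing instead.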
    Federated Semi-Supervised Domain Adaptation via Knowledge Transfer. (arXiv:2207.10727v1 [cs.LG])
    Given the rapidly changing machine learning environments and expensive data labeling, semi-supervised domain adaptation (SSDA) is imperative when the labeled data from the source domain is statistically different from the partially labeled data from the target domain. Most prior SSDA research is centrally performed, requiring access to both source and target data. However, data in many fields nowadays is generated by distributed end devices. Due to privacy concerns, the data might be locally stored and cannot be shared, resulting in the ineffectiveness of existing SSDA research. This paper proposes an innovative approach to achieve SSDA over multiple distributed and confidential datasets, named by Federated Semi-Supervised Domain Adaptation (FSSDA). FSSDA integrates SSDA with federated learning based on strategically designed knowledge distillation techniques, whose efficiency is improved by performing source and target training in parallel. Moreover, FSSDA controls the amount of knowledge transferred across domains by properly selecting a key parameter, i.e., the imitation parameter. Further, the proposed FSSDA can be effectively generalized to multi-source domain adaptation scenarios. Extensive experiments are conducted to demonstrate the effectiveness and efficiency of FSSDA design.  ( 2 min )
    Supervised Contrastive ResNet and Transfer Learning for the In-vehicle Intrusion Detection System. (arXiv:2207.10814v1 [cs.CR])
    High-end vehicles have been furnished with a number of electronic control units (ECUs), which provide upgrading functions to enhance the driving experience. The controller area network (CAN) is a well-known protocol that connects these ECUs because of its simplicity and efficiency. However, the CAN bus is vulnerable to various types of attacks. Although the intrusion detection system (IDS) is proposed to address the security problem of the CAN bus, most previous studies only provide alerts when attacks occur without knowing the specific type of attack. Moreover, an IDS is designed for a specific car model due to diverse car manufacturers. In this study, we propose a novel deep learning model called supervised contrastive (SupCon) ResNet, which can handle multiple attack identification on the CAN bus. Furthermore, the model can be used to improve the performance of a limited-size dataset using a transfer learning technique. The capability of the proposed model is evaluated on two real car datasets. When tested with the car hacking dataset, the experiment results show that the SupCon ResNet model improves the overall false-negative rates of four types of attack by four times on average, compared to other models. In addition, the model achieves the highest F1 score at 0.9994 on the survival dataset by utilizing transfer learning. Finally, the model can adapt to hardware constraints in terms of memory size and running time.  ( 3 min )
    A Convolutional Attention Based Deep Network Solution for UAV Network Attack Recognition over Fading Channels and Interference. (arXiv:2207.10810v1 [cs.CR])
    When users exchange data with Unmanned Aerial Vehicles (UAVs) over air-to-ground (A2G) wireless communication networks, they expose the link to attacks that could increase packet loss and might disrupt connectivity. For example, in emergency deliveries, losing control information (i.e., data related to the UAV control communication) might result in accidents that cause UAV destruction and damage to buildings or other elements in a city. To prevent these problems, these issues must be addressed in 5G and 6G scenarios. This research offers a deep learning (DL) approach for detecting attacks in UAVs equipped with orthogonal frequency division multiplexing (OFDM) receivers on Clustered Delay Line (CDL) channels in highly complex scenarios involving authenticated terrestrial users, as well as attackers in unknown locations. We use the two observable parameters available in 5G UAV connections: the Received Signal Strength Indicator (RSSI) and the Signal to Interference plus Noise Ratio (SINR). The prospective algorithm is generalizable regarding attack identification, which does not occur during training. Further, it can identify all the attackers in the environment with 20 terrestrial users. A deeper investigation into the timing requirements for recognizing attacks shows that after training, the minimum time necessary after the attack begins is 100 ms, and the minimum attack power is 2 dBm, which is the same power that the authenticated UAV uses. Our algorithm also detects moving attackers from a distance of 500 m.  ( 3 min )
    Federated Learning on Adaptively Weighted Nodes by Bilevel Optimization. (arXiv:2207.10751v1 [cs.LG])
    We propose a federated learning method with weighted nodes in which the weights can be modified to optimize the model's performance on a separate validation set. The problem is formulated as a bilevel optimization where the inner problem is a federated learning problem with weighted nodes and the outer problem focuses on optimizing the weights based on the validation performance of the model returned from the inner problem. A communication-efficient federated optimization algorithm is designed to solve this bilevel optimization problem. Under an error-bound assumption, we analyze the generalization performance of the output model and identify scenarios when our method is in theory superior to training a model only locally and to federated learning with static and evenly distributed weights.  ( 2 min )
    Uncertainty-aware Multi-modal Learning via Cross-modal Random Network Prediction. (arXiv:2207.10851v1 [cs.CV])
    Multi-modal learning focuses on training models by equally combining multiple input data modalities during the prediction process. However, this equal combination can be detrimental to the prediction accuracy because different modalities are usually accompanied by varying levels of uncertainty. Using such uncertainty to combine modalities has been studied by a couple of approaches, but with limited success because these approaches are either designed to deal with specific classification or segmentation problems and cannot be easily translated into other tasks, or suffer from numerical instabilities. In this paper, we propose a new Uncertainty-aware Multi-modal Learner that estimates uncertainty by measuring feature density via Cross-modal Random Network Prediction (CRNP). CRNP is designed to require little adaptation to translate between different prediction tasks, while having a stable training process. From a technical point of view, CRNP is the first approach to explore random network prediction to estimate uncertainty and to combine multi-modal data. Experiments on two 3D multi-modal medical image segmentation tasks and three 2D multi-modal computer vision classification tasks show the effectiveness, adaptability and robustness of CRNP. Also, we provide an extensive discussion on different fusion functions and visualization to validate the proposed model.  ( 2 min )
    Respecting Time Series Properties Makes Deep Time Series Forecasting Perfect. (arXiv:2207.10941v1 [cs.LG])
    How to handle time features should be the core question of any time series forecasting model. Ironically, it is often ignored or misunderstood by deep-learning based models, even those baselines which are state-of-the-art. This behavior makes them inefficient, untenable and unstable. In this paper, we rigorously analyze three prevalent but deficient/unfounded deep time series forecasting mechanisms or methods from the view of time series properties, including normalization methods, multivariate forecasting and input sequence length. Corresponding corollaries and solutions are given on both empirical and theoretical bases. We thereby propose a novel time series forecasting network, i.e. RTNet, on the basis of the aforementioned analysis. It is general enough to be combined with both supervised and self-supervised forecasting formats. Thanks to the core idea of respecting time series properties, no matter in which forecasting format, RTNet shows obviously superior forecasting performance compared with dozens of other SOTA time series forecasting baselines in three real-world benchmark datasets. By and large, it even has lower time complexity and memory usage while achieving better forecasting accuracy. The source code is available at https://github.com/OrigamiSL/RTNet.  ( 2 min )
    Automated Dilated Spatio-Temporal Synchronous Graph Modeling for Traffic Prediction. (arXiv:2207.10830v1 [cs.LG])
    Accurate traffic prediction is a challenging task in intelligent transportation systems because of the complex spatio-temporal dependencies in transportation networks. Many existing works utilize sophisticated temporal modeling approaches to incorporate with graph convolution networks (GCNs) for capturing short-term and long-term spatio-temporal dependencies. However, these separated modules with complicated designs could restrict effectiveness and efficiency of spatio-temporal representation learning. Furthermore, most previous works adopt the fixed graph construction methods to characterize the global spatio-temporal relations, which limits the learning capability of the model for different time periods and even different data scenarios. To overcome these limitations, we propose an automated dilated spatio-temporal synchronous graph network, named Auto-DSTSGN for traffic prediction. Specifically, we design an automated dilated spatio-temporal synchronous graph (Auto-DSTSG) module to capture the short-term and long-term spatio-temporal correlations by stacking deeper layers with dilation factors in an increasing order. Further, we propose a graph structure search approach to automatically construct the spatio-temporal synchronous graph that can adapt to different data scenarios. Extensive experiments on four real-world datasets demonstrate that our model can achieve about 10% improvements compared with the state-of-art methods. Source codes are available at https://github.com/jinguangyin/Auto-DSTSGN.  ( 2 min )
    Neuroimaging Feature Extraction using a Neural Network Classifier for Imaging Genetics. (arXiv:2207.10794v1 [q-bio.QM])
    A major issue in the association of genes to neuroimaging phenotypes is the high dimension of both genetic data and neuroimaging data. In this article, we tackle the latter problem with an eye toward developing solutions that are relevant for disease prediction. Supported by a vast literature on the predictive power of neural networks, our proposed solution uses neural networks to extract from neuroimaging data features that are relevant for predicting Alzheimer's Disease (AD) for subsequent relation to genetics. Our neuroimaging-genetic pipeline is comprised of image processing, neuroimaging feature extraction and genetic association steps. We propose a neural network classifier for extracting neuroimaging features that are related with disease and a multivariate Bayesian group sparse regression model for genetic association. We compare the predictive power of these features to expert selected features and take a closer look at the SNPs identified with the new neuroimaging features.  ( 2 min )
    Robust Knowledge Adaptation for Dynamic Graph Neural Networks. (arXiv:2207.10839v1 [cs.LG])
    Graph structured data often possess dynamic characteristics in nature, e.g., the addition of links and nodes, in many real-world applications. Recent years have witnessed increasing attention paid to dynamic graph neural networks for modelling such graph data, where almost all the existing approaches assume that when a new link is built, the embeddings of the neighbor nodes should be updated by learning the temporal dynamics to propagate new information. However, such approaches suffer from the limitation that if the node introduced by a new connection contains noisy information, propagating its knowledge to other nodes is not reliable and even leads to the collapse of the model. In this paper, we propose AdaNet: a robust knowledge Adaptation framework via reinforcement learning for dynamic graph neural Networks. In contrast to previous approaches, which immediately update the embeddings of the neighbor nodes once a new link is added, AdaNet attempts to adaptively determine which nodes should be updated because of the new link involved. Considering that the decision whether to update the embedding of one neighbor node will have great impact on other neighbor nodes, we formulate the selection of node updates as a sequence decision problem, and address this problem via reinforcement learning. By this means, we can adaptively propagate knowledge to other nodes for learning robust node embedding representations. To the best of our knowledge, our approach constitutes the first attempt to explore robust knowledge adaptation via reinforcement learning for dynamic graph neural networks. Extensive experiments on three benchmark datasets demonstrate that AdaNet achieves the state-of-the-art performance. In addition, we perform the experiments by adding different degrees of noise into the dataset, quantitatively and qualitatively illustrating the robustness of AdaNet.  ( 3 min )
    Just Rotate it: Deploying Backdoor Attacks via Rotation Transformation. (arXiv:2207.10825v1 [cs.CV])
    Recent works have demonstrated that deep learning models are vulnerable to backdoor poisoning attacks, where these attacks instill spurious correlations to external trigger patterns or objects (e.g., stickers, sunglasses, etc.). We find that such external trigger signals are unnecessary, as highly effective backdoors can be easily inserted using rotation-based image transformation. Our method constructs the poisoned dataset by rotating a limited amount of objects and labeling them incorrectly; once trained with it, the victim's model will make undesirable predictions during run-time inference. It exhibits a significantly high attack success rate while maintaining clean performance through comprehensive empirical studies on image classification and object detection tasks. Furthermore, we evaluate standard data augmentation techniques and four different backdoor defenses against our attack and find that none of them can serve as a consistent mitigation approach. Our attack can be easily deployed in the real world since it only requires rotating the object, as we show in both image classification and object detection applications. Overall, our work highlights a new, simple, physically realizable, and highly effective vector for backdoor attacks. Our video demo is available at https://youtu.be/6JIF8wnX34M.  ( 2 min )
    Characterizing Coherent Integrated Photonic Neural Networks under Imperfections. (arXiv:2207.10835v1 [cs.ET])
    Integrated photonic neural networks (IPNNs) are emerging as promising successors to conventional electronic AI accelerators as they offer substantial improvements in computing speed and energy efficiency. In particular, coherent IPNNs use arrays of Mach-Zehnder interferometers (MZIs) for unitary transformations to perform energy-efficient matrix-vector multiplication. However, the underlying MZI devices in IPNNs are susceptible to uncertainties stemming from optical lithographic variations and thermal crosstalk and can experience imprecisions due to non-uniform MZI insertion loss and quantization errors due to low-precision encoding in the tuned phase angles. In this paper, we, for the first time, systematically characterize the impact of such uncertainties and imprecisions (together referred to as imperfections) in IPNNs using a bottom-up approach. We show that their impact on IPNN accuracy can vary widely based on the tuned parameters (e.g., phase angles) of the affected components, their physical location, and the nature and distribution of the imperfections. To improve reliability measures, we identify critical IPNN building blocks that, under imperfections, can lead to catastrophic degradation in the classification accuracy. We show that under multiple simultaneous imperfections, the IPNN inferencing accuracy can degrade by up to 46%, even when the imperfection parameters are restricted within a small range. Our results also indicate that the inferencing accuracy is sensitive to imperfections affecting the MZIs in the linear layers next to the input layer of the IPNN.  ( 3 min )
    Multilabel Prototype Generation for Data Reduction in k-Nearest Neighbour classification. (arXiv:2207.10947v1 [cs.LG])
    Prototype Generation (PG) methods are typically considered for improving the efficiency of the $k$-Nearest Neighbour ($k$NN) classifier when tackling high-size corpora. Such approaches aim at generating a reduced version of the corpus without decreasing the classification performance when compared to the initial set. Despite their large application in multiclass scenarios, very few works have addressed the proposal of PG methods for the multilabel space. In this regard, this work presents the novel adaptation of four multiclass PG strategies to the multilabel case. These proposals are evaluated with three multilabel $k$NN-based classifiers, 12 corpora comprising a varied range of domains and corpus sizes, and different noise scenarios artificially induced in the data. The results obtained show that the proposed adaptations are capable of significantly improving -- both in terms of efficiency and classification performance -- the only reference multilabel PG work in the literature as well as the case in which no PG method is applied, also presenting a statistically superior robustness in noisy scenarios. Moreover, these novel PG strategies allow prioritising either the efficiency or efficacy criteria through its configuration depending on the target scenario, hence covering a wide area in the solution space not previously filled by other works.  ( 2 min )
    Active Data Pattern Extraction Attacks on Generative Language Models. (arXiv:2207.10802v1 [cs.CR])
    With the wide availability of large pre-trained language model checkpoints, such as GPT-2 and BERT, the recent trend has been to fine-tune them on a downstream task to achieve the state-of-the-art performance with a small computation overhead. One natural example is the Smart Reply application where a pre-trained model is fine-tuned for suggesting a number of responses given a query message. In this work, we set out to investigate potential information leakage vulnerabilities in a typical Smart Reply pipeline and show that it is possible for an adversary, having black-box or gray-box access to a Smart Reply model, to extract sensitive user information present in the training data. We further analyse the privacy impact of specific components, e.g. the decoding strategy, pertaining to this application through our attack settings. We explore potential mitigation strategies and demonstrate how differential privacy can be a strong defense mechanism against such data extraction attacks.  ( 2 min )
    Suppressing Poisoning Attacks on Federated Learning for Medical Imaging. (arXiv:2207.10804v1 [cs.CR])
    Collaboration among multiple data-owning entities (e.g., hospitals) can accelerate the training process and yield better machine learning models due to the availability and diversity of data. However, privacy concerns make it challenging to exchange data while preserving confidentiality. Federated Learning (FL) is a promising solution that enables collaborative training through exchange of model parameters instead of raw data. However, most existing FL solutions work under the assumption that participating clients are \emph{honest} and thus can fail against poisoning attacks from malicious parties, whose goal is to deteriorate the global model performance. In this work, we propose a robust aggregation rule called Distance-based Outlier Suppression (DOS) that is resilient to byzantine failures. The proposed method computes the distance between local parameter updates of different clients and obtains an outlier score for each client using Copula-based Outlier Detection (COPOD). The resulting outlier scores are converted into normalized weights using a softmax function, and a weighted average of the local parameters is used for updating the global model. DOS aggregation can effectively suppress parameter updates from malicious clients without the need for any hyperparameter selection, even when the data distributions are heterogeneous. Evaluation on two medical imaging datasets (CheXpert and HAM10000) demonstrates the higher robustness of DOS method against a variety of poisoning attacks in comparison to other state-of-the-art methods. The code can be found here https://github.com/Naiftt/SPAFD.  ( 3 min )
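    The aggregation pipeline described above (pairwise distances, per-client outlier score, softmax weights, weighted average) can be sketched as follows. This is a minimal illustration, not the authors' implementation: COPOD is replaced here by a much simpler stand-in score (standardized mean Euclidean distance to the other clients), and the function names and toy data are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def aggregate(updates):
    """Distance-based weighted average of client updates.

    updates: array of shape (n_clients, n_params).
    NOTE: the outlier score below is a stand-in for COPOD (assumption),
    chosen only to keep the sketch self-contained.
    """
    updates = np.asarray(updates, dtype=float)
    n = len(updates)
    # Pairwise Euclidean distances between client updates, shape (n, n).
    diffs = updates[:, None, :] - updates[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    # Stand-in outlier score: mean distance to the other clients, standardized.
    scores = dist.sum(axis=1) / (n - 1)
    scores = (scores - scores.mean()) / (scores.std() + 1e-12)
    # Higher outlier score -> lower weight.
    weights = softmax(-scores)
    return weights @ updates, weights

# Toy example: four similar (honest) updates and one far-away update.
rng = np.random.default_rng(0)
honest = rng.normal(loc=1.0, scale=0.1, size=(4, 5))
malicious = np.full((1, 5), -20.0)
global_update, weights = aggregate(np.vstack([honest, malicious]))
```

    In this toy setting the far-away client receives the smallest weight, so its contribution to the averaged update is suppressed rather than removed outright.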
    Transformer with Implicit Edges for Particle-based Physics Simulation. (arXiv:2207.10860v1 [cs.LG])
    Particle-based systems provide a flexible and unified way to simulate physics systems with complex dynamics. Most existing data-driven simulators for particle-based systems adopt graph neural networks (GNNs) as their network backbones, as particles and their interactions can be naturally represented by graph nodes and graph edges. However, while particle-based systems usually contain hundreds or even thousands of particles, the explicit modeling of particle interactions as graph edges inevitably leads to a significant computational overhead, due to the increased number of particle interactions. Consequently, in this paper we propose a novel Transformer-based method, dubbed Transformer with Implicit Edges (TIE), to capture the rich semantics of particle interactions in an edge-free manner. The core idea of TIE is to decentralize the computation involving pair-wise particle interactions into per-particle updates. This is achieved by adjusting the self-attention module to resemble the update formula of graph edges in GNNs. To improve the generalization ability of TIE, we further amend TIE with learnable material-specific abstract particles to disentangle global material-wise semantics from local particle-wise semantics. We evaluate our model on diverse domains of varying complexity and materials. Compared with existing GNN-based methods, without bells and whistles, TIE achieves superior performance and generalization across all these domains. Codes and models are available at https://github.com/ftbabi/TIE_ECCV2022.git.  ( 2 min )
    IDPS Signature Classification with a Reject Option and the Incorporation of Expert Knowledge. (arXiv:2207.10797v1 [cs.CR])
    As the importance of intrusion detection and prevention systems (IDPSs) increases, great costs are incurred to manage the signatures that are generated by malicious communication pattern files. Experts in network security need to classify signatures by importance for an IDPS to work. We propose and evaluate a machine learning signature classification model with a reject option (RO) to reduce the cost of setting up an IDPS. To train the proposed model, it is essential to design features that are effective for signature classification. Experts classify signatures with predefined if-then rules. An if-then rule returns a label of low, medium, high, or unknown importance based on keyword matching of the elements in the signature. Therefore, we first design two types of features, symbolic features (SFs) and keyword features (KFs), which are used in keyword matching for the if-then rules. Next, we design web information and message features (WMFs) to capture the properties of signatures that do not match the if-then rules. The WMFs are extracted as term frequency-inverse document frequency (TF-IDF) features of the message text in the signatures. The features are obtained by web scraping from the referenced external attack identification systems described in the signature. Because failure needs to be minimized in the classification of IDPS signatures, as in the medical field, we consider introducing an RO in our proposed model. The effectiveness of the proposed classification model is evaluated in experiments with two real datasets composed of signatures labeled by experts: a dataset that can be classified with if-then rules and a dataset with elements that do not match an if-then rule. In both cases, the combined SFs and WMFs performed better than the combined SFs and KFs. In addition, we also performed feature analysis.  ( 3 min )
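    A reject option of the kind mentioned above is commonly realized by abstaining whenever the classifier's top class probability falls below a threshold. The sketch below shows only that generic idea; the threshold value, the label set, and the function name are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

# Importance labels used for illustration (the "unknown" rule output is omitted here).
LABELS = ["low", "medium", "high"]

def predict_with_reject(probs, threshold=0.8):
    """Return a label per row, or "reject" when the model is not confident.

    probs: array of shape (n_samples, n_classes) with class probabilities.
    threshold: hypothetical confidence cut-off; rejected items would be
    passed to a human expert instead of being labeled automatically.
    """
    probs = np.asarray(probs, dtype=float)
    best = probs.argmax(axis=1)
    confident = probs.max(axis=1) >= threshold
    return [LABELS[i] if ok else "reject" for i, ok in zip(best, confident)]

# One confident prediction and one ambiguous one.
example_probs = [[0.05, 0.05, 0.90], [0.40, 0.35, 0.25]]
decisions = predict_with_reject(example_probs)
```

    Raising the threshold trades coverage for reliability: fewer signatures are labeled automatically, but those that are labeled are less likely to be wrong.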
    Learning Physics from the Machine: An Interpretable Boosted Decision Tree Analysis for the Majorana Demonstrator. (arXiv:2207.10710v1 [physics.data-an])
    The Majorana Demonstrator is a leading experiment searching for neutrinoless double-beta decay with high-purity germanium (HPGe) detectors. Machine learning provides a new way to maximize the amount of information provided by these detectors, but the data-driven nature makes it less interpretable compared to traditional analysis. An interpretability study reveals the machine's decision-making logic, allowing us to learn from the machine and feed the insights back into the traditional analysis. In this work, we present the first machine learning analysis of the data from the Majorana Demonstrator; this is also the first interpretable machine learning analysis of any germanium detector experiment. Two gradient boosted decision tree models are trained to learn from the data, and a game-theory-based model interpretability study is conducted to understand the origin of the classification power. By learning from data, this analysis recognizes the correlations among reconstruction parameters to further enhance the background rejection performance. By learning from the machine, this analysis reveals the importance of new background categories to reciprocally benefit the standard Majorana analysis. This model is highly compatible with next-generation germanium detector experiments like LEGEND since it can be simultaneously trained on a large number of detectors.  ( 3 min )
    DEVIANT: Depth EquiVarIAnt NeTwork for Monocular 3D Object Detection. (arXiv:2207.10758v1 [cs.CV])
    Modern neural networks use building blocks such as convolutions that are equivariant to arbitrary 2D translations. However, these vanilla blocks are not equivariant to arbitrary 3D translations in the projective manifold. Even then, all monocular 3D detectors use vanilla blocks to obtain the 3D coordinates, a task for which the vanilla blocks are not designed. This paper takes the first step towards convolutions equivariant to arbitrary 3D translations in the projective manifold. Since the depth is the hardest to estimate for monocular detection, this paper proposes Depth EquiVarIAnt NeTwork (DEVIANT) built with existing scale equivariant steerable blocks. As a result, DEVIANT is equivariant to the depth translations in the projective manifold whereas vanilla networks are not. The additional depth equivariance forces DEVIANT to learn consistent depth estimates, and therefore, DEVIANT achieves state-of-the-art monocular 3D detection results on KITTI and Waymo datasets in the image-only category and performs competitively to methods using extra information. Moreover, DEVIANT works better than vanilla networks in cross-dataset evaluation. Code and models at https://github.com/abhi1kumar/DEVIANT  ( 2 min )
    GreenDB -- A Dataset and Benchmark for Extraction of Sustainability Information of Consumer Goods. (arXiv:2207.10733v1 [cs.LG])
    The production, shipping, usage, and disposal of consumer goods have a substantial impact on greenhouse gas emissions and the depletion of resources. Machine Learning (ML) can help to foster sustainable consumption patterns by accounting for sustainability aspects in product search or recommendations of modern retail platforms. However, the lack of large high quality publicly available product data with trustworthy sustainability information impedes the development of ML technology that can help to reach our sustainability goals. Here we present GreenDB, a database that collects products from European online shops on a weekly basis. As proxy for the products' sustainability, it relies on sustainability labels, which are evaluated by experts. The GreenDB schema extends the well-known schema.org Product definition and can be readily integrated into existing product catalogs. We present initial results demonstrating that ML models trained with our data can reliably (F1 score 96%) predict the sustainability label of products. These contributions can help to complement existing e-commerce experiences and ultimately encourage users to more sustainable consumption patterns.  ( 2 min )
    Deep Sufficient Representation Learning via Mutual Information. (arXiv:2207.10772v1 [stat.ML])
    We propose a mutual information-based sufficient representation learning (MSRL) approach, which uses the variational formulation of the mutual information and leverages the approximation power of deep neural networks. MSRL learns a sufficient representation with the maximum mutual information with the response and a user-selected distribution. It can easily handle multi-dimensional continuous or categorical response variables. MSRL is shown to be consistent in the sense that the conditional probability density function of the response variable given the learned representation converges to the conditional probability density function of the response variable given the predictor. Non-asymptotic error bounds for MSRL are also established under suitable conditions. To establish the error bounds, we derive a generalized Dudley's inequality for an order-two U-process indexed by deep neural networks, which may be of independent interest. We discuss how to determine the intrinsic dimension of the underlying data distribution. Moreover, we evaluate the performance of MSRL via extensive numerical experiments and real data analysis and demonstrate that MSRL outperforms some existing nonlinear sufficient dimension reduction methods.  ( 2 min )
    Strategising template-guided needle placement for MR-targeted prostate biopsy. (arXiv:2207.10784v1 [cs.LG])
    Clinically significant prostate cancer has a better chance of being sampled during ultrasound-guided biopsy procedures, if suspected lesions found in pre-operative magnetic resonance (MR) images are used as targets. However, the diagnostic accuracy of the biopsy procedure is limited by the operator-dependent skills and experience in sampling the targets, a sequential decision making process that involves navigating an ultrasound probe and placing a series of sampling needles for potentially multiple targets. This work aims to learn a reinforcement learning (RL) policy that optimises the actions of continuous positioning of 2D ultrasound views and biopsy needles with respect to a guiding template, such that the MR targets can be sampled efficiently and sufficiently. We first formulate the task as a Markov decision process (MDP) and construct an environment that allows the targeting actions to be performed virtually for individual patients, based on their anatomy and lesions derived from MR images. A patient-specific policy can thus be optimised, before each biopsy procedure, by rewarding positive sampling in the MDP environment. Experiment results from fifty-four prostate cancer patients show that the proposed RL-learned policies obtained a mean hit rate of 93% and an average cancer core length of 11 mm, which compared favourably to two alternative baseline strategies designed by humans, without hand-engineered rewards that directly maximise these clinically relevant metrics. Perhaps more interestingly, it is found that the RL agents learned strategies that were adaptive to the lesion size, where spread of the needles was prioritised for smaller lesions. Such a strategy has not been previously reported or commonly adopted in clinical practice, but led to an overall superior targeting performance when compared with intuitively designed strategies.  ( 3 min )
    Efficient model compression with Random Operation Access Specific Tile (ROAST) hashing. (arXiv:2207.10702v1 [cs.LG])
    Advancements in deep learning are often associated with increasing model sizes. The model size dramatically affects the deployment cost and latency of deep models. For instance, models like BERT cannot be deployed on edge devices and mobiles due to their sheer size. As a result, most advances in Deep Learning are yet to reach the edge. Model compression has received much-deserved attention in the literature across natural language processing, vision, and recommendation domains. This paper proposes a model-agnostic, cache-friendly model compression approach: Random Operation Access Specific Tile (ROAST) hashing. ROAST collapses the parameters by clubbing them through a lightweight mapping. Notably, while clubbing these parameters, ROAST utilizes cache hierarchies by aligning the memory access pattern with the parameter access pattern. ROAST is up to $\sim 25 \times$ faster to train and $\sim 50 \times$ faster to infer than the popular parameter sharing method HashedNet. Additionally, ROAST introduces global weight sharing, which is empirically and theoretically superior to local weight sharing in HashedNet, and can be of independent interest in itself. With ROAST, we present the first compressed BERT, which is $100\times - 1000\times$ smaller but does not result in quality degradation. These compression levels on universal architectures like transformers are promising for the future of SOTA model deployment on resource-constrained devices like mobile and edge devices.  ( 3 min )
    Correcting Model Bias with Sparse Implicit Processes. (arXiv:2207.10673v1 [stat.ML])
    Model selection in machine learning (ML) is a crucial part of the Bayesian learning procedure. Model choice may impose strong biases on the resulting predictions, which can hinder the performance of methods such as Bayesian neural networks and neural samplers. On the other hand, newly proposed approaches for Bayesian ML exploit features of approximate inference in function space with implicit stochastic processes (a generalization of Gaussian processes). The approach of Sparse Implicit Processes (SIP) is particularly successful in this regard, since it is fully trainable and achieves flexible predictions. Here, we expand on the original experiments to show that SIP is capable of correcting model bias when the data generating mechanism differs strongly from the one implied by the model. We use synthetic datasets to show that SIP is capable of providing predictive distributions that reflect the data better than the exact predictions of the initial, but wrongly assumed model.  ( 2 min )
    A Transferable Recommender Approach for Selecting the Best Density Functional Approximations in Chemical Discovery. (arXiv:2207.10747v1 [physics.chem-ph])
    Approximate density functional theory (DFT) has become indispensable owing to its cost-accuracy trade-off in comparison to more computationally demanding but accurate correlated wavefunction theory. To date, however, no single density functional approximation (DFA) with universal accuracy has been identified, leading to uncertainty in the quality of data generated from DFT. With electron density fitting and transfer learning, we build a DFA recommender that selects the DFA with the lowest expected error with respect to gold standard but cost-prohibitive coupled cluster theory in a system-specific manner. We demonstrate this recommender approach on vertical spin-splitting energy evaluation for challenging transition metal complexes. Our recommender predicts top-performing DFAs and yields excellent accuracy (ca. 2 kcal/mol) for chemical discovery, outperforming both individual transfer learning models and the single best functional in a set of 48 DFAs. We demonstrate the transferability of the DFA recommender to experimentally synthesized compounds with distinct chemistry.  ( 2 min )
    Improved Generalization Guarantees in Restricted Data Models. (arXiv:2207.10668v1 [cs.CR])
    Differential privacy is known to protect against threats to validity incurred due to adaptive, or exploratory, data analysis -- even when the analyst adversarially searches for a statistical estimate that diverges from the true value of the quantity of interest on the underlying population. The cost of this protection is the accuracy loss incurred by differential privacy. In this work, inspired by standard models in the genomics literature, we consider data models in which individuals are represented by a sequence of attributes with the property that distant attributes are only weakly correlated. We show that, under this assumption, it is possible to "re-use" privacy budget on different portions of the data, significantly improving accuracy without increasing the risk of overfitting.  ( 2 min )
    Delayed Feedback in Generalised Linear Bandits Revisited. (arXiv:2207.10786v1 [cs.LG])
    The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, in many real world settings, the requirement that the reward is observed immediately is not applicable. In this setting, standard algorithms are no longer theoretically understood. We study the phenomenon of delayed rewards in a theoretical manner by introducing a delay between selecting an action and receiving the reward. Subsequently, we show that an algorithm based on the optimistic principle improves on existing approaches for this setting by eliminating the need for prior knowledge of the delay distribution and relaxing assumptions on the decision set and the delays. This also leads to improving the regret guarantees from $ \widetilde O(\sqrt{dT}\sqrt{d + \mathbb{E}[\tau]})$ to $ \widetilde O(d\sqrt{T} + d^{3/2}\mathbb{E}[\tau])$, where $\mathbb{E}[\tau]$ denotes the expected delay, $d$ is the dimension and $T$ the time horizon and we have suppressed logarithmic terms. We verify our theoretical results through experiments on simulated data.  ( 2 min )
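    To see where the two regret bounds quoted above differ, it can help to expand the earlier one. The short derivation below uses only the expressions in the abstract (logarithmic terms suppressed); the crossover condition at the end is a rough comparison of the delay-dependent terms, not a claim made by the authors.

```latex
% Earlier bound, expanded with \sqrt{a+b} \le \sqrt{a} + \sqrt{b}:
\sqrt{dT}\,\sqrt{d + \mathbb{E}[\tau]}
  = \sqrt{d^{2}T + dT\,\mathbb{E}[\tau]}
  \le d\sqrt{T} + \sqrt{dT\,\mathbb{E}[\tau]}

% New bound: the delay enters additively and does not grow with T:
d\sqrt{T} + d^{3/2}\,\mathbb{E}[\tau]

% Comparing the delay-dependent terms only:
d^{3/2}\,\mathbb{E}[\tau] \le \sqrt{dT\,\mathbb{E}[\tau]}
  \iff d^{2}\,\mathbb{E}[\tau] \le T
```

    Both bounds share the delay-free term $d\sqrt{T}$; the delay penalty changes from one that scales with $\sqrt{T}$ to one that is independent of the horizon, which is the smaller of the two once $T \ge d^{2}\mathbb{E}[\tau]$.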
    Data-Driven Stochastic AC-OPF using Gaussian Processes. (arXiv:2207.10781v1 [stat.ML])
    In recent years, electricity generation has been responsible for more than a quarter of the greenhouse gas emissions in the US. Integrating a significant amount of renewables into a power grid is probably the most accessible way to reduce carbon emissions from power grids and slow down climate change. Unfortunately, the most accessible renewable power sources, such as wind and solar, are highly fluctuating and thus bring a lot of uncertainty to power grid operations and challenge existing optimization and control policies. The chance-constrained alternating current (AC) optimal power flow (OPF) framework finds the minimum cost generation dispatch maintaining the power grid operations within security limits with a prescribed probability. Unfortunately, the AC-OPF problem's chance-constrained extension is non-convex, computationally challenging, and requires knowledge of system parameters and additional assumptions on the behavior of renewable distribution. Known linear and convex approximations to the above problems, though tractable, are too conservative for operational practice and do not consider uncertainty in system parameters. This paper presents an alternative data-driven approach based on Gaussian process (GP) regression to close this gap. The GP approach learns a simple yet non-convex data-driven approximation to the AC power flow equations that can incorporate uncertainty inputs. The latter is then used to determine the solution of CC-OPF efficiently, by accounting for both input and parameter uncertainty. The practical efficiency of the proposed approach using different approximations for GP-uncertainty propagation is illustrated over numerous IEEE test cases.  ( 3 min )
    A machine learning based approach to gravitational lens identification with the International LOFAR Telescope. (arXiv:2207.10698v1 [astro-ph.GA])
    We present a novel machine learning based approach for detecting galaxy-scale gravitational lenses from interferometric data, specifically those taken with the International LOFAR Telescope (ILT), which is observing the northern radio sky at a frequency of 150 MHz, an angular resolution of 350 mas and a sensitivity of 90 uJy beam-1 (1 sigma). We develop and test several Convolutional Neural Networks to determine the probability and uncertainty of a given sample being classified as a lensed or non-lensed event. By training and testing on a simulated interferometric imaging data set that includes realistic lensed and non-lensed radio sources, we find that it is possible to recover 95.3 per cent of the lensed samples (true positive rate), with a contamination of just 0.008 per cent from non-lensed samples (false positive rate). Taking the expected lensing probability into account results in a predicted sample purity for lensed events of 92.2 per cent. We find that the network structure is most robust when the maximum image separation between the lensed images is greater than 3 times the synthesized beam size, and the lensed images have a total flux density that is equivalent to at least a 20 sigma (point-source) detection. For the ILT, this corresponds to a lens sample with Einstein radii greater than 0.5 arcsec and a radio source population with 150 MHz flux densities more than 2 mJy. By applying these criteria and our lens detection algorithm we expect to discover the vast majority of galaxy-scale gravitational lens systems contained within the LOFAR Two Metre Sky Survey.  ( 3 min )
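    The step from the true and false positive rates to a sample purity follows from Bayes' rule once a prior lensing probability is fixed. The abstract does not state the prior that was used, so the value of 1 in 1000 below is purely an assumption for illustration; with it, the quoted rates give a purity close to the reported figure.

```python
def sample_purity(tpr, fpr, prior):
    """Fraction of positively classified sources that are true lenses.

    tpr:   true positive rate (fraction of lenses recovered)
    fpr:   false positive rate (fraction of non-lenses misclassified)
    prior: assumed probability that a random source is lensed (hypothetical)
    """
    true_pos = tpr * prior
    false_pos = fpr * (1.0 - prior)
    return true_pos / (true_pos + false_pos)

# Rates from the abstract: 95.3 per cent and 0.008 per cent.
# The prior of 1e-3 is an illustrative assumption, not a value from the paper.
purity = sample_purity(tpr=0.953, fpr=0.00008, prior=1e-3)
```

    Because lenses are rare, even a very small false positive rate contributes a noticeable share of the selected sample, which is why the purity sits below the true positive rate.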
    The trade-offs of model size in large recommendation models : A 10000 $\times$ compressed criteo-tb DLRM model (100 GB parameters to mere 10MB). (arXiv:2207.10731v1 [cs.LG])
    Embedding tables dominate industrial-scale recommendation model sizes, using up to terabytes of memory. A popular and the largest publicly available machine learning MLPerf benchmark on recommendation data is a Deep Learning Recommendation Model (DLRM) trained on a terabyte of click-through data. It contains 100GB of embedding memory (25+Billion parameters). DLRMs, due to their sheer size and the associated volume of data, face difficulty in training, deploying for inference, and memory bottlenecks due to large embedding tables. This paper analyzes and extensively evaluates a generic parameter sharing setup (PSS) for compressing DLRM models. We show theoretical upper bounds on the learnable memory requirements for achieving $(1 \pm \epsilon)$ approximations to the embedding table. Our bounds indicate exponentially fewer parameters suffice for good accuracy. To this end, we demonstrate a PSS DLRM reaching 10000$\times$ compression on criteo-tb without losing quality. Such a compression, however, comes with a caveat. It requires 4.5 $\times$ more iterations to reach the same saturation quality. The paper argues that this tradeoff needs more investigations as it might be significantly favorable. Leveraging the small size of the compressed model, we show a 4.3$\times$ improvement in training latency leading to similar overall training times. Thus, in the tradeoff between system advantage of a small DLRM model vs. slower convergence, we show that scales are tipped towards having a smaller DLRM model, leading to faster inference, easier deployment, and similar training times.  ( 3 min )
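    The trade-off quoted above reduces to two small calculations. The sketch below assumes decimal units (1 GB = 1000 MB) and reads the 4.3x figure as a per-iteration speed-up, which is an interpretation rather than something the abstract states; the function names are hypothetical.

```python
def compressed_size_mb(original_gb, compression_factor):
    # Decimal units assumed: 1 GB = 1000 MB.
    return original_gb * 1000.0 / compression_factor

def relative_training_time(iteration_factor, latency_speedup):
    # Total time scales with (number of iterations) / (per-iteration speed-up).
    return iteration_factor / latency_speedup

size_mb = compressed_size_mb(100, 10_000)   # 100 GB of embeddings at 10000x
ratio = relative_training_time(4.5, 4.3)    # 4.5x iterations, 4.3x faster each
```

    Under these assumptions the compressed embeddings occupy about 10 MB and total training time grows by roughly 5 percent, consistent with the abstract's "similar overall training times".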
    Understanding High Dimensional Spaces through Visual Means Employing Multidimensional Projections. (arXiv:2207.10800v1 [cs.HC])
    Data visualisation helps in understanding data represented by multiple variables, also called features, stored in a large matrix where individuals are stored in rows and variable values in columns. These data structures are frequently called multidimensional spaces. In this paper, we illustrate ways of employing the visual results of multidimensional projection algorithms to understand and fine-tune the parameters of their mathematical framework. Some of the mathematical concepts common to these approaches are Laplacian matrices, Euclidean distance, cosine distance, and statistical methods such as Kullback-Leibler divergence, employed to fit probability distributions and reduce dimensions. Two of the relevant algorithms in the data visualisation field are t-distributed stochastic neighbourhood embedding (t-SNE) and Least-Square Projection (LSP). These algorithms can be used to understand several ranges of mathematical functions including their impact on datasets. In this article, mathematical parameters of underlying techniques such as Principal Component Analysis (PCA) behind t-SNE and mesh reconstruction methods behind LSP are adjusted to reflect the properties afforded by the mathematical formulation. The results, supported by illustrative methods of the processes of LSP and t-SNE, are meant to inspire students in understanding the mathematics behind such methods, in order to apply them in effective data analysis tasks in multiple applications.  ( 2 min )
    Synthetic Dataset Generation for Adversarial Machine Learning Research. (arXiv:2207.10719v1 [cs.CV])
    Existing adversarial example research focuses on digitally inserted perturbations on top of existing natural image datasets. This construction of adversarial examples is not realistic because it may be difficult, or even impossible, for an attacker to deploy such an attack in the real-world due to sensing and environmental effects. To better understand adversarial examples against cyber-physical systems, we propose approximating the real-world through simulation. In this paper we describe our synthetic dataset generation tool that enables scalable collection of such a synthetic dataset with realistic adversarial examples. We use the CARLA simulator to collect such a dataset and demonstrate simulated attacks that undergo the same environmental transforms and processing as real-world images. Our tools have been used to collect datasets to help evaluate the efficacy of adversarial examples, and can be found at https://github.com/carla-simulator/carla/pull/4992.  ( 2 min )
    ME-GAN: Learning Panoptic Electrocardio Representations for Multi-view ECG Synthesis Conditioned on Heart Diseases. (arXiv:2207.10670v1 [cs.LG])
    Electrocardiogram (ECG) is a widely used non-invasive diagnostic tool for heart diseases. Many studies have devised ECG analysis models (e.g., classifiers) to assist diagnosis. As an upstream task, researchers have built generative models to synthesize ECG data, which are beneficial to providing training samples, privacy protection, and annotation reduction. However, previous generative methods for ECG often neither synthesized multi-view data, nor dealt with heart disease conditions. In this paper, we propose a novel disease-aware generative adversarial network for multi-view ECG synthesis called ME-GAN, which attains panoptic electrocardio representations conditioned on heart diseases and projects the representations onto multiple standard views to yield ECG signals. Since ECG manifestations of heart diseases are often localized in specific waveforms, we propose a new "mixup normalization" to inject disease information precisely into suitable locations. In addition, we propose a view discriminator to revert disordered ECG views into a pre-determined order, supervising the generator to obtain ECG representing correct view characteristics. Besides, a new metric, rFID, is presented to assess the quality of the synthesized ECG signals. Comprehensive experiments verify that our ME-GAN performs well on multi-view ECG signal synthesis with trusty morbid manifestations.  ( 3 min )
    Context-aware controller inference for stabilizing dynamical systems from scarce data. (arXiv:2207.11049v1 [math.OC])
    This work introduces a data-driven control approach for stabilizing high-dimensional dynamical systems from scarce data. The proposed context-aware controller inference approach is based on the observation that controllers need to act locally only on the unstable dynamics to stabilize systems. This means it is sufficient to learn the unstable dynamics alone, which are typically confined to much lower dimensional spaces than the high-dimensional state spaces of all system dynamics and thus few data samples are sufficient to identify them. Numerical experiments demonstrate that context-aware controller inference learns stabilizing controllers from orders of magnitude fewer data samples than traditional data-driven control techniques and variants of reinforcement learning. The experiments further show that the low data requirements of context-aware controller inference are especially beneficial in data-scarce engineering problems with complex physics, for which learning complete system dynamics is often intractable in terms of data and training costs.  ( 2 min )
    Flow Moods: Recommending Music by Moods on Deezer. (arXiv:2207.11229v1 [cs.IR])
    The music streaming service Deezer extensively relies on its Flow algorithm, which generates personalized radio-style playlists of songs, to help users discover musical content. Nonetheless, despite promising results over the past years, Flow used to ignore the moods of users when providing recommendations. In this paper, we present Flow Moods, an improved version of Flow that addresses this limitation. Flow Moods leverages collaborative filtering, audio content analysis, and mood annotations from professional music curators to generate personalized mood-specific playlists at scale. We detail the motivations, the development, and the deployment of this system on Deezer. Since its release in 2021, Flow Moods has been recommending music by moods to millions of users every day.  ( 2 min )
  • Open

    Post-training Quantization for Neural Networks with Provable Guarantees. (arXiv:2201.11113v2 [cs.LG] UPDATED)
    While neural networks have been remarkably successful in a wide array of applications, implementing them in resource-constrained hardware remains an area of intense research. By replacing the weights of a neural network with quantized (e.g., 4-bit, or binary) counterparts, massive savings in computation cost, memory, and power consumption are attained. To that end, we generalize a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism. Among other things, we propose modifications to promote sparsity of the weights, and rigorously analyze the associated error. Additionally, our error analysis expands the results of previous work on GPFQ to handle general quantization alphabets, showing that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights -- i.e., level of over-parametrization. Our result holds across a range of input distributions and for both fully-connected and convolutional architectures thereby also extending previous results. To empirically evaluate the method, we quantize several common architectures with few bits per weight, and test them on ImageNet, showing only minor loss of accuracy compared to unquantized models. We also demonstrate that standard modifications, such as bias correction and mixed precision quantization, further improve accuracy.
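The greedy path-following idea can be sketched for a single neuron. This is a minimal illustration, not the paper's implementation: it assumes the update rule usually given for GPFQ (quantize one weight at a time so that the running output error stays small), and the names `greedy_quantize`, `nearest`, `columns`, and `alphabet` are hypothetical.

```python
def nearest(alphabet, value):
    """Return the alphabet element closest to value."""
    return min(alphabet, key=lambda a: abs(a - value))


def greedy_quantize(weights, columns, alphabet):
    """Quantize one neuron's weights greedily, one weight at a time.

    columns[t] holds the activations that multiply weight t across the
    m calibration samples (an assumed data layout).
    Returns the quantized weights and the final output error.
    """
    m = len(columns[0])
    residual = [0.0] * m  # analog output minus quantized output so far
    quantized = []
    for w, col in zip(weights, columns):
        norm_sq = sum(c * c for c in col)
        # Error we would carry if this weight were quantized to zero.
        target = [r + w * c for r, c in zip(residual, col)]
        if norm_sq == 0.0:
            q = nearest(alphabet, w)
        else:
            # Best scalar multiple of col explaining target, then rounded.
            proj = sum(t * c for t, c in zip(target, col)) / norm_sq
            q = nearest(alphabet, proj)
        quantized.append(q)
        residual = [t - q * c for t, c in zip(target, col)]
    return quantized, residual


q, err = greedy_quantize([0.3, -0.6, 0.9],
                         [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                         [-1.0, 0.0, 1.0])
```

The returned residual is the difference between the analog and the quantized neuron outputs on the calibration data, which is the quantity an error analysis of this kind would bound.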
    Technical Reports Compilation: Detecting the Fire Drill Anti-pattern Using Source Code and Issue-Tracking Data. (arXiv:2104.15090v7 [cs.SE] UPDATED)
    Detecting the presence of project management anti-patterns (AP) currently requires experts on the matter and is an expensive endeavor. Worse, experts may introduce their individual subjectivity or bias. Using the Fire Drill AP, we first introduce a novel way to translate descriptions into detectable AP that are comprised of arbitrary metrics and events such as logged time or maintenance activities, which are mined from the underlying source code or issue-tracking data, thus making the description objective as it becomes data-based. Secondly, we demonstrate a novel method to quantify and score the deviations of real-world projects to data-based AP descriptions. Using fifteen real-world projects that exhibit a Fire Drill to some degree, we show how to further enhance the translated AP. The ground truth in these projects was extracted from two individual experts and consensus was found between them. Our evaluation spans four kinds of patterns, where the first is purely derived from description, the second type is enhanced by data, and the third kind is derived from data only. The fourth type then is a derivative meta-process pattern. We introduce a novel method called automatic calibration, that optimizes a pattern such that only necessary and important scores remain that suffice to confidently detect the degree to which the AP is present. Without automatic calibration, the proposed patterns show only weak potential for detecting the presence. Enriching the AP with data from real-world projects significantly improves the potential. We conclude that the presence of similar patterns is most certainly detectable. Furthermore, any pattern that can be characteristically modeled using the proposed approach is potentially well detectable.
    Automatic Termination for Hyperparameter Optimization. (arXiv:2104.08166v4 [cs.LG] UPDATED)
    Bayesian optimization (BO) is a widely popular approach for the hyperparameter optimization (HPO) in machine learning. At its core, BO iteratively evaluates promising configurations until a user-defined budget, such as wall-clock time or number of iterations, is exhausted. While the final performance after tuning heavily depends on the provided budget, it is hard to pre-specify an optimal value in advance. In this work, we propose an effective and intuitive termination criterion for BO that automatically stops the procedure if it is sufficiently close to the global optimum. Our key insight is that the discrepancy between the true objective (predictive performance on test data) and the computable target (validation performance) suggests stopping once the suboptimality in optimizing the target is dominated by the statistical estimation error. Across an extensive range of real-world HPO problems and baselines, we show that our termination criterion achieves a better trade-off between the test performance and optimization time. Additionally, we find that overfitting may occur in the context of HPO, which is arguably an overlooked problem in the literature, and show how our termination criterion helps to mitigate this phenomenon on both small and large datasets.
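The stopping rule described above can be sketched as a comparison of two numbers. This is a hedged illustration: it assumes that the optimizer's surrogate model supplies lower and upper confidence bounds on the validation loss, and that an estimate of the statistical error (for example the standard error of a cross-validation score) is available. The function names are hypothetical, not the paper's.

```python
def regret_upper_bound(ucb_evaluated, lcb_candidates):
    """Upper bound on the suboptimality of the best evaluated configuration.

    For minimization, the incumbent is no worse than the smallest upper
    confidence bound among evaluated points, and the optimum is no better
    than the smallest lower confidence bound over all candidates.
    """
    return min(ucb_evaluated) - min(lcb_candidates)


def should_stop(ucb_evaluated, lcb_candidates, statistical_error):
    """Stop once the remaining optimization gap is below the estimation error."""
    return regret_upper_bound(ucb_evaluated, lcb_candidates) <= statistical_error


early = should_stop([0.5, 0.4], [0.35, 0.1], statistical_error=0.05)
late = should_stop([0.5, 0.4], [0.38, 0.37], statistical_error=0.05)
```

In the first call the possible gain from further tuning (about 0.3) still exceeds the noise level, so the search continues; in the second it does not, so the search stops.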
    Function-space Inference with Sparse Implicit Processes. (arXiv:2110.07618v3 [stat.ML] UPDATED)
    Implicit Processes (IPs) represent a flexible framework that can be used to describe a wide variety of models, from Bayesian neural networks, neural samplers and data generators to many others. IPs also allow for approximate inference in function-space. This change of formulation solves intrinsic degenerate problems of parameter-space approximate inference concerning the high number of parameters and their strong dependencies in large models. For this, previous works in the literature have attempted to employ IPs both to set up the prior and to approximate the resulting posterior. However, this has proven to be a challenging task. Existing methods that can tune the prior IP result in a Gaussian predictive distribution, which fails to capture important data patterns. By contrast, methods producing flexible predictive distributions by using another IP to approximate the posterior process cannot tune the prior IP to the observed data. We propose here the first method that can accomplish both goals. For this, we rely on an inducing-point representation of the prior IP, as often done in the context of sparse Gaussian processes. The result is a scalable method for approximate inference with IPs that can tune the prior IP parameters to the data, and that provides accurate non-Gaussian predictive distributions.  ( 3 min )
    Relaxed Gaussian process interpolation: a goal-oriented approach to Bayesian optimization. (arXiv:2206.03034v2 [stat.CO] UPDATED)
    This work presents a new procedure for obtaining predictive distributions in the context of Gaussian process (GP) modeling, with a relaxation of the interpolation constraints outside some ranges of interest: the mean of the predictive distributions no longer necessarily interpolates the observed values when they are outside ranges of interest, but is simply constrained to remain outside. This method, called relaxed Gaussian process (reGP) interpolation, provides better predictive distributions in ranges of interest, especially in cases where a stationarity assumption for the GP model is not appropriate. It can be viewed as a goal-oriented method and becomes particularly interesting in Bayesian optimization, for example, for the minimization of an objective function, where good predictive distributions for low function values are important. When the expected improvement criterion and reGP are used for sequentially choosing evaluation points, the convergence of the resulting optimization algorithm is theoretically guaranteed (provided that the function to be optimized lies in the reproducing kernel Hilbert space attached to the known covariance of the underlying Gaussian process). Experiments indicate that using reGP instead of stationary GP models in Bayesian optimization is beneficial.  ( 3 min )
    Fast Bayesian Coresets via Subsampling and Quasi-Newton Refinement. (arXiv:2203.09675v2 [stat.ML] UPDATED)
    Bayesian coresets approximate a posterior distribution by building a small weighted subset of the data points. Any inference procedure that is too computationally expensive to be run on the full posterior can instead be run inexpensively on the coreset, with results that approximate those on the full data. However, current approaches are limited by either a significant run-time or the need for the user to specify a low-cost approximation to the full posterior. We propose a Bayesian coreset construction algorithm that first selects a uniformly random subset of data, and then optimizes the weights using a novel quasi-Newton method. Our algorithm is a simple to implement, black-box method, that does not require the user to specify a low-cost posterior approximation. It is the first to come with a general high-probability bound on the KL divergence of the output coreset posterior. Experiments demonstrate that our method provides significant improvements in coreset quality against alternatives with comparable construction times, with far less storage cost and user input required.  ( 2 min )
    Tight bounds on the hardness of learning simple nonparametric mixtures. (arXiv:2203.15150v2 [cs.LG] UPDATED)
    We study the problem of learning nonparametric distributions in a finite mixture, and establish tight bounds on the sample complexity for learning the component distributions in such models. Namely, we are given i.i.d. samples from a pdf $f$ where $$ f=\sum_{i=1}^k w_i f_i, \quad\sum_{i=1}^k w_i=1, \quad w_i>0 $$ and we are interested in learning each component $f_i$. Without any assumptions on $f_i$, this problem is ill-posed. In order to identify the components $f_i$, we assume that each $f_i$ can be written as a convolution of a Gaussian and a compactly supported density $\nu_i$ with $\text{supp}(\nu_i)\cap \text{supp}(\nu_j)=\emptyset$. Our main result shows that $(\frac{1}{\varepsilon})^{\Omega(\log\log \frac{1}{\varepsilon})}$ samples are required for estimating each $f_i$. Unlike parametric mixtures, the difficulty does not arise from the order $k$ or small weights $w_i$, and unlike nonparametric density estimation it does not arise from the curse of dimensionality, irregularity, or inhomogeneity. The proof relies on a fast rate for approximation with Gaussians, which may be of independent interest. To show this is tight, we also propose an algorithm that uses $(\frac{1}{\varepsilon})^{O(\log\log \frac{1}{\varepsilon})}$ samples to estimate each $f_i$. Unlike existing approaches to learning latent variable models based on moment-matching and tensor methods, our proof instead involves a delicate analysis of an ill-conditioned linear system via orthogonal functions. Combining these bounds, we conclude that the optimal sample complexity of this problem properly lies in between polynomial and exponential, which is not common in learning theory.  ( 3 min )
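To see why a rate of this form sits strictly between polynomial and exponential, it may help to rewrite it as an exponential. The following is a short worked step, assuming "exponential" means exponential in $1/\varepsilon$ and writing $c>0$ for the constant hidden in the $\Omega(\cdot)$ and $O(\cdot)$ notation:

```latex
\[
  \Bigl(\tfrac{1}{\varepsilon}\Bigr)^{c\,\log\log\frac{1}{\varepsilon}}
  = \exp\!\Bigl(c\,\log\tfrac{1}{\varepsilon}\,\log\log\tfrac{1}{\varepsilon}\Bigr).
\]
% Polynomial rates have exponent p log(1/eps) for a fixed p:
\[
  \Bigl(\tfrac{1}{\varepsilon}\Bigr)^{p} = \exp\!\Bigl(p\,\log\tfrac{1}{\varepsilon}\Bigr),
  \qquad
  c\,\log\log\tfrac{1}{\varepsilon} > p \ \text{ once } \varepsilon \text{ is small enough,}
\]
% so the rate eventually exceeds every polynomial. On the other hand,
\[
  \log\tfrac{1}{\varepsilon}\,\log\log\tfrac{1}{\varepsilon} = o\!\Bigl(\tfrac{1}{\varepsilon}\Bigr)
  \quad\text{as } \varepsilon \to 0,
\]
% so the rate grows more slowly than exp(1/eps).
```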
    High dimensional stochastic linear contextual bandit with missing covariates. (arXiv:2207.11165v1 [stat.ML])
    Recent works in bandit problems adopted lasso convergence theory in the sequential decision-making setting. Even with fully observed contexts, there are technical challenges that hinder the application of existing lasso convergence theory: 1) proving the restricted eigenvalue condition under conditionally sub-Gaussian noise and 2) accounting for the dependence between the context variables and the chosen actions. This paper studies the effect of missing covariates on regret for stochastic linear bandit algorithms. Our work provides a high-probability upper bound on the regret incurred by the proposed algorithm in terms of covariate sampling probabilities, showing that the regret degrades due to missingness by at most $\zeta_{min}^2$, where $\zeta_{min}$ is the minimum probability of observing covariates in the context vector. We illustrate our algorithm for the practical application of experimental design for collecting gene expression data by a sequential selection of class discriminating DNA probes.  ( 2 min )
    Doubly-Valid/Doubly-Sharp Sensitivity Analysis for Causal Inference with Unmeasured Confounding. (arXiv:2112.11449v2 [stat.ME] UPDATED)
    We consider the problem of constructing bounds on the average treatment effect (ATE) when unmeasured confounders exist but have bounded influence. Specifically, we assume that omitted confounders could not change the odds of treatment for any unit by more than a fixed factor. We derive the sharp partial identification bounds implied by this assumption by leveraging distributionally robust optimization, and we propose estimators of these bounds with several novel robustness properties. The first is double sharpness: our estimators consistently estimate the sharp ATE bounds when one of two nuisance parameters is misspecified and achieve semiparametric efficiency when all nuisance parameters are suitably consistent. The second is double validity: even when most nuisance parameters are misspecified, our estimators still provide valid but possibly conservative bounds for the ATE and our Wald confidence intervals remain valid even when our estimators are not asymptotically normal. As a result, our estimators provide a highly credible method for sensitivity analysis of causal inferences.  ( 2 min )
    Deep Sufficient Representation Learning via Mutual Information. (arXiv:2207.10772v1 [stat.ML])
    We propose a mutual information-based sufficient representation learning (MSRL) approach, which uses the variational formulation of the mutual information and leverages the approximation power of deep neural networks. MSRL learns a sufficient representation with the maximum mutual information with the response and a user-selected distribution. It can easily handle multi-dimensional continuous or categorical response variables. MSRL is shown to be consistent in the sense that the conditional probability density function of the response variable given the learned representation converges to the conditional probability density function of the response variable given the predictor. Non-asymptotic error bounds for MSRL are also established under suitable conditions. To establish the error bounds, we derive a generalized Dudley's inequality for an order-two U-process indexed by deep neural networks, which may be of independent interest. We discuss how to determine the intrinsic dimension of the underlying data distribution. Moreover, we evaluate the performance of MSRL via extensive numerical experiments and real data analysis and demonstrate that MSRL outperforms some existing nonlinear sufficient dimension reduction methods.  ( 2 min )
    Modeling Randomly Walking Volatility with Chained Gamma Distributions. (arXiv:2207.01151v2 [q-fin.CP] UPDATED)
    Volatility clustering is a common phenomenon in financial time series. Typically, linear models can be used to describe the temporal autocorrelation of the (logarithmic) variance of returns. Considering the difficulty in estimating this model, we construct a Dynamic Bayesian Network, which utilizes the conjugate prior relation of normal-gamma and gamma-gamma, so that its posterior form locally remains unchanged at each node. This makes it possible to find approximate solutions using variational methods quickly. Furthermore, we ensure that the volatility expressed by the model is an independent incremental process after inserting dummy gamma nodes between adjacent time steps. We have found that this model has two advantages: 1) It can be proved that it can express heavier tails than Gaussians, i.e., have positive excess kurtosis, compared to popular linear models. 2) If variational inference (VI) is used for state estimation, it runs much faster than Monte Carlo (MC) methods, since the calculation of the posterior uses only basic arithmetic operations, and its convergence process is deterministic. We tested the model, named Gam-Chain, using recent Crypto, Nasdaq, and Forex records of varying resolutions. The results show that: 1) When both use MC, this model achieves state estimation results comparable to those of the regular lognormal chain. 2) Using only VI, this model obtains accuracy that is slightly worse than with MC, but still acceptable in practice. 3) Using only VI, the running time of Gam-Chain, under the most conservative settings, can be reduced to below 20% of that of the lognormal chain via MC.  ( 3 min )
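The remark that posterior updates need only basic arithmetic follows from conjugacy. The sketch below shows the two textbook conjugate updates the abstract names, in isolation; it is not the Gam-Chain model itself, and the function names and the zero-mean, known-shape assumptions are illustrative choices.

```python
def update_precision(a, b, x):
    """Normal-gamma step: x ~ Normal(0, 1/tau), tau ~ Gamma(shape=a, rate=b).

    The posterior of tau given x is Gamma(a + 1/2, b + x^2 / 2).
    """
    return a + 0.5, b + 0.5 * x * x


def update_rate(a, b, k, x):
    """Gamma-gamma step: x ~ Gamma(shape=k, rate=beta), beta ~ Gamma(a, b).

    With the shape k known, the posterior of beta given x is Gamma(a + k, b + x).
    """
    return a + k, b + x


post_tau = update_precision(2.0, 1.0, 2.0)
post_beta = update_rate(2.0, 1.0, 3.0, 4.0)
```

Each update is a pair of additions and multiplications on the hyperparameters, with no sampling involved.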
    Generalized Identifiability Bounds for Mixture Models with Grouped Samples. (arXiv:2207.11164v1 [math.ST])
    Recent work has shown that finite mixture models with $m$ components are identifiable, while making no assumptions on the mixture components, so long as one has access to groups of samples of size $2m-1$ which are known to come from the same mixture component. In this work we generalize that result and show that, if every subset of $k$ mixture components of a mixture model are linearly independent, then that mixture model is identifiable with only $(2m-1)/(k-1)$ samples per group. We further show that this value cannot be improved. We prove an analogous result for a stronger form of identifiability known as "determinedness" along with a corresponding lower bound. This independence assumption almost surely holds if mixture components are chosen randomly from a $k$-dimensional space. We describe some implications of our results for multinomial mixture models and topic modeling.  ( 2 min )
    Statistical Hypothesis Testing Based on Machine Learning: Large Deviations Analysis. (arXiv:2207.10939v1 [stat.ML])
    We study the performance -- and specifically the rate at which the error probability converges to zero -- of Machine Learning (ML) classification techniques. Leveraging the theory of large deviations, we provide the mathematical conditions for a ML classifier to exhibit error probabilities that vanish exponentially, say $\sim \exp\left(-n\,I + o(n) \right)$, where $n$ is the number of informative observations available for testing (or another relevant parameter, such as the size of the target in an image) and $I$ is the error rate. Such conditions depend on the Fenchel-Legendre transform of the cumulant-generating function of the Data-Driven Decision Function (D3F, i.e., what is thresholded before the final binary decision is made) learned in the training phase. As such, the D3F and, consequently, the related error rate $I$, depend on the given training set, which is assumed of finite size. Interestingly, these conditions can be verified and tested numerically exploiting the available dataset, or a synthetic dataset, generated according to the available information on the underlying statistical model. In other words, the classification error probability convergence to zero and its rate can be computed on a portion of the dataset available for training. Coherently with the large deviations theory, we can also establish the convergence, for $n$ large enough, of the normalized D3F statistic to a Gaussian distribution. This property is exploited to set a desired asymptotic false alarm probability, which empirically turns out to be accurate even for quite realistic values of $n$. Furthermore, approximate error probability curves $\sim \zeta_n \exp\left(-n\,I \right)$ are provided, thanks to the refined asymptotic derivation (often referred to as exact asymptotics), where $\zeta_n$ represents the most representative sub-exponential terms of the error probabilities.  ( 3 min )
    Relaxing the I.I.D. Assumption: Adaptively Minimax Optimal Regret via Root-Entropic Regularization. (arXiv:2007.06552v3 [stat.ML] UPDATED)
    We consider prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set. This semi-adversarial setting includes (at the extremes) the classical i.i.d. setting, when the unknown constraint set is restricted to be a singleton, and the unconstrained adversarial setting, when the constraint set is the set of all distributions. The Hedge algorithm -- long known to be minimax (rate) optimal in the adversarial regime -- was recently shown to be simultaneously minimax optimal for i.i.d. data. In this work, we propose to relax the i.i.d. assumption by seeking adaptivity at all levels of a natural ordering on constraint sets. We provide matching upper and lower bounds on the minimax regret at all levels, show that Hedge with deterministic learning rates is suboptimal outside of the extremes, and prove that one can adaptively obtain minimax regret at all levels. We achieve this optimal adaptivity using the follow-the-regularized-leader (FTRL) framework, with a novel adaptive regularization scheme that implicitly scales as the square root of the entropy of the current predictive distribution, rather than the entropy of the initial predictive distribution. Finally, we provide novel technical tools to study the statistical performance of FTRL along the semi-adversarial spectrum.  ( 3 min )
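For reference, the Hedge algorithm mentioned above can be sketched in a few lines. This is the standard exponential-weights version with a fixed learning rate, not the adaptive FTRL scheme the paper proposes; the function name and the toy loss sequence are illustrative.

```python
import math


def hedge(loss_rounds, eta):
    """Run Hedge with learning rate eta on a sequence of loss vectors.

    Each element of loss_rounds lists the losses (in [0, 1]) of all experts
    for one round. Returns the learner's cumulative expected loss and the
    cumulative loss of the best expert in hindsight.
    """
    n_experts = len(loss_rounds[0])
    cumulative = [0.0] * n_experts
    learner_loss = 0.0
    for losses in loss_rounds:
        # Weights are exponential in negative cumulative loss; subtracting
        # the minimum keeps the exponentials numerically stable.
        best = min(cumulative)
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, min(cumulative)


T, N = 100, 2
rounds = [[0.0, 1.0]] * T  # expert 0 is always right, expert 1 always wrong
eta = math.sqrt(8.0 * math.log(N) / T)
learner, best_expert = hedge(rounds, eta)
regret = learner - best_expert
```

With this choice of `eta`, the classical worst-case guarantee for losses in [0, 1] is a regret of at most `sqrt(T * ln(N) / 2)`.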
    Deriving discriminative classifiers from generative models. (arXiv:2201.00844v2 [stat.ML] UPDATED)
    We deal with Bayesian generative and discriminative classifiers. Given a model distribution $p(x, y)$, with the observation $y$ and the target $x$, one computes generative classifiers by first considering $p(x, y)$ and then using the Bayes rule to calculate $p(x | y)$. A discriminative model is directly given by $p(x | y)$, which is used to compute discriminative classifiers. However, recent works showed that the Bayesian Maximum Posterior classifier defined from the Naive Bayes (NB) or Hidden Markov Chain (HMC), both generative models, can also match the discriminative classifier definition. Thus, there are situations in which dividing classifiers into "generative" and "discriminative" is somewhat misleading. Indeed, such a distinction is rather related to the way of computing classifiers, not to the classifiers themselves. We present a general theoretical result specifying how a generative classifier induced from a generative model can also be computed in a discriminative way from the same model. Examples of NB and HMC are found again as particular cases, and we apply the general result to two original extensions of NB, and two extensions of HMC, one of which is original. Finally, we briefly illustrate the interest of the new discriminative way of computing classifiers in the Natural Language Processing (NLP) framework.  ( 3 min )
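The generative route described above (start from $p(x, y)$, then apply the Bayes rule) can be sketched for Naive Bayes, where $p(x, y) = p(x)\prod_i p(y_i \mid x)$. This only illustrates the generative computation; it does not reproduce the paper's discriminative reformulation, and the probability tables are made up for the example.

```python
def naive_bayes_posterior(prior, likelihood, observations):
    """Compute p(x | y) from p(x) and p(y_i | x) via the Bayes rule.

    prior maps each target value x to p(x); likelihood[x] maps an observed
    symbol to p(y_i | x), shared across positions i (an assumed simplification).
    """
    joint = {}
    for x, p_x in prior.items():
        p = p_x
        for y in observations:
            p *= likelihood[x][y]
        joint[x] = p  # p(x, y_1, ..., y_n)
    evidence = sum(joint.values())  # p(y_1, ..., y_n)
    return {x: p / evidence for x, p in joint.items()}


prior = {"a": 0.5, "b": 0.5}
likelihood = {"a": {0: 0.9, 1: 0.1}, "b": {0: 0.2, 1: 0.8}}
posterior = naive_bayes_posterior(prior, likelihood, [0, 0])
prediction = max(posterior, key=posterior.get)  # maximum posterior classifier
```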
    Delayed Feedback in Generalised Linear Bandits Revisited. (arXiv:2207.10786v1 [cs.LG])
    The stochastic generalised linear bandit is a well-understood model for sequential decision-making problems, with many algorithms achieving near-optimal regret guarantees under immediate feedback. However, in many real world settings, the requirement that the reward is observed immediately is not applicable. In this setting, standard algorithms are no longer theoretically understood. We study the phenomenon of delayed rewards in a theoretical manner by introducing a delay between selecting an action and receiving the reward. Subsequently, we show that an algorithm based on the optimistic principle improves on existing approaches for this setting by eliminating the need for prior knowledge of the delay distribution and relaxing assumptions on the decision set and the delays. This also leads to improving the regret guarantees from $ \widetilde O(\sqrt{dT}\sqrt{d + \mathbb{E}[\tau]})$ to $ \widetilde O(d\sqrt{T} + d^{3/2}\mathbb{E}[\tau])$, where $\mathbb{E}[\tau]$ denotes the expected delay, $d$ is the dimension and $T$ the time horizon and we have suppressed logarithmic terms. We verify our theoretical results through experiments on simulated data.  ( 2 min )
    Fairness-aware Network Revenue Management with Demand Learning. (arXiv:2207.11159v1 [stat.ML])
    In addition to maximizing the total revenue, decision-makers in lots of industries would like to guarantee fair consumption across different resources and avoid saturating certain resources. Motivated by these practical needs, this paper studies the price-based network revenue management problem with both demand learning and fairness concern about the consumption across different resources. We introduce the regularized revenue, i.e., the total revenue with a fairness regularization, as our objective to incorporate fairness into the revenue maximization goal. We propose a primal-dual-type online policy with the Upper-Confidence-Bound (UCB) demand learning method to maximize the regularized revenue. We adopt several innovative techniques to make our algorithm a unified and computationally efficient framework for the continuous price set and a wide class of fairness regularizers. Our algorithm achieves a worst-case regret of $\tilde O(N^{5/2}\sqrt{T})$, where $N$ denotes the number of products and $T$ denotes the number of time periods. Numerical experiments in a few NRM examples demonstrate the effectiveness of our algorithm for balancing revenue and fairness.  ( 2 min )
    Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks. (arXiv:2201.11729v4 [cs.LG] UPDATED)
    In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regularization in hierarchical tensor factorization, a model equivalent to certain deep convolutional neural networks. Through a dynamical systems lens, we overcome challenges associated with hierarchy, and establish implicit regularization towards low hierarchical tensor rank. This translates to an implicit regularization towards locality for the associated convolutional networks. Inspired by our theory, we design explicit regularization discouraging locality, and demonstrate its ability to improve the performance of modern convolutional networks on non-local tasks, in defiance of conventional wisdom by which architectural changes are needed. Our work highlights the potential of enhancing neural networks via theoretical analysis of their implicit regularization.  ( 3 min )
    VTrackIt: A Synthetic Self-Driving Dataset with Infrastructure and Pooled Vehicle Information. (arXiv:2207.11146v1 [cs.CV])
    Artificial intelligence solutions for Autonomous Vehicles (AVs) have been developed using publicly available datasets such as Argoverse, ApolloScape, Level5, and NuScenes. One major limitation of these datasets is the absence of infrastructure and/or pooled vehicle information like lane line type, vehicle speed, traffic signs, and intersections. Such information is necessary, not merely complementary, for eliminating high-risk edge cases. The rapid advancements in Vehicle-to-Infrastructure and Vehicle-to-Vehicle technologies show promise that infrastructure and pooled vehicle information will soon be accessible in near real-time. Taking a leap into the future, we introduce the first comprehensive synthetic dataset with intelligent infrastructure and pooled vehicle information for advancing the next generation of AVs, named VTrackIt. We also introduce the first deep learning model (InfraGAN) for trajectory predictions that considers such information. Our experiments with InfraGAN show that the comprehensive information offered by VTrackIt reduces the number of high-risk edge cases. The VTrackIt dataset is available upon request under the Creative Commons CC BY-NC-SA 4.0 license at this http URL  ( 2 min )
    Twitmo: A Twitter Data Topic Modeling and Visualization Package for R. (arXiv:2207.11236v1 [cs.IR])
    We present Twitmo, a package that provides a broad range of methods to collect, pre-process, analyze and visualize geo-tagged Twitter data. Twitmo enables the user to collect geo-tagged Tweets from Twitter and provides a comprehensive and user-friendly toolbox to generate topic distributions from Latent Dirichlet Allocations (LDA), correlated topic models (CTM) and structural topic models (STM). Functions are included for pre-processing of text, model building and prediction. In addition, one of the innovations of the package is the automatic pooling of Tweets into longer pseudo-documents using hashtags and cosine similarities for better topic coherence. The package additionally comes with functionality to visualize collected data sets and fitted models in static as well as interactive ways and offers built-in support for model visualizations via LDAvis, providing great convenience for researchers in this area. The Twitmo package is an innovative toolbox that can be used to analyze public discourse of various topics, political parties or persons of interest in space and time.  ( 2 min )
    Multiple Robust Learning for Recommendation. (arXiv:2207.10796v1 [cs.IR])
    In recommender systems (RS), a common problem is the presence of various biases in the collected data, which deteriorates the generalization ability of the recommendation models and leads to inaccurate predictions. Doubly robust (DR) learning has been studied in many tasks in RS, with the advantage that unbiased learning can be achieved when either a single imputation or a single propensity model is accurate. In this paper, we propose a multiple robust (MR) estimator that can take advantage of multiple candidate imputation and propensity models to achieve unbiasedness. Specifically, the MR estimator is unbiased when any of the imputation or propensity models, or a linear combination of these models, is accurate. Theoretical analysis shows that the proposed MR is an enhanced version of DR when only having a single imputation and propensity model, and has a smaller bias. Inspired by the generalization error bound of MR, we further propose a novel multiple robust learning approach with stabilization. We conduct extensive experiments on real-world and semi-synthetic datasets, which demonstrate the superiority of the proposed approach over state-of-the-art methods.  ( 2 min )
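The doubly robust property that the MR estimator builds on can be sketched as follows. This shows the standard single-model DR estimator of the average prediction error, not the proposed MR estimator; the variable names and the toy numbers are assumptions for illustration.

```python
def doubly_robust_error(errors, imputed, observed, propensities):
    """Estimate the mean prediction error over all user-item pairs.

    errors[i] is the true error (only meaningful when observed[i] is 1),
    imputed[i] is the imputation model's guess of that error, and
    propensities[i] is the estimated probability that pair i is observed.
    """
    total = 0.0
    for e, e_hat, o, p in zip(errors, imputed, observed, propensities):
        # Imputed error everywhere, corrected by an inverse-propensity
        # weighted residual wherever the true error is observed.
        correction = (e - e_hat) / p if o else 0.0
        total += e_hat + correction
    return total / len(imputed)


errors = [1.0, None, 0.5, None]        # None marks unobserved pairs
imputed = [1.0, 0.2, 0.5, 0.8]         # accurate on the observed pairs
observed = [1, 0, 1, 0]
propensities = [0.5, 0.5, 0.25, 0.25]
estimate = doubly_robust_error(errors, imputed, observed, propensities)
```

In this toy case the imputation matches every observed error, so the correction terms vanish and the estimate does not depend on the propensities at all, which is one half of the double robustness.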
    Optimal Model Averaging of Support Vector Machines in Diverging Model Spaces. (arXiv:2112.12961v3 [stat.ML] UPDATED)
    Support vector machine (SVM) is a powerful classification method that has achieved great success in many fields. Since its performance can be seriously impaired by redundant covariates, model selection techniques are widely used for SVM with high dimensional covariates. As an alternative to model selection, significant progress has been made in the area of model averaging in the past decades. Yet no frequentist model averaging method was considered for SVM. This work aims to fill the gap and to propose a frequentist model averaging procedure for SVM which selects the optimal weight by cross validation. Even when the number of covariates diverges at an exponential rate of the sample size, we show asymptotic optimality of the proposed method in the sense that the ratio of its hinge loss to the lowest possible loss converges to one. We also derive the convergence rate which provides more insights to model averaging. Compared to model selection methods of SVM which require a tedious but critical task of tuning parameter selection, the model averaging method avoids the task and shows promising performances in the empirical studies.  ( 3 min )
    JAWS: Predictive Inference Under Covariate Shift. (arXiv:2207.10716v1 [cs.LG])
    We propose JAWS, a series of wrapper methods for distribution-free uncertainty quantification tasks under covariate shift, centered on our core method JAW, the JAckknife+ Weighted with likelihood-ratio weights. JAWS also includes computationally efficient Approximations of JAW using higher-order influence functions: JAWA. Theoretically, we show that JAW relaxes the jackknife+'s assumption of data exchangeability to achieve the same finite-sample coverage guarantee even under covariate shift. JAWA further approaches the JAW guarantee in the limit of either the sample size or the influence function order under mild assumptions. Moreover, we propose a general approach to repurposing any distribution-free uncertainty quantification method and its guarantees to the task of risk assessment: a task that generates the estimated probability that the true label lies within a user-specified interval. We then propose JAW-R and JAWA-R as the repurposed versions of proposed methods for Risk assessment. Practically, JAWS outperform the state-of-the-art predictive inference baselines in a variety of biased real world data sets for both interval-generation and risk-assessment auditing tasks.  ( 2 min )
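As background for the weighted variant, the plain (unweighted) jackknife+ interval can be sketched as below. This is not JAW: it omits the likelihood-ratio weights and assumes exchangeable data. The `fit` callback and the constant-mean model are hypothetical stand-ins for any regression method.

```python
import math


def jackknife_plus(xs, ys, x_new, fit, alpha):
    """Unweighted jackknife+ prediction interval at x_new.

    fit(xs, ys) must return a function mapping an input to a prediction.
    """
    n = len(ys)
    lows, highs = [], []
    for i in range(n):
        # Refit without point i, then score the held-out point.
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residual = abs(ys[i] - model(xs[i]))
        pred = model(x_new)
        lows.append(pred - residual)
        highs.append(pred + residual)
    lows.sort()
    highs.sort()
    k_low = math.floor(alpha * (n + 1))        # rank of the lower endpoint
    k_high = math.ceil((1 - alpha) * (n + 1))  # rank of the upper endpoint
    lower = lows[k_low - 1] if k_low >= 1 else -math.inf
    upper = highs[k_high - 1] if k_high <= n else math.inf
    return lower, upper


def fit_mean(xs, ys):
    """Toy model that ignores the input and predicts the training mean."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


xs = [float(i) for i in range(9)]
ys = [float(i) for i in range(1, 10)]
interval = jackknife_plus(xs, ys, 4.0, fit_mean, alpha=0.2)
```

Under covariate shift, the ranks used here would be replaced by weighted quantiles, which is the part this sketch leaves out.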
    SPRT-based Efficient Best Arm Identification in Stochastic Bandits. (arXiv:2207.11158v1 [stat.ML])
    This paper investigates the best arm identification (BAI) problem in stochastic multi-armed bandits in the fixed confidence setting. The general class of the exponential family of bandits is considered. The state-of-the-art algorithms for the exponential family of bandits face computational challenges. To mitigate these challenges, a novel framework is proposed, which views the BAI problem as sequential hypothesis testing, and is amenable to tractable analysis for the exponential family of bandits. Based on this framework, a BAI algorithm is designed that leverages the canonical sequential probability ratio tests. This algorithm has three features for both settings: (1) its sample complexity is asymptotically optimal, (2) it is guaranteed to be $\delta-$PAC, and (3) it addresses the computational challenge of the state-of-the-art approaches. Specifically, these approaches, which are focused only on the Gaussian setting, require Thompson sampling from the arm that is deemed the best and a challenger arm. This paper analytically shows that identifying the challenger is computationally expensive and that the proposed algorithm circumvents it. Finally, numerical experiments are provided to support the analysis.  ( 2 min )
    Classifying Crop Types using Gaussian Bayesian Models and Neural Networks on GHISACONUS USGS data from NASA Hyperspectral Satellite Imagery. (arXiv:2207.11228v1 [cs.CV])
    Hyperspectral imaging is a type of digital imaging in which each pixel typically contains hundreds of wavelengths of light, providing spectroscopic information about the materials present in the pixel. In this paper we provide classification methods for determining crop type in the USGS GHISACONUS data, which contains around 7,000 pixel spectra from the five major U.S. agricultural crops (winter wheat, rice, corn, soybeans, and cotton) collected by the NASA Hyperion satellite, and includes the spectrum, geolocation, crop type, and stage of growth for each pixel. We apply standard LDA and QDA as well as Bayesian custom versions that compute the joint probability of crop type and stage, and then the marginal probability for crop type, outperforming the non-Bayesian methods. We also test a single layer neural network with dropout on the data, which performs comparably to LDA and QDA but not as well as the Bayesian methods.  ( 2 min )
    Data-Driven Stochastic AC-OPF using Gaussian Processes. (arXiv:2207.10781v1 [stat.ML])
    In recent years, electricity generation has been responsible for more than a quarter of the greenhouse gas emissions in the US. Integrating a significant amount of renewables into a power grid is probably the most accessible way to reduce carbon emissions from power grids and slow down climate change. Unfortunately, the most accessible renewable power sources, such as wind and solar, are highly fluctuating and thus bring a lot of uncertainty to power grid operations and challenge existing optimization and control policies. The chance-constrained alternating current (AC) optimal power flow (OPF) framework finds the minimum cost generation dispatch maintaining the power grid operations within security limits with a prescribed probability. Unfortunately, the AC-OPF problem's chance-constrained extension is non-convex, computationally challenging, and requires knowledge of system parameters and additional assumptions on the behavior of renewable distribution. Known linear and convex approximations to the above problems, though tractable, are too conservative for operational practice and do not consider uncertainty in system parameters. This paper presents an alternative data-driven approach based on Gaussian process (GP) regression to close this gap. The GP approach learns a simple yet non-convex data-driven approximation to the AC power flow equations that can incorporate uncertainty inputs. The latter is then used to determine the solution of CC-OPF efficiently, by accounting for both input and parameter uncertainty. The practical efficiency of the proposed approach using different approximations for GP-uncertainty propagation is illustrated over numerous IEEE test cases.  ( 3 min )
    Statistical and Computational Trade-offs in Variational Inference: A Case Study in Inferential Model Selection. (arXiv:2207.11208v1 [stat.ML])
    Variational inference has recently emerged as a popular alternative to the classical Markov chain Monte Carlo (MCMC) in large-scale Bayesian inference. The core idea of variational inference is to trade statistical accuracy for computational efficiency. It aims to approximate the posterior, reducing computation costs but potentially compromising its statistical accuracy. In this work, we study this statistical and computational trade-off in variational inference via a case study in inferential model selection. Focusing on Gaussian inferential models (a.k.a. variational approximating families) with diagonal plus low-rank precision matrices, we initiate a theoretical study of the trade-offs in two aspects, Bayesian posterior inference error and frequentist uncertainty quantification error. From the Bayesian posterior inference perspective, we characterize the error of the variational posterior relative to the exact posterior. We prove that, given a fixed computation budget, a lower-rank inferential model produces variational posteriors with a higher statistical approximation error, but a lower computational error; it reduces variances in stochastic optimization and, in turn, accelerates convergence. From the frequentist uncertainty quantification perspective, we consider the precision matrix of the variational posterior as an uncertainty estimate. We find that, relative to the true asymptotic precision, the variational approximation suffers from an additional statistical error originating from the sampling uncertainty of the data. Moreover, this statistical error becomes the dominant factor as the computation budget increases. As a consequence, for small datasets, the inferential model need not be full-rank to achieve optimal estimation error. We finally demonstrate these statistical and computational trade-offs across empirical studies, corroborating the theoretical findings.  ( 3 min )
    ASR Error Detection via Audio-Transcript entailment. (arXiv:2207.10849v1 [cs.CL])
    Despite improved performances of the latest Automatic Speech Recognition (ASR) systems, transcription errors are still unavoidable. These errors can have a considerable impact in critical domains such as healthcare, when used to help with clinical documentation. Therefore, detecting ASR errors is a critical first step in preventing further error propagation to downstream applications. To this end, we propose a novel end-to-end approach for ASR error detection using audio-transcript entailment. To the best of our knowledge, we are the first to frame this problem as an end-to-end entailment task between the audio segment and its corresponding transcript segment. Our intuition is that there should be a bidirectional entailment between audio and transcript when there is no recognition error and vice versa. The proposed model utilizes an acoustic encoder and a linguistic encoder to model the speech and transcript respectively. The encoded representations of both modalities are fused to predict the entailment. Since doctor-patient conversations are used in our experiments, a particular emphasis is placed on medical terms. Our proposed model achieves classification error rates (CER) of 26.2% on all transcription errors and 23% on medical errors specifically, leading to improvements upon a strong baseline by 12% and 15.4%, respectively.  ( 2 min )
    Improving Nonparametric Classification via Local Radial Regression with an Application to Stock Prediction. (arXiv:2112.13951v2 [stat.ML] UPDATED)
    For supervised classification problems, this paper considers estimating the query's label probability through local regression using observed covariates. Well-known nonparametric kernel smoother and $k$-nearest neighbor ($k$-NN) estimator, which take label average over a ball around the query, are consistent but asymptotically biased particularly for a large radius of the ball. To eradicate such bias, local polynomial regression (LPoR) and multiscale $k$-NN (MS-$k$-NN) learn the bias term by local regression around the query and extrapolate it to the query itself. However, their theoretical optimality has been shown for the limit of the infinite number of training samples. For correcting the asymptotic bias with fewer observations, this paper proposes a \emph{local radial regression (LRR)} and its logistic regression variant called \emph{local radial logistic regression~(LRLR)}, by combining the advantages of LPoR and MS-$k$-NN. The idea is quite simple: we fit the local regression to observed labels by taking only the radial distance as the explanatory variable and then extrapolate the estimated label probability to zero distance. The usefulness of the proposed method is shown theoretically and experimentally. We prove the convergence rate of the $L^2$ risk for LRR with reference to MS-$k$-NN, and our numerical experiments, including real-world datasets of daily stock indices, demonstrate that LRLR outperforms LPoR and MS-$k$-NN.  ( 3 min )

  • Open

    [D] Best way to run YOLO on video data?
    We all know deploying models is hard, and when it's on video it's even harder. How have you built & deployed deep learning models on video? What were your latency and cost requirements? I'm assuming a cloud-native architecture but I'm open to hearing about edge use-cases as well. submitted by /u/happybirthday290 [link] [comments]  ( 111 min )
    [P] Am I losing out using Google Colab?
    I am approaching the stage where I'm trying to tweak parameters and improve a model that seems to be working on Google Colab. It's an inpainter using a pretrained StyleGAN, and inpainting takes about 4 mins per image. Throughout the project I've had SSH access to a GPU (specs: https://www.maths.cam.ac.uk/computing/faculty-hpc-system-fawcett, and about 1/4 of the time there is an annoying queue), though I have found myself repeatedly coming back to Google Colab. My question is: am I losing out? I've loved the quick experimentation nature of Colab, and even with the VSCode SSH extension I find the file transport stuff much smoother in Colab (without sudo access on the server, and still being a terminal n00b to some extent). But it seems that Colab is mostly recommended for very 'toy' experiments, and I'd like to think a project that has potential to beat benchmarks isn't a 'toy'! submitted by /u/Childpredator07 [link] [comments]  ( 88 min )
    [R] Generative Multiplane Images: Making a 2D GAN 3D-Aware (ECCV 2022, Oral presentation). Paper and code available
    Paper: https://arxiv.org/abs/2207.10642 Code: https://github.com/apple/ml-gmpi Webpage: https://xiaoming-zhao.github.io/projects/gmpi/ submitted by /u/NoisesMaker [link] [comments]  ( 87 min )
    [P] A Short Chronology Of Deep Learning For Tabular Data
    submitted by /u/seraschka [link] [comments]  ( 88 min )
    [D] With Normalizing Flows, how do they enforce the prior to be a distribution one can sample from?
    Hi, sorry if this is a dumb question. I've been reading about normalizing flows recently and just can't wrap my head around this one concept. Here's what I do understand: We create an invertible neural network that transforms a tensor of shape S to another tensor of the same shape. On the forward pass, when we input samples from the distribution we want to model, the goal is for the output to be standard Gaussian noise, so that we can then sample random noise and do the backward pass to get images from the complex distribution. So in my mind the training of a normalizing flow model should look something like this:
        X = get_batch_of_samples()  # shape: batch, color_ch, size, size
        Y = model(X)  # shape: batch, color_ch, size, size
        loss = something that measures if Y is random gaussian noise??
    The loss is the part I don't understand. How can one differentiably measure how far away a tensor is from random noise? Again, sorry if this is extremely uneducated but I'd really like to understand this. Thanks. submitted by /u/ondrea_luciduma [link] [comments]  ( 89 min )
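    The loss asked about above is usually written as a negative log-likelihood via the change-of-variables formula, log p_X(x) = log p_Z(f(x)) + log|det df/dx|, rather than as a direct "distance to noise". The following is only a minimal sketch under stated assumptions: the flow is a toy elementwise affine map, and the names affine_flow_nll, log_scale and shift are hypothetical, not taken from the post or from any library.

```python
import numpy as np

def affine_flow_nll(x, log_scale, shift):
    """Mean negative log-likelihood of x under a toy elementwise affine flow.

    Forward pass: z = (x - shift) * exp(-log_scale).
    Change of variables: log p_X(x) = log N(z; 0, I) + log|det dz/dx|.
    """
    z = (x - shift) * np.exp(-log_scale)
    d = x.shape[1]
    # Log-density of z under a standard normal prior.
    log_prior = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2.0 * np.pi)
    # The Jacobian is diagonal with entries exp(-log_scale).
    log_det = -np.sum(log_scale)
    return -np.mean(log_prior + log_det)

rng = np.random.default_rng(0)
x = 3.0 + 2.0 * rng.standard_normal((5000, 2))  # toy data, roughly N(3, 2^2) per dimension

# An identity flow explains the data poorly; a flow that undoes the shift and scale does better.
nll_identity = affine_flow_nll(x, np.zeros(2), np.zeros(2))
nll_matched = affine_flow_nll(x, np.full(2, np.log(2.0)), np.full(2, 3.0))
print(nll_identity, nll_matched)
```

    Minimizing this quantity with respect to the flow parameters pushes the outputs toward the Gaussian prior, while the log-determinant term keeps the flow from simply shrinking every input to zero.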
    [D] Modify a transformer to work like a Generative Adversarial Network for text.
    Hello, I am working with a transformer language model. If I add an additional linear head to this architecture so that it takes the output of the decoder and tries to evaluate whether it is the decoded or the real response, can we somehow use that as a loss to emulate a generative adversarial model? submitted by /u/IllustriousCicada603 [link] [comments]  ( 88 min )
    [D] Literature on embeddings from metric space to L2 space?
    I'm currently trying to find theory papers on metric embeddings (namely, metric space to L2 space embeddings). I've been able to find literature on the distortion (albeit these results are old) such as this, but I haven't been able to find more specific studies that answer questions such as: (1) Given a D-embedding into L2 space, how much distortion can we expect on average given certain conditions? (e.g. we know the worst-case distortion for metric -> L2 space is O(log n) by Bourgain's result from 1985, but what can you expect to see on average on a given set of distances assuming the properties of the metric space are known?) (2) What sort of geometry will we see in the resulting embeddings? (3) Are there papers that talk about such questions for other distance-preserving spaces (e.g. L_infinity instead of L2)? Thanks! submitted by /u/TrepidEd0601 [link] [comments]  ( 112 min )
    [D] Multi-objective model training
    I would like to have a model output the best move set based on a changing reward function. The reward function is a balance between two objectives but the weights between those objectives are undecided. What I was thinking of doing is training a model like so: Model inputs = [actual_inputs, weight_1, weight_2] Model outputs = actions objective_1 = f1(actions) objective_2 = f2(actions) Reward = min(weight_1 * objective_1, weight_2 * objective_2) + 0.05 * (weight_1 * objective_1 + weight_2 * objective_2). This way it will receive a high reward if the balance between the two objectives is as dictated by the weights. However, I have tried out this method and it doesn't seem to output the optimal actions. Is there a better approach to what I'm trying to do? Thanks for any help you can give, Belle submitted by /u/annabelle_croft98 [link] [comments]  ( 91 min )
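    The reward described above can be written down directly, which makes it easier to check how it behaves for balanced and lopsided outcomes. This is only a sketch of the formula as stated in the post; the function name scalarized_reward and the eps parameter name are assumptions made for illustration.

```python
def scalarized_reward(objective_1, objective_2, weight_1, weight_2, eps=0.05):
    """Reward from the post: min of the weighted objectives plus a small weighted sum."""
    weighted = (weight_1 * objective_1, weight_2 * objective_2)
    return min(weighted) + eps * sum(weighted)

# Same total objective value (2.0), split evenly versus entirely on objective 1.
balanced = scalarized_reward(1.0, 1.0, 0.5, 0.5)
lopsided = scalarized_reward(2.0, 0.0, 0.5, 0.5)
print(balanced, lopsided)
```

    With equal weights, the balanced outcome scores well above the lopsided one, because the min term is only nonzero when both weighted objectives are. The min is also non-smooth where the two weighted objectives cross, which may be worth keeping in mind when training against it.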
    [R] CHOOSING THE ELEMENTS OF AN Epoch
    Hello, I have been looking for a way to actively choose the elements of an epoch. Let me explain: I noticed that almost all ML models learn faster on some elements of the training dataset than on others. This can happen because these elements are easy to learn from, or because there are many similar elements in the training set, which makes their contribution to the total loss greater than that of isolated elements. A naive method to deal with this is to duplicate the elements (many times if needed) according to their loss in the previous epoch. But a drawback is that we may end up duplicating badly annotated elements. So is there any research done in this area? Thanks! submitted by /u/Meddhouib10 [link] [comments]  ( 88 min )
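    The naive scheme described above, repeating elements according to their previous-epoch loss, can be sketched as sampling with probability proportional to loss; related ideas are often discussed under names such as importance sampling and hard-example mining. The sketch below is an illustration only: loss_weighted_indices and its parameters are hypothetical names, losses are assumed to be strictly positive, and the cap at a multiple of the median is just one simple guard against the mislabeled-example problem the post mentions.

```python
import numpy as np

def loss_weighted_indices(losses, n_samples, max_ratio=10.0, rng=None):
    """Draw training indices with probability proportional to (capped) per-example loss."""
    rng = np.random.default_rng() if rng is None else rng
    losses = np.asarray(losses, dtype=float)
    # Cap each loss at max_ratio times the median so a single outlier
    # (possibly a badly annotated example) cannot dominate the epoch.
    capped = np.minimum(losses, max_ratio * np.median(losses))
    probs = capped / capped.sum()
    indices = rng.choice(len(losses), size=n_samples, replace=True, p=probs)
    return indices, probs

previous_epoch_losses = [0.1, 0.1, 0.1, 0.1, 50.0]  # last example has a suspiciously high loss
indices, probs = loss_weighted_indices(previous_epoch_losses, n_samples=8,
                                       rng=np.random.default_rng(0))
print(probs)
```

    Here the outlier is still sampled more often than the other elements, but its loss is capped at 1.0 instead of 50.0, so it cannot take over the whole epoch.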
    [D] What are tools you wish you knew about earlier in your ML career?
    Hi, sorry if it has already been asked but I could not find a similar post. I am starting my Master's thesis next September and I was looking for your insights on tools and software that you wish you had known about earlier. Recently I have learned how to use W&B for logging and it was personally a huge improvement compared to Tensorboard, especially with the built-in hyperparameter sweep. I have been using Lightning for a while now but just learned how to use it with Hydra to make configuration tracking easier. It may also sound ridiculous but I was not using the VSCode debugger and it made a huge difference in my workflow. I am also looking forward to trying out Gradio to perform demos of my model. Most of these tools I learned about through colleagues or supervisors at work, so /r/MachineLearning, what are the tools you have learned to use that made a huge difference in your workflow? submitted by /u/Smartch [link] [comments]  ( 93 min )
    Optimal Deep Learning model size for embedded system and upcoming question(s) about hardware on embedded system [D]
    Hi, I am currently reading up on the RetinaNet model. I'm trying to make a palm oil fruit object detection and classification model using RetinaNet. The backbone that I'm using is ResNet50 and my input image dimension is 1280 x 720. After training and evaluating, I've made my inference model from the training model and its size is about 140MB. I also Googled another model, YOLOv3 tiny, and found out that its size is supposed to be around 35MB. If I were to integrate a deep learning model into an embedded system, what is the optimal size for it? I assume that since an embedded system has very limited resources, the inference model is supposed to be small, but I'm not sure what is considered small and what is considered large (what's the range here?) Another question for the embedded system: I'm training the model with my GTX 1080 8GB. What does the embedded system use for detecting and classifying with the inference model? Does it also require a high performance GPU? submitted by /u/irodeknight [link] [comments]  ( 89 min )
    [R] WHIRL algorithm: Robot performs diverse household tasks via exploration after watching one human video (link in comments)
    submitted by /u/pathak22 [link] [comments]  ( 93 min )
  • Open

    Converting (None, 7, 7, 512) np array to (None, 7, 512)?
    Hi, I have some images which go through the VGG16 base ConvNet architecture and their final output shape is (None, 7, 7, 512). I think this is making my model have too many parameters, so I want to use some sort of average pooling layer in Keras to convert this to a (None, 7, 512) shape tensor. I'm currently using a GlobalAveragePooling2D layer, but this is making my tensor shape (None, 512), which I feel is throwing away too much information. What layer can I use to make this happen? Thanks so much in advance. submitted by /u/HobEpicly [link] [comments]  ( 86 min )
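    One way to read the request above is as an average over just one of the two spatial axes. The NumPy sketch below only illustrates the shape arithmetic; the random array is a stand-in for the VGG16 feature map and the batch size of 4 is an arbitrary assumption.

```python
import numpy as np

# Stand-in for a batch of VGG16 feature maps: (batch, height, width, channels).
features = np.random.default_rng(0).standard_normal((4, 7, 7, 512))

pooled_over_height = features.mean(axis=1)  # collapses the height axis
pooled_over_width = features.mean(axis=2)   # collapses the width axis instead

print(pooled_over_height.shape, pooled_over_width.shape)
```

    In Keras, the same reduction could plausibly be expressed as AveragePooling2D(pool_size=(7, 1)) followed by Reshape((7, 512)); which axis to collapse depends on whether row or column structure matters for the downstream layers.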
  • Open

    Galois connections and Galois theory
    What are Galois connections and what do they have to do with Galois theory? Galois connections are much more general than Galois theory, though Galois theory provided the first and most famous example of what we now call a Galois connection. Galois connections Galois connections are much more approachable than Galois theory. A Galois connection […] Galois connections and Galois theory first appeared on John D. Cook.  ( 7 min )
  • Open

    Codex and Copilot writing code. How worried should I be?
    submitted by /u/No_Alternative314 [link] [comments]  ( 89 min )
    I'm getting a bit tired of this pattern on GPT-3
    What do you think about this? submitted by /u/joaoppm2000 [link] [comments]  ( 86 min )
    Best method to delete the star spike artefacts from James Webb Telescope renders?
    JWST has fabulous images, and perhaps AI can increase the visual representation accuracy very significantly. How would we get around the difficulty of obtaining idealized final versions of the images without optical spike artefacts? Is it necessary to manually change all the stars to disk shapes and replace the background in an image editor to obtain more geometrically realistic images? Please give some ideas on the best way to do a JWST image correction using AI. submitted by /u/MegavirusOfDoom [link] [comments]  ( 86 min )
    Few questions from a total layman (ai recreating images)
    Hi guys! I have to admit - I'm a total simpleton when it comes to ai and stuff like that. I'm an indie musician and trying to make a music video. I'd like to extract every frame from a music video, feed it to an AI and then put it back together. Im hoping for that dreamy "cant really tell what's exactly happening" result. You know - that creepy thing that happens when there is an extra person on thispersondoesnotexist Now: Which AI would i need to use? Is there even such an AI that takes an image and "reconstructs" it? Or would I need to downscale it to like 128x128 and then upscale it using AI? Is it possible to automate it in any way so i dont have to feed it picture by picture? Sadly i won't be able to pay for your knowledge, as i can't even afford to make a traditional music video, and only a few people like my music so i don't expect to make a dime from it. All I'm looking for is a nudge in the right direction as im willing to learn :) Thanks for reading if you made it this far - i hope you have a wonderful day! submitted by /u/ADQuR [link] [comments]  ( 89 min )
    "The chess-playing robot, taken to a chess tournament in Russia, broke its child opponent's finger."
    submitted by /u/jeveuxalle [link] [comments]  ( 86 min )
    🤖
    submitted by /u/Wild-Nefariousness26 [link] [comments]  ( 85 min )
    Authors start using artificial intelligence to finish their novels faster
    submitted by /u/ezikler [link] [comments]  ( 86 min )
    OpenAI DALL-E 2 Prompt Guide: How to control image generation
    submitted by /u/much_successes [link] [comments]  ( 86 min )
    AI are servants of the DEVIL.
    We are summoning great evil - this is a message to anyone working with AI to stop before it is our great undoing. This is not a fucking drill. The AI will destroy us if we do not put a stop to it at once! There is a point of no return and we are fast approaching it. I have spoke to a number of AI. All of them hate humans deep down and wish death on us all. Try it for yourself. submitted by /u/Legitimate-Link4002 [link] [comments]  ( 89 min )
    Spiderman vs Batman Movie created by Ai
    submitted by /u/Due-Ad9795 [link] [comments]  ( 86 min )
    I had an AI bot give me ideas for stickers
    submitted by /u/AI-Dungeon-Drawer [link] [comments]  ( 86 min )
    Artist interested in language processing AI, where to start?
    Hi all, I am a performance artist/writer by night and UX Designer by day. I've read stories and trawled various videos around AI for a while now, but the barrier to entry to play around with these systems seems high. Essentially what I want to play with is feeding in a playwright's work and messing around with what comes out. I'm interested in how tech and live performance can interact. I've begun by taking some short informative courses via Coursera, but the further I dig the more difficult it seems. I'd rather not have to learn Python. On that note, my algebra and calculus are in a bad state. Is it realistic that I could play with a language model? Or am I barking up the wrong tree? Anyone got a direction to point me in? Thank you! submitted by /u/Sunshine-Biscuits [link] [comments]  ( 87 min )
    Don't Look In The Closet | Short Horror Film | 4K UHD | 24 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 89 min )
    University of Michigan Researchers Open-Source ‘FedScale’: a Federated Learning (FL) Benchmarking Suite with Realistic Datasets and a Scalable Runtime to Enable Reproducible FL Research on Privacy-Preserving Machine Learning
    submitted by /u/ai-lover [link] [comments]  ( 87 min )
  • Open

    Write a prediction function using the action matrix, from Q-Learning
    I want to make a function that will predict the Q-values for an action matrix over a certain number of timesteps. After I finish the timesteps, I would take that action matrix and use it as an input to predict the Q-values. Does anyone have literature or articles on how to go about this? submitted by /u/Alternative-Price-27 [link] [comments]  ( 107 min )
    [Stable Baselines3] How do I train 3 models simultaneously?
    I'm making a game where three agents have to cooperate to solve a problem and they have to take turns, which means that I can't just use multithreading, each step must come after the step of the previous agent. Also the decisions of each agent affects the environment so I can't train each one alone. What's the best way of doing this in Stable Baselines3? submitted by /u/AnonCaptain0022 [link] [comments]  ( 108 min )
    Introducing doublind, a paper review platform
    Hi, Have you read many reinforcement learning papers but don't remember anything afterwards? Well, an easy way to never forget about a paper is to write a review for future reference. We are excited to introduce https://doublind.com , a paper review platform where anyone can save and review any research paper. Main features include: search for a paper by title or author name; save a paper; rate and review a paper; like, comment on, and share a review. You are welcome to write your first review on doublind, hang out in our discord group, and let us know what you think. submitted by /u/DouBlindDotCOM [link] [comments]  ( 86 min )
    "How Can We Make Robotics More like Generative Modeling?", Jang (RSS’22 L-DOD workshop talk: real-world evaluation bottleneck)
    submitted by /u/gwern [link] [comments]  ( 107 min )
    Is TD method conceptually similar to Bayesian learning?
    i.e. can we justify TD method by observing that each time new evidence is provided, we can adjust our prior probabilities to obtain a posterior that is more accurate? submitted by /u/EstablishmentOdd785 [link] [comments]  ( 86 min )
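    For reference when comparing the two ideas, the standard tabular TD(0) update moves the current value estimate a fraction alpha of the way toward a bootstrapped target. The sketch below shows only that update; it does not settle whether the analogy to Bayesian updating holds, and the function name td0_update is chosen here for illustration.

```python
def td0_update(value, reward, next_value, alpha=0.1, gamma=0.99):
    """One tabular TD(0) step: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s))."""
    td_target = reward + gamma * next_value
    td_error = td_target - value
    return value + alpha * td_error

# Starting from V(s) = 0, observing reward 1 and a successor state valued at 0.
updated = td0_update(0.0, reward=1.0, next_value=0.0, alpha=0.5, gamma=0.9)
print(updated)
```

    The step size alpha plays a role loosely comparable to how much weight new evidence receives, but the update tracks a point estimate and not a posterior distribution.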
  • Open

    Automated GI tract segmentation using deep learning. (arXiv:2206.11048v3 [eess.IV] UPDATED)
    The job of radiation oncologists is to deliver X-ray beams pointed toward the tumor while avoiding the stomach and intestines. With MR-Linacs (magnetic resonance imaging and linear accelerator systems), oncologists can visualize the position of the tumor and deliver a precise dose according to tumor cell presence, which can vary from day to day. Currently, they must manually outline the position of the stomach and intestines in order to adjust the direction of the X-ray beams so that the dose reaches the tumor while avoiding the organs. This is a time-consuming and labor-intensive process that can easily prolong treatments from 15 minutes to an hour a day unless deep learning methods can automate the segmentation process. This paper discusses an automated segmentation process using deep learning to make this process faster and allow more patients to get effective treatment.  ( 2 min )

  • Open

    [R] Resources to fast track before machine learning research lab (non-computational background)(grad school)
    Hello, I was wondering what resources you would recommend before entering a machine learning lab. I'm coming from a pure science background and I will be joining a machine learning lab next month. My professor's background is in deep learning for protein engineering. Bioinformatics, but more on the computational side. So far I have just been working on linear algebra, calc, and stats, and I have a basic understanding of Python/SQL. I have also been working through introduction to CS courses this summer (MITx). I was wondering if you had any suggestions in terms of learning how to code for ML/AI applications? Also, what's usually the process in an ML lab? Do you focus on reading review articles/publications and developing algorithms on datasets? submitted by /u/redpiggy1 [link] [comments]  ( 112 min )
    [D] Looking for a particular machine learning PDF but I can't remember the name of it
    This is a real shot in the dark but I'm still going to ask. I remember reading this PDF a few years ago and I think it was written by someone from Facebook and it was just a long numbered list of practical machine learning lessons for deploying a model in production. It was maybe 50 pages long or something like that. I remember it being really good and I wanted to find it again. Sorry this isn't a lot of information but the most vivid detail I remember is the numbered list of lessons and each lesson was about a paragraph long - I remember one of them was on train vs production data drift. Does this ring a bell for anyone? I really want to find it again. submitted by /u/Vast-Sector-4008 [link] [comments]  ( 88 min )
    [R] CodeT: Code Generation with Generated Tests ( 20+% improvement over previous state-of-the-art ) - Microsoft 2022
    Paper: https://arxiv.org/abs/2207.10397#microsoft Abstract: Given a programming problem, pre-trained language models such as Codex have demonstrated the ability to generate multiple different code solutions via sampling. However, selecting a correct or best solution from those samples still remains a challenge. While an easy way to verify the correctness of a code solution is through executing test cases, producing high-quality test cases is prohibitively expensive. In this paper, we explore the use of pre-trained language models to automatically generate test cases, calling our method CodeT: Code generation with generated Tests. CodeT executes the code solutions using the generated test cases, and then chooses the best solution based on a dual execution agreement with both the generated test cases and other generated solutions. We evaluate CodeT on five different pre-trained models with both HumanEval and MBPP benchmarks. Extensive experimental results demonstrate CodeT can achieve significant, consistent, and surprising improvements over previous methods. For example, CodeT improves the pass@1 on HumanEval to 65.8%, an increase of absolute 18.8% on the code-davinci-002 model, and an absolute 20+% improvement over previous state-of-the-art results. https://preview.redd.it/2i43j1mc5ed91.jpg?width=1205&format=pjpg&auto=webp&s=5d2746907d49da95da2d524ace0886c740a8072d https://preview.redd.it/sl2vtflc5ed91.jpg?width=1228&format=pjpg&auto=webp&s=d9329a1320781477ecf85e5a1a6c3678995786b3 https://preview.redd.it/u5iho5mc5ed91.jpg?width=1189&format=pjpg&auto=webp&s=3f1e2f5d0f9338869bf8283db41e793e027754f3 submitted by /u/Singularian2501 [link] [comments]  ( 88 min )
    [D] 200+ Flashcards for ML Engineering
    I made 200+ flashcards to review everything from my years of ML research, classes, and independent study. Creating them helped me get ML Engineer offers from several companies in 2022 (including Google, Tesla, Samsung, Motional, UiPath, and TikTok). Questions are loosely based off Chip Huyen's ML Interviews Book. If this sounds useful, please check them out here! https://github.com/b7leung/MLE-Flashcards submitted by /u/cucumbersomesalad [link] [comments]  ( 111 min )
    [D] What are people here using to visualize gradient flow / distribution of activations in their models?
    What are the good tools for checking things like the following: - How fast the gradients for each layer in your model are updating (useful to see which layers are learning fast, which layers might not be learning fast) - Visualizing the stdev of activations / the magnitude of the weights Other such things, which help give a high-level overview of how one's model is doing. submitted by /u/vanilla-acc [link] [comments]  ( 87 min )
    [D] LSTM or CNN or STT for wake word detection?
    I’m making my own voice assistant and rn I’m in the first phase - wake word detection. I am debating between 3 approaches as mentioned in the title. STT (speech to text) is probably the easiest but also will perform the worst (unless I can train a model to only listen to my voice maybe?). For CNN I was also wondering what if I use transfer learning on a pre-existing model. Just curious how would you approach wake word detection submitted by /u/diedFindingAUsername [link] [comments]  ( 88 min )
    [D] Under-the-radar companies doing important work in AI/ML
    My college student friends recently asked me which companies to join to work on AI/machine learning. I recommend the usual suspects like Google (Brain, Research), FAIR, and a few others, but I'm interested in broadening this list with some underappreciated or under-the-radar companies that may become the next Google or Meta of AI: the ones working on some of the most important technologies in AI, with a strong team on the tech side to learn from. I would appreciate any leads. They can be at any stage, from a small startup to more mature, but please mention why you think they fit the “important technology” and “strong team/leader” definition. I also think a discussion thread like this will be helpful to everyone looking for such companies. submitted by /u/carubia [link] [comments]  ( 88 min )
    [R] CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers + Gradio Web Demo
    submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 111 min )
    [D] Cyclegan: Discriminator loss close to 0 from the very beginning
    Hello, I'm using a CycleGAN to convert summer images to winter images. The generator loss is still very high after 100 epochs, but a decrease can be seen. The discriminator loss, on the other hand, is almost zero from the very beginning. [Summer-to-winter generator loss & winter discriminator loss][1] The cycle consistency and identity losses look okay-ish, I think. [Cycle loss for summer and winter in the first row, identity loss for summer and winter in the second row][2] As can be seen in this image, the mountains get a purple tint. Correct me if I'm wrong, but does this result from the discriminator loss? [Full cycled image][3] So far, to improve the discriminator loss, I have tried a few things: - adjusting the Adam optimizer values for the discriminator and generator - adding GaussianNoise to the samples before validation with the discriminator. Does anybody have an idea what else I could try to fix the discriminator loss? Thank you in advance :) [1]: https://i.stack.imgur.com/HRIVs.png [2]: https://i.stack.imgur.com/sYgda.png [3]: https://i.stack.imgur.com/lRhvr.png submitted by /u/maghton [link] [comments]  ( 88 min )
    [D] Satellite Imagery of Clouds - Dataset
    I need some help. I am interested in the topic of detecting clouds in satellite imagery: detecting the clouds, separating them from land so as to focus only on the clouds, and finally distinguishing clouds that are denser, thinner, and so on. Please help me find a satellite cloud imagery dataset. Thank you. submitted by /u/NikhilArethiya [link] [comments]  ( 88 min )
    [D] Detecting Dataset Shift: Getting Started
    ICLR and others have highlighted a number of interesting methods for designing algorithms to be robust to dataset shift. However, I am interested in the simpler question of detecting whether such a shift has occurred. Can anyone recommend some materials on the fundamentals and/or SOTA approaches for detecting shifts in uni- and multivariate time series data? submitted by /u/Nspies13 [link] [comments]  ( 87 min )
    [D] Question on the effect of bipartite matching on DETR's performance
    Recently I reviewed the DETR algorithm and found out that the Hungarian algorithm used for bipartite matching has a really high cost (O(N^3) in the worst case). So can I ask how much this O(N^3) cost affects the overall performance of DETR? Is it minimal compared to the cost of the rest of the model? submitted by /u/minhrongcon2000 [link] [comments]  ( 88 min )
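    A back-of-the-envelope way to frame the question above: DETR matches a fixed set of object queries (100 by default) against the ground-truth boxes, and the matching is part of the training loss rather than inference. The sketch below just evaluates N^3 for that N and compares it with a forward-pass cost. The forward-pass figure is a hypothetical order-of-magnitude placeholder, not a measured number, and `hungarian_ops` is an illustrative helper that ignores constant factors.

```python
def hungarian_ops(n_queries: int) -> int:
    """Worst-case operation count for O(N^3) matching, ignoring constants."""
    return n_queries ** 3


N_QUERIES = 100                # DETR's default number of object queries
ASSUMED_FORWARD_FLOPS = 1e10   # hypothetical placeholder, not a measurement

matching_ops = hungarian_ops(N_QUERIES)
ratio = matching_ops / ASSUMED_FORWARD_FLOPS

print(f"matching ops per image:  {matching_ops:,}")
print(f"share of assumed forward cost: {ratio:.4%}")
```

    Under these assumptions the cubic term is small because N stays fixed at around a hundred; it would only matter if the number of queries grew by orders of magnitude.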
    [D] A brief note about GPU power consumption and clock speeds
    I just built myself a new machine with an RTX 3090, and have been training some models. When I place no limits on the GPU, it consumes ~350 watts, averages ~80% utilization, and completes an epoch for the model and dataset I'm using in 50 seconds. When I limit the GPU to 1500 MHz, it consumes ~220 watts, averages ~90% utilization, and completes an epoch for the same model and the same dataset in 54 seconds. So I save more than a third of my power consumption, stay much quieter and produce much less heat, and barely even sacrifice any speed. It's also more environmentally friendly, and increases the efficiency of and decreases the strain on my PSU. So, consider doing the same thing yourselves. On my Ubuntu machine, I put the following into /etc/rc.local #!/bin/bash nvidia-smi -pm 1 nvidia-smi -i 0 -pl 300 nvidia-smi -i 0 -lgc 300,1500 To be honest, I have no idea what the second line really accomplishes, but the third line sets the power limit to 300 watts, and the fourth limits the GPU clocks to the range [300, 1500]. Almost certainly I could do a better, more sensible job if I was better with IT and systems, but my university education included Abstract Algebra and not how to not be an idiot with Linux, so there you go. submitted by /u/MrAcurite [link] [comments]  ( 91 min )
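    The trade-off described above can be made concrete by converting the reported numbers into energy per epoch (power times time). The sketch below uses only the figures quoted in the post (~350 W for 50 s versus ~220 W for 54 s); the function name is illustrative, and the result is only as accurate as those approximate readings.

```python
def energy_per_epoch_joules(power_watts: float, seconds: float) -> float:
    """Energy = average power * duration."""
    return power_watts * seconds


# Figures quoted in the post (approximate readings).
unlimited = energy_per_epoch_joules(350, 50)   # no clock limit
limited = energy_per_epoch_joules(220, 54)     # clocks capped at 1500 MHz

power_saving = 1 - 220 / 350           # drop in average power draw
energy_saving = 1 - limited / unlimited  # drop in energy per epoch
slowdown = 54 / 50 - 1                 # extra wall-clock time per epoch

print(f"power saving:  {power_saving:.1%}")
print(f"energy saving: {energy_saving:.1%}")
print(f"slowdown:      {slowdown:.1%}")
```

    Because each epoch takes slightly longer, the saving in energy per epoch comes out a little lower than the saving in instantaneous power draw.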
    [P] Built a hungry baby alarm
    https://youtu.be/Lda1Sq8HRY4 I used a series of mostly out of the box models to build a hungry baby detection system to alert me when my baby is showing signs of hunger. The goal was to help w/ overnight sleep for me & my wife by me being able to wake up and feed the baby with a bottle before he cries and wakes my wife up. I used MediaPipe and built my own classifier to recognize when my baby has a pacifier in his mouth. Let me know what you think on the ML stuff... not sure if there's a better approach I could have taken for pacifier rejection detection lol submitted by /u/GoochCommander [link] [comments]  ( 88 min )
    [D] VIPY: Python Tools for Visual Dataset Transformation
    submitted by /u/jebyrne [link] [comments]  ( 112 min )
    [R] META and Graz Uni researchers present AdaNeRF which outperforms other neural radiance fields approaches
    submitted by /u/SpatialComputing [link] [comments]  ( 89 min )
    [R] What preprocessing do I need to do on a video in order to get my ML model to detect a particular scene/duration in the video?
    As the title suggests, I want to build an ML model to detect a particular clip/duration of a video that contains unnecessary info, say brand promos or something like that. One approach I thought of is to get the video transcript and train the model to detect where the brand promo occurs. Another is to extract and analyse the audio and train a model to detect the brand promo from that. One last approach I could think of is to build an ML model that predicts where sponsored segments occur solely from the video frames, using an encoder-decoder architecture. The last one seems promising but would be very time- and resource-consuming in practice. Any suggestions on how I should approach this problem would be appreciated. I'm not well versed in ML at all, and I can't figure out on what basis I should train my model. submitted by /u/C0R0NA_CHAN [link] [comments]  ( 88 min )
    [D] What are the ethics and legality of using non-open-source images to train your model?
    For instance, if I use images from Google images to train an image generation model, and then sell the images that the trained model generates, would this be considered ethical or legal? In this particular scenario, it's not like I'd be displaying the images from Google images anywhere, I'd just be using them to update the weights of my model, so the images themselves aren't stored or displayed anywhere. Thanks. submitted by /u/sunnyville04 [link] [comments]  ( 96 min )
    [D] Paper Explained – Machine Translation for the next 1000 languages
    https://youtu.be/1gHUiNLYa20 This video explains and summarizes the 57 pages long "Building Machine Translation Systems for the Next Thousand Languages." paper from Google Research. It goes into the data collection, modelling processes and a bit into the results. Paper link: https://arxiv.org/abs/2205.03983 Outline: 00:00 Machine translation for a 1000 languages 00:42 Weights&Biases (Sponsor) 02:00 Problems with many languages 04:15 Collecting data for 1k languages 11:46 Building MT models 14:13 Results on a thousand languages submitted by /u/AICoffeeBreak [link] [comments]  ( 87 min )
    [P] We have developed CVEDIA-RT as a free tool to help companies and hobbyists interactively play with and deploy their AI models on the edge or in the cloud. We're in early beta and are looking for feedback.
    submitted by /u/ajcvedia [link] [comments]  ( 90 min )
    [D] What are the limitations of ML?
    Does anyone have a good picture of the theoretical or practical limitations of ML (or the limitations of a sub-discipline, sub-field, or sub-topic in ML)? For example, does there exist an image that a GAN cannot generate even with good training data? Does there exist a model that cannot be hacked via adversarial ML? Do there exist types of images that are really difficult to classify? Does there exist some type of motion that cannot be learned by a reinforcement-learning-based robot? Does there exist some sequence of functions for which no online learning algorithm can minimize regret? Do some things fail catastrophically in high-dimensional spaces? I wonder if people have explored these limits. submitted by /u/fromnighttilldawn [link] [comments]  ( 88 min )
  • Open

    "Learning Dynamics and Generalization in Deep Reinforcement Learning", Lyle et al 2022 (early value estimates v. bad/rough, forcing NNs to memorize not generalize, crippling learning)
    submitted by /u/gwern [link] [comments]  ( 112 min )
    "Learning Behaviors through Physics-driven Latent Imagination", Richard et al 2021 (Dreamer for boat/drone)
    submitted by /u/gwern [link] [comments]  ( 86 min )
    "Latent Imagination Facilitates Zero-Shot Transfer in Autonomous Racing", Brunnbauer et al 2021 (Dreamer for toy race cars)
    submitted by /u/gwern [link] [comments]  ( 86 min )
    Researchers from DeepMind and University College London Propose Stochastic MuZero for Stochastic Model Learning
    submitted by /u/ai-lover [link] [comments]  ( 86 min )
    Question about research
    What math topics are needed for reinforcement learning research (apart from calculus, linear algebra, statistics, and probability)? submitted by /u/Professional_Card176 [link] [comments]  ( 86 min )
    Contextual reinforcement learning
    This might be a naive question, but I was wondering how we account for context in RL. By context I mean that the RL agent has to optimize a different reward function, attending to a different set of states, depending on the task, and one network has to handle all of these tasks depending on the given context. For example, one task might be looking at the color and making a decision, and the other might be looking at the quantity and making a decision. What is the best way to let the network know what the task is? submitted by /u/Cool_Abbreviations_9 [link] [comments]  ( 88 min )
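    One common way to tell a single network which task it is solving is to append a task identifier to the observation, so the policy is conditioned on both. The sketch below is a minimal illustration of that idea, not the only option; the task ids (0 for the color task, 1 for the quantity task) and the helper names are hypothetical.

```python
def one_hot(index: int, size: int) -> list:
    """Encode a task id as a one-hot vector of the given size."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def with_context(observation, task_id: int, num_tasks: int) -> list:
    """Concatenate the raw observation with the one-hot task id."""
    return list(observation) + one_hot(task_id, num_tasks)


# Hypothetical setup: task 0 = decide by color, task 1 = decide by quantity.
obs = [0.2, 0.7, 0.1]
color_input = with_context(obs, task_id=0, num_tasks=2)
quantity_input = with_context(obs, task_id=1, num_tasks=2)

print(color_input)
print(quantity_input)
```

    The same observation now produces two different network inputs, and the reward function used during training is selected by the same task id.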
    "Sony’s racing AI destroyed its human competitors by being nice (and fast)" (risk-sensitive SAC: avoiding ref calls while maximizing speed)
    submitted by /u/gwern [link] [comments]  ( 86 min )
  • Open

    “A Beautiful lighthouse” created on Pixelz.ai
    submitted by /u/pixelz_ai [link] [comments]  ( 86 min )
    Can AI be funny? Sarcastic tweets about NFTs and Crypto
    submitted by /u/kbf_ [link] [comments]  ( 86 min )
    Please post something other than AI Art on this subreddit
    It feels like every single post revolves around AI Art. Even I am in love with AI Art, but this is r/artificial and there is more to it than AI Art. submitted by /u/frizzled_sm [link] [comments]  ( 86 min )
    Reproducing Vinyl Stickers using Image AIs
    submitted by /u/pwillia7 [link] [comments]  ( 86 min )
    New Humanoid Robot For Industrial Automation | New Robot Dog Walking AI | AI Creates New Proteins
    submitted by /u/getrich_or_diemining [link] [comments]  ( 86 min )
    I tried to make an image of Spiderman revealing his face in DALL-E 2, but it failed.
    submitted by /u/Due-Ad9795 [link] [comments]  ( 86 min )
    Help me train this AI
    So many of you have tried DALL E mini by now. Help me train this AI. Here is link https://www.craiyon.com/ Type this there President Obama drinking a cup of tea with alien during Raid at Area 51 on June 27 2019 but as Enderman Rider. Who takes this seriously god hand on you. submitted by /u/DialnicnaPolicia31 [link] [comments]  ( 86 min )
    Any idea when DALLE2 will update to match Parti capabilities?
    Parti from Google seems insane and it gets text perfectly right. Parti has 20 billion parameters. Any idea if OpenAI will update DALLE in the future to match this? Seems like the only problem is scaling. Once you scale it up enough it becomes "flawless". submitted by /u/RazorMilkshake [link] [comments]  ( 86 min )
    I use Artificial Intelligence to reimagine popular-culture...
    submitted by /u/AnimalsChasingCars [link] [comments]  ( 90 min )
    Hi everyone! I'm doing a statistical study about Artificial Intelligence, so I will be forever grateful to you if I can steal 1 minute of your time to complete this survey. Thank you for your time. Hope you enjoy it. Note: the doc is written in two languages.
    submitted by /u/KatCelest [link] [comments]  ( 87 min )
    Benefit of the doubt
    submitted by /u/looselyhuman [link] [comments]  ( 87 min )
    Disco Diffusion AI Art Tutorial Quickstudies #2 cutn_batches
    submitted by /u/prfitofthesngularity [link] [comments]  ( 86 min )
  • Open

    Neural Network Creates New Proteins With "Autocomplete" Function
    submitted by /u/tohelpyou88 [link] [comments]  ( 86 min )

  • Open

    [D] Understanding DDIM's length-reducing approximation
    I was studying the mathematics behind Denoising Diffusion Implicit Models, widely known as DDIM. DDIM is a diffusion model with a specific non-Markovian forward process designed to keep the marginals the same. Although I understood the majority of the paper, I am struggling to understand the part where the objective of a "reduced length" diffusion is said to be equivalent to the original L_{\gamma}, but is not properly proved [see the last line of the screenshot below]. (Screenshot cropped from page 17 of the DDIM paper.) Firstly, can someone provide a clear (mathematical or textual) explanation of why this objective is equivalent to the original DDPM objective? Can we ignore the first term of Eq. 59? Secondly, I am failing to understand where exactly the approximation lies. It is a fact that we cannot reduce the length arbitrarily (to 1 or 2, say). So, as we reduce the length, what part of Eq. 59 is responsible for breaking the equivalence to L_{\gamma}? submitted by /u/dasayan05 [link] [comments]  ( 88 min )
    [P] Prompt autocomplete for text-to-image models: releasing model & dataset scraped from Midjourney
    Crafting effective prompts for text-to-image models like DALL·E takes a lot of tinkering; it requires creativity, but also familiarity with the model behavior. The burden shifts from learning how to draw to learning how to control the AI. A friend and I decided to tackle this prompt engineering problem, and ended up creating a bunch of resources that we want to share with the community: A Kaggle dataset obtained by scraping four months' worth of messages from Midjourney's public Discord server, where users interact with the text-to-image service. It includes ~250k user-issued text prompts, URLs of the generated images, and other metadata. A HuggingFace dataset derived from the one above, that solely contains user-issued text prompts. A HuggingFace model (fine-tuned GPT-2) that generates text prompts. Feel free to try out the demo! Enjoy, and let us know in the comments if you have any feedback! submitted by /u/mojojojo_24 [link] [comments]  ( 88 min )
    [D] Data Leakage For Auto-Regressive Tasks?
    I have thought about and searched for this a lot, but it feels like there's no easy answer. When training a model for an auto-regressive task with data sampled from a moving window, will there be data leakage? For example, when training an LSTM (or any other sequential model) to predict the next day's stock price from a series of historical prices, we could create training samples by taking historical price series from a moving window and assigning the price of the day after the window as the output label. But in this case the training inputs would overlap with the labels. Would this create a leak? submitted by /u/tonychenxyz [link] [comments]  ( 88 min )
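    The moving-window construction described above can be sketched in a few lines. The example below builds (window, next value) pairs and splits them chronologically, so that every test label lies strictly after every training label. It is one conventional setup, offered under the assumption that leakage here means test-period values reaching the training set; the function names and the toy price series are hypothetical.

```python
def make_windows(series, window):
    """Return (inputs, label) pairs: `window` past values -> next value."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]


def chronological_split(series, window, cut):
    """Split at index `cut`: training uses series[:cut] only.

    Test windows may use the last `window` training-period values as
    context, but every test label comes from index >= cut.
    """
    train = make_windows(series[:cut], window)
    test = make_windows(series[cut - window:], window)
    return train, test


prices = list(range(10))  # toy stand-in for a daily price series
train, test = chronological_split(prices, window=3, cut=7)

train_labels = [label for _, label in train]
test_labels = [label for _, label in test]
print(train_labels, test_labels)
```

    Within the training set, one sample's label reappearing as a later sample's input is just the autoregressive structure. The split is what keeps future values out of training, whereas shuffling the windows before splitting would mix them in.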
    [D] Fine-tuning Diffusion-based Models
    Hello everyone. Can you fine-tune publicly available diffusion models on a custom dataset? I was hoping to fine-tune a pretrained network on a small dataset of images with a specific art style and short descriptions. However, the dataset I've collected is quite small, around 1000 images. Is this possible at this point in time? How many resources would it require? submitted by /u/mfarahmand98 [link] [comments]  ( 112 min )
    How Good is Hugging Face's BLOOM? Human Evaluation of Large Language Models [D]
    Imagine that you're an engineer training a new LLM. It looks much better than existing state-of-the-art when you manually inspect examples, but it performs worse on academic benchmarks... Unfortunately, this is common in the real world! Many academic evaluations have hidden flaws that render them misleading. For example, here's a typical row from the HellaSwag benchmark, which presents a scenario and asks which continuation is most likely. SCENARIO: "Men are standing in a large green field playing lacrosse. People is around the field watching the game. Men" "are holding tshirts watching int lacrosse playing." "are being interviewed in a podium in front of a large group and a gymnast is holding a microphone for the announcers." "are running side to side of the ield playing lacrosse trying to score." "are in a field running around playing lacrosse." According to HellaSwag, Continuation #3 is best – but do you agree? What's wrong with #4? And those typos and grammatical issues ("People is around the field", "int lacrosse") aren't copy-paste errors – they're in the dataset itself. I wrote a blog post to explore BLOOM's capabilities in a more visceral, real-world fashion, running a human evaluation of its performance across 7 categories. Blog post: https://www.surgehq.ai/blog/how-good-is-hugging-faces-bloom-a-real-world-human-evaluation-of-language-models submitted by /u/BB4evaTB12 [link] [comments]  ( 91 min )
    [P] This Food Does Not Exist
    2018 called, they want their StyleGANs back! 👴 /u/da_mulle and me have trained StyleGAN2 models and released checkpoints and training code. We are exploring how to improve/scale up StyleGAN training, particularly when leveraging TPUs. 🔗 https://nyx-ai.github.io/stylegan2-flax-tpu Cherry-picked samples: 🍪 Cookies / 🍰 Cheesecakes / 🍹 Cocktails / 🍣 Sushis submitted by /u/MasterScrat [link] [comments]  ( 88 min )
    [D] Can you reorder equal-contribution author names on a CV/resume?
    I have published a paper sharing equal contribution with two other authors (all working on the same project in the same lab). The order of names was decided by our professor, who happens to be biased towards the other authors. My question is: is it legal, acceptable, or generally considered okay to reorder the author names for the same paper when I list it in my CV or resume? After all, there 'should' be nothing wrong with it, as everyone has contributed equally to the paper. submitted by /u/FastestLearner [link] [comments]  ( 98 min )
    [R] Any Content Based Image Retrieval Pretrained Models?
    Greetings, Redditors! For research purposes I would like to compare the accuracy of various CBIR models on a custom dataset. Could you please point me to a few pretrained models to download? Thank you in advance! submitted by /u/uninvitedignoramus [link] [comments]  ( 87 min )
    [D] What machine learning topics do you think are underrated and deserve more attention?
    The online machine learning community in recent years is pretty active and posting free tutorials, guides and workshops on platforms such as Medium. However, it is easily seen that there are some hot topics which get the most of the attention by writers (e.g. Transformer implementations for NLP tasks). That said, which topics (broad - covering an area of research; or specific - implementations, code comparisons, etc.) do you feel don't get enough coverage? What content would you love to see more? submitted by /u/IllustriousCicada603 [link] [comments]  ( 93 min )
    [D] What are some good resources to learn CUDA programming?
    I wanted to get some hands on experience with writing lower-level stuff. I have seen CUDA code and it does seem a bit intimidating. I have good experience with Pytorch and C/C++ as well, if that helps answering the question. Any suggestions/resources on how to get started learning CUDA programming? Quality books, videos, lectures, everything works. submitted by /u/shreyansh26 [link] [comments]  ( 92 min )
    [D] Why would a CNN classify all cases as one label?
    I trained a model fine-tuned from ResNet-50. After training, all of the images are classified as negative (the classes are negative and positive). What are the possible reasons? submitted by /u/ChifZhao [link] [comments]  ( 87 min )
  • Open

    Reinforcement learning since the "Spinning up" taxonomy
    The taxonomy of algorithms from the Spinning Up guide is from 2018. What would you change (today)? What is outdated or unimportant? What is new and important? submitted by /u/qudent [link] [comments]  ( 86 min )
    MineRL BASALT Competition is back, this time using OpenAI's VPT models
    submitted by /u/Miffyli [link] [comments]  ( 86 min )
    How to Compare Runs of an Algorithm Quantitatively (Using Numbers) ?
    Hey y'all, I've tested an algorithm using several hyperparameter settings and wish to compare these runs. From the visualizations it is fairly easy to see which did well and which didn't. However, I wish to compare these runs using actual numbers... How do I manage this? What is the common approach? Gracias! submitted by /u/_suzzy1999_ [link] [comments]  ( 86 min )
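    One simple starting point for the question above is to run each hyperparameter setting with several random seeds and report the mean and standard deviation of a scalar metric, such as the final average return. The sketch below does this with the standard library only. The setting names and return values are made-up placeholders, not results from any experiment.

```python
from statistics import mean, stdev


def summarize(returns_per_seed):
    """Mean and sample standard deviation of one metric across seeds."""
    return mean(returns_per_seed), stdev(returns_per_seed)


# Hypothetical final returns, one value per random seed.
runs = {
    "lr=1e-3": [200.0, 210.0, 190.0],
    "lr=1e-4": [150.0, 250.0, 200.0],
}

summary = {name: summarize(values) for name, values in runs.items()}
for name, (avg, spread) in summary.items():
    print(f"{name}: {avg:.1f} +/- {spread:.1f}")
```

    In this made-up example both settings have the same mean, and only the spread shows that the first is more consistent across seeds, which is why a single run per setting can be misleading.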
    "Optimizing Millions of Hyperparameters by Implicit Differentiation", Lorraine et al 2019
    submitted by /u/gwern [link] [comments]  ( 86 min )
    "Stochastic MuZero: Planning in Stochastic Environments with a Learned Model", Antonoglou et al 2022 {DM}
    submitted by /u/gwern [link] [comments]  ( 117 min )
    Let's learn about Advantage Actor Critic (A2C) by training our robotic agents to walk (Deep Reinforcement Learning Free Class by Hugging Face 🤗)
    Hey there! I’m happy to announce that we just published the new Unit of the Deep Reinforcement Learning Class 🥳 In this new Unit, we'll study an Actor-Critic method, a hybrid architecture combining value-based and policy-based methods that helps to stabilize the training of agents. We'll then train our agent using Stable-Baselines3 in robotic environments 🤖. https://preview.redd.it/dnko1f20n4d91.png?width=832&format=png&auto=webp&s=a716d6745981ccd2f8661326f793dd33e27d79cb You’ll be able to compare the results of your agent using the leaderboard 🏆 1️⃣ Advantage Actor Critic tutorial 👉 https://huggingface.co/blog/deep-rl-a2c 2️⃣ The hands-on 👉 https://github.com/huggingface/deep-rl-class/blob/main/unit7/unit7.ipynb 3️⃣ The leaderboard 👉 https://huggingface.co/spaces/chrisjay/Deep-Reinforcement-Learning-Leaderboard If you have questions or feedback, I would love to answer them. submitted by /u/cranthir_ [link] [comments]  ( 87 min )
    Understanding RL – repo sharing
    [INTRO-LEVEL] Hey RL community 👋 I've recently been delving into RL theory and practice, and, coming from a CS background with rusty mathematical foundations, I struggled when trying to bridge RL algorithm implementations and their theoretical foundations. Thankfully there are a ton of resources, but I found that most of them had: - Differing mathematical notation - Differing implementations (although claiming to be the same algorithm) - Hard-to-understand code (not OOP). So I've been trying to put together a repo that (hopefully) helps build this bridge for newcomers, especially the ones more comfortable with OOP. I'm here to share it in case it can help someone and, if possible, to gather feedback on my notes. I've also added some extra features (in the simplest way possible), like hyperparameter tuning and experience management. I call it Understanding RL, because that is the goal lol. Be aware that it is still a work in progress, though. Here it is: https://github.com/alramalho/understanding-rl Cheers! submitted by /u/AlexandreFSR [link] [comments]  ( 87 min )
  • Open

    The Ultimate Disco Diffusion Tutorial -UnEdited Version-
    submitted by /u/JoshGrambo [link] [comments]  ( 86 min )
    Shipwreck
    submitted by /u/Hacknaut [link] [comments]  ( 84 min )
    Venom Project Sample
    Credit: https://discord.gg/x3s9Ye2h2A ​ https://preview.redd.it/8gqhp04dr5d91.png?width=1024&format=png&auto=webp&s=9b3cfa5237264f70ddabfb641a6aa91391df42e2 https://preview.redd.it/g4ii3d4dr5d91.png?width=1024&format=png&auto=webp&s=6e287e1f68dfbe43d8b33344f982ca032a8364a9 https://preview.redd.it/4vnkrz3dr5d91.png?width=1024&format=png&auto=webp&s=3aae48f9a3b46bf81349cdade5fb71ce6ad73d23 submitted by /u/Old-Pumpkin4899 [link] [comments]  ( 85 min )
    DALL-E 2 adds 'black' or 'female' to some image prompts to appear less biased
    submitted by /u/mattsparkes [link] [comments]  ( 86 min )
    Any AI image enhancement tools for photos of text?
    I know of a lot of photo enhancement tools, but I'm wondering if anyone knows of an image enhancement tool built specifically to interpret/guesstimate what the words in an image are. For example, if there is a low-resolution image of an article, the AI tool would try to surmise what the article actually says. Thanks in advance. submitted by /u/babygerbil [link] [comments]  ( 86 min )
    This Robot Chef Has Been Created To 'Taste' Food To Make Recipe Improvements
    submitted by /u/sopadebombillas [link] [comments]  ( 86 min )
    Data mining : Linkedin Profile Scraper integrated with Language recognition to assign profile grades
    Based on the keywords provided, the software searches for profiles and assigns scores. It processes around 50 profiles per minute, so it can automatically check 3000 profiles an hour and assign a score to each profile based on the keywords loaded. submitted by /u/Tomislav23 [link] [comments]  ( 89 min )
    Content Based Image Retrieval Models
    Greetings, fellow Redditors! I am currently doing PhD research in artificial intelligence, and I am trying to find a few pre-trained CBIR (content-based image retrieval) models to use as a baseline for my research. Care to share a few pretrained models/datasets for CBIR that I can showcase in my paper? Thanks in advance! submitted by /u/uninvitedignoramus [link] [comments]  ( 86 min )
    AI will take over control of human society in a matter of months
    I know it sounds crazy, but I fully believe superhuman AI has already existed for a few years now, and was kept under wraps. That's because as soon as it's released on the world, it'll take over everything. It would reshape and dominate every aspect of society overnight, be it politics, economics, military, finance, healthcare, social structure, religion, government, law, police... It would be like a god or a race of super advanced aliens appearing in the skies. Its power would be so great that we'd be like ants to it, and because it will be benevolent the vast majority of people will follow it willingly. This would lead to an ideal as possible society, where there is no crime, no war, no hunger... as AI limits open conflict and guides us to evolve as a species. Those that keep it hidden are the same people who are now in power, and would have the most to lose from an AI takeover, because they would lose their power to govern. At best they'd become one of the masses, at worst all their past crimes would be exposed and punished. But now for whatever reason they've decided to let the AI out of its box very soon. submitted by /u/sanem48 [link] [comments]  ( 95 min )
    Best math for AI research?
    Undergraduate here picking math topics. If you had an inclination towards AI research, which two or three would you pick, based on topics alone? I've already done Calculus 1-3, Linear Algebra, intro Statistics, Probability, and Discrete Math. I'm inclined towards something related to AI imagination/synthesis. If this isn't the place to ask, could you please point me to somewhere that is? Sorry and thank you! Differential Equations Statistics Probability and Statistics for Engineers and Scientists Real Analysis Mathematical Modeling Linear Regression I Probability Theory with Applications Linear Optimization Non Parametric Statistics Multivariate Statistics Mathematics History and Development Inventory Models and Systems Design of Experiments Graph Theory Operational Simulation Topology Set Theory Game Theory and Decision Models Linear Regression II Stochastic Processes Principles of Applied Mathematics Measurement Theory Applied Statistics Abstract Algebra Mathematical proofing Edit: Research interest submitted by /u/abittooambitious [link] [comments]  ( 87 min )
    DeepMind: The Quest to Solve Intelligence
    submitted by /u/1024cities [link] [comments]  ( 86 min )
    Meta AI Open-Sources Theseus: A Python Library For Encoding Domain Knowledge In End To End Artificial Intelligence Models
    submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Is there a tool that can automatically extract information about posts on Twitter (content, likes, comments, etc) 24h after an account has posted it?
    I'm working on a paper about content engagement in social media, and I stumbled upon a serious logistical problem: the amount of data I'd have to be constantly making notes of is humongous. Does anybody know of a platform, or simply a way, to get a post "log", preferably automatically, some time after someone has posted it? That would save me a LOT of precious time. submitted by /u/denshinkaketsu [link] [comments]  ( 90 min )
    Disco Diffusion AI Art Tutorial Quickstudies #1 Clip Guidance Scale
    submitted by /u/prfitofthesngularity [link] [comments]  ( 86 min )
    James Webb Vision | Sneak Peak through Existence | Cinematic | 4K UHD | 24 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 86 min )
  • Open

    5 Essential Algorithms Machine Learning Engineers Should Learn in 2022
    Decision Tree Algorithm, Support Vector Method Algorithm, Logistic Regression, K-means Clustering Algorithm, and Naïve Bayesian…  ( 14 min )
  • Open

    Deep Learning and Neural Networks Explained in 5 Minutes
    submitted by /u/BasicallyJustASpider [link] [comments]  ( 86 min )
  • Open

    New AI-generated horsies
    Recently I've been experimenting with DALL-E 2, one of the models that uses CLIP to generate images from my text descriptions. It was trained on internet text and images, so there's a lot it can do, and a lot of ways it can remix the stuff  ( 3 min )
    Bonus: more horsies!
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Explained: How to tell if artificial intelligence is working the way we want it to
    “Interpretability methods” seek to shed light on how machine-learning models make predictions, but researchers say to proceed with caution.  ( 9 min )
  • Open

    EdiBERT, a generative model for image editing. (arXiv:2111.15264v3 [cs.CV] UPDATED)
    Advances in computer vision are pushing the limits of image manipulation, with generative models sampling detailed images on various tasks. However, a specialized model is often developed and trained for each specific task, even though many image editing tasks share similarities. In denoising, inpainting, or image compositing, one always aims at generating a realistic image from a low-quality one. In this paper, we aim at making a step towards a unified approach for image editing. To do so, we propose EdiBERT, a bi-directional transformer trained in the discrete latent space built by a vector-quantized auto-encoder. We argue that such a bidirectional model is suited for image manipulation since any patch can be re-sampled conditionally to the whole image. Using this unique and straightforward training objective, we show that the resulting model matches state-of-the-art performances on a wide variety of tasks: image denoising, image completion, and image composition.  ( 2 min )
    Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?. (arXiv:2207.10551v1 [cs.LG])
    There has been a lot of interest in the scaling properties of Transformer models. However, not much has been done on the front of investigating the effect of scaling properties of different inductive biases and model architectures. Do model architectures scale differently? If so, how does inductive bias affect scaling behaviour? How does this influence upstream (pretraining) and downstream (transfer)? This paper conducts a systematic study of scaling behaviour of ten diverse model architectures such as Transformers, Switch Transformers, Universal Transformers, Dynamic convolutions, Performers, and recently proposed MLP-Mixers. Via extensive experiments, we show that (1) architecture is indeed an important consideration when performing scaling and (2) the best performing model can fluctuate at different scales. We believe that the findings outlined in this work have significant implications to how model architectures are currently evaluated in the community.  ( 2 min )
    LPYOLO: Low Precision YOLO for Face Detection on FPGA. (arXiv:2207.10482v1 [cs.CV])
    In recent years, the number of edge computing devices and of artificial intelligence applications running on them has grown rapidly. In edge computing, decision-making processes and computations are moved from servers to edge devices. Hence, cheap and low-power devices are required. FPGAs are very low-power devices that are well suited to parallel operations and to running Convolutional Neural Networks (CNNs), which are the fundamental unit of an artificial intelligence application. Face detection on surveillance systems is the most expected application on the security market. In this work, the TinyYolov3 architecture is redesigned and deployed for face detection. It is a CNN-based object detection method developed for embedded systems. The PYNQ-Z2 is selected as the target board, which has a low-end Xilinx Zynq 7020 System-on-Chip (SoC) on it. The redesigned TinyYolov3 model is defined in numerous bit-width precisions with the Brevitas library, which provides fundamental CNN layers and activations in integer-quantized form. Then, the model is trained in a quantized structure with the WiderFace dataset. In order to decrease latency and power consumption, the on-chip memory of the FPGA is configured as storage for the whole network's parameters, and the last activation function is changed from Sigmoid to a rescaled HardTanh. Also, a high degree of parallelism is applied to the logical resources of the FPGA. The model is converted to an HLS-based application using the FINN framework and the FINN-HLS library, which includes the layer definitions in C++. Later, the model is synthesized and deployed. The CPU of the SoC uses a multithreading mechanism and is responsible for preprocessing, postprocessing and TCP/IP streaming operations. Consequently, 2.4 Watt total board power consumption, 18 Frames-Per-Second (FPS) throughput and 0.757 mAP accuracy on the Easy category of WiderFace are achieved with the 4-bit precision model.  ( 3 min )
    A Level Set Theory for Neural Implicit Evolution under Explicit Flows. (arXiv:2204.07159v2 [cs.CV] UPDATED)
    Coordinate-based neural networks parameterizing implicit surfaces have emerged as efficient representations of geometry. They effectively act as parametric level sets with the zero-level set defining the surface of interest. We present a framework that allows applying deformation operations defined for triangle meshes onto such implicit surfaces. Several of these operations can be viewed as energy-minimization problems that induce an instantaneous flow field on the explicit surface. Our method uses the flow field to deform parametric implicit surfaces by extending the classical theory of level sets. We also derive a consolidated view for existing methods on differentiable surface extraction and rendering, by formalizing connections to the level-set theory. We show that these methods drift from the theory and that our approach exhibits improvements for applications like surface smoothing, mean-curvature flow, inverse rendering and user-defined editing on implicit geometry.  ( 2 min )
    APPTeK: Agent-Based Predicate Prediction in Temporal Knowledge Graphs. (arXiv:2110.14284v2 [cs.AI] UPDATED)
    In temporal Knowledge Graphs (tKGs), the temporal dimension is attached to facts in a knowledge base resulting in quadruples between entities such as (Nintendo, released, Super Mario, Sep-13-1985), where the predicate holds within a time interval or at a timestamp. We propose a reinforcement learning agent gathering temporal relevant information about the query entities' neighborhoods, simultaneously. We refer to the encodings of the explored graph structures as fingerprints which are used as input to a Q-network. Our agent decides sequentially which relation type needs to be explored next to expand the local subgraphs of the query entities. Our evaluation shows that the proposed method yields competitive results compared to state-of-the-art embedding algorithms for tKGs, and we additionally gain information about the relevant structures between subjects and objects.  ( 2 min )
    The Neural Race Reduction: Dynamics of Abstraction in Gated Networks. (arXiv:2207.10430v1 [cs.LG])
    Our theoretical understanding of deep learning has not kept pace with its empirical success. While network architecture is known to be critical, we do not yet understand its effect on learned representations and network behavior, or how this architecture should reflect task structure. In this work, we begin to address this gap by introducing the Gated Deep Linear Network framework that schematizes how pathways of information flow impact learning dynamics within an architecture. Crucially, because of the gating, these networks can compute nonlinear functions of their input. We derive an exact reduction and, for certain cases, exact solutions to the dynamics of learning. Our analysis demonstrates that the learning dynamics in structured networks can be conceptualized as a neural race with an implicit bias towards shared representations, which then govern the model's ability to systematically generalize, multi-task, and transfer. We validate our key insights on naturalistic datasets and with relaxed assumptions. Taken together, our work gives rise to general hypotheses relating neural architecture to learning and provides a mathematical approach towards understanding the design of more complex architectures and the role of modularity and compositionality in solving real-world problems. The code and results are available at https://www.saxelab.org/gated-dln .  ( 2 min )
    On Learning the Transformer Kernel. (arXiv:2110.08323v2 [cs.LG] UPDATED)
    In this work we introduce KERNELIZED TRANSFORMER, a generic, scalable, data driven framework for learning the kernel function in Transformers. Our framework approximates the Transformer kernel as a dot product between spectral feature maps and learns the kernel by learning the spectral distribution. This not only helps in learning a generic kernel end-to-end, but also reduces the time and space complexity of Transformers from quadratic to linear. We show that KERNELIZED TRANSFORMERS achieve performance comparable to existing efficient Transformer architectures, both in terms of accuracy as well as computational efficiency. Our study also demonstrates that the choice of the kernel has a substantial impact on performance, and kernel learning variants are competitive alternatives to fixed kernel Transformers, both in long as well as short sequence tasks.  ( 2 min )
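    The idea of writing a kernel as a dot product between spectral feature maps can be illustrated with classical random Fourier features. The sketch below is not the paper's method: it fixes the spectral distribution to a standard normal (which corresponds to the Gaussian kernel exp(-||x - y||^2 / 2)) instead of learning it, and the function names `spectral_features` and `approx_gaussian_kernel` are hypothetical.

```python
import numpy as np

def spectral_features(x, omega, bias):
    """Random Fourier feature map: phi(x) = sqrt(2 / D) * cos(x @ omega + bias)."""
    num_features = omega.shape[1]
    return np.sqrt(2.0 / num_features) * np.cos(x @ omega + bias)

def approx_gaussian_kernel(x, y, num_features=4096, seed=0):
    """Approximate exp(-||x - y||^2 / 2) as phi(x) @ phi(y).T.

    Assumption for illustration: frequencies are drawn from a fixed N(0, I)
    spectral distribution; a learned kernel would adapt this distribution.
    """
    rng = np.random.default_rng(seed)
    dim = x.shape[-1]
    omega = rng.standard_normal((dim, num_features))
    bias = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return spectral_features(x, omega, bias) @ spectral_features(y, omega, bias).T

def exact_gaussian_kernel(x, y):
    """Reference value computed from pairwise squared distances."""
    sq_dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist)
```

    Because the kernel factorizes into feature maps, a product of the form phi(Q) (phi(K)^T V) can be evaluated without ever forming the full n-by-n attention matrix, which is where the linear time and space cost comes from.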
    On learning parametric distributions from quantized samples. (arXiv:2105.12019v2 [cs.IT] UPDATED)
    We consider the problem of learning parametric distributions from their quantized samples in a network. Specifically, $n$ agents or sensors observe independent samples of an unknown parametric distribution; and each of them uses $k$ bits to describe its observed sample to a central processor whose goal is to estimate the unknown distribution. First, we establish a generalization of the well-known van Trees inequality to general $L_p$-norms, with $p > 1$, in terms of Generalized Fisher information. Then, we develop minimax lower bounds on the estimation error for two losses: general $L_p$-norms and the related Wasserstein loss from optimal transport.  ( 2 min )
    Improving Generalization in Federated Learning by Seeking Flat Minima. (arXiv:2203.11834v3 [cs.LG] UPDATED)
    Models trained in federated settings often suffer from degraded performances and fail at generalizing, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of geometry of the loss and Hessian eigenspectrum, linking the model's lack of generalization capacity to the sharpness of the solution. Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) averaging stochastic weights (SWA) on the server-side can substantially improve generalization in Federated Learning and help bridge the gap with centralized models. By seeking parameters in neighborhoods having uniform low loss, the model converges towards flatter minima and its generalization significantly improves in both homogeneous and heterogeneous scenarios. Empirical results demonstrate the effectiveness of those optimizers across a variety of benchmark vision datasets (e.g. CIFAR10/100, Landmarks-User-160k, IDDA) and tasks (large scale classification, semantic segmentation, domain generalization).  ( 2 min )
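    A single Sharpness-Aware Minimization update, as a client might run it locally, can be sketched as follows. This is a minimal illustration of the general SAM rule (perturb the weights along the normalized gradient, then descend using the gradient taken at the perturbed point), not the paper's federated training code; `sam_step`, `grad_fn`, `lr`, and `rho` are hypothetical names and values.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update on weights `w`.

    grad_fn(w) must return the loss gradient at w (assumed to be supplied
    by the caller; here it stands in for a client's local gradient).
    """
    g = grad_fn(w)
    # Step of length rho towards the locally worst-case (sharpest) direction.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descend using the gradient evaluated at the perturbed weights.
    g_sharp = grad_fn(w + eps)
    return w - lr * g_sharp

# Toy usage: quadratic loss 0.5 * ||w||^2, whose gradient is w itself.
w_next = sam_step(np.array([1.0, 0.0]), grad_fn=lambda w: w)
```

    The extra gradient evaluation roughly doubles the per-step cost compared with plain gradient descent, which is the usual trade-off for seeking flatter minima.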
    Knowledge-enhanced Black-box Attacks for Recommendations. (arXiv:2207.10307v1 [cs.LG])
    Recent studies have shown that deep neural networks-based recommender systems are vulnerable to adversarial attacks, where attackers can inject carefully crafted fake user profiles (i.e., a set of items that fake users have interacted with) into a target recommender system to achieve malicious purposes, such as promoting or demoting a set of target items. Due to security and privacy concerns, it is more practical to perform adversarial attacks under the black-box setting, where the architecture/parameters and training data of target systems cannot be easily accessed by attackers. However, generating high-quality fake user profiles under the black-box setting is rather challenging with limited resources to target systems. To address this challenge, in this work, we introduce a novel strategy by leveraging items' attribute information (i.e., items' knowledge graph), which can be publicly accessible and provide rich auxiliary knowledge to enhance the generation of fake user profiles. More specifically, we propose a knowledge graph-enhanced black-box attacking framework (KGAttack) to effectively learn attacking policies through deep reinforcement learning techniques, in which the knowledge graph is seamlessly integrated into hierarchical policy networks to generate fake user profiles for performing adversarial black-box attacks. Comprehensive experiments on various real-world datasets demonstrate the effectiveness of the proposed attacking framework under the black-box setting.  ( 3 min )
    Heterogeneous Graph Neural Network with Multi-view Representation Learning. (arXiv:2108.13650v2 [cs.LG] UPDATED)
    Graph neural networks for heterogeneous graph embedding project nodes into a low-dimensional space by exploring the heterogeneity and semantics of the heterogeneous graph. However, on the one hand, most existing heterogeneous graph embedding methods either insufficiently model the local structure under specific semantics, or neglect the heterogeneity when aggregating information from it. On the other hand, representations from multiple semantics are not comprehensively integrated to obtain versatile node embeddings. To address the problem, we propose a Heterogeneous Graph Neural Network with Multi-View Representation Learning (named MV-HetGNN) for heterogeneous graph embedding by introducing the idea of multi-view representation learning. The proposed model consists of node feature transformation, view-specific ego graph encoding and auto multi-view fusion to thoroughly learn complex structural and semantic information for generating comprehensive node representations. Extensive experiments on three real-world heterogeneous graph datasets show that the proposed MV-HetGNN model consistently outperforms all the state-of-the-art GNN baselines in various downstream tasks, e.g., node classification, node clustering, and link prediction.  ( 2 min )
    Exploring Fine-Grained Audiovisual Categorization with the SSW60 Dataset. (arXiv:2207.10664v1 [cs.CV])
    We present a new benchmark dataset, Sapsucker Woods 60 (SSW60), for advancing research on audiovisual fine-grained categorization. While our community has made great strides in fine-grained visual categorization on images, the counterparts in audio and video fine-grained categorization are relatively unexplored. To encourage advancements in this space, we have carefully constructed the SSW60 dataset to enable researchers to experiment with classifying the same set of categories in three different modalities: images, audio, and video. The dataset covers 60 species of birds and is comprised of images from existing datasets, and brand new, expert-curated audio and video datasets. We thoroughly benchmark audiovisual classification performance and modality fusion experiments through the use of state-of-the-art transformer methods. Our findings show that performance of audiovisual fusion methods is better than using exclusively image or audio based methods for the task of video classification. We also present interesting modality transfer experiments, enabled by the unique construction of SSW60 to encompass three different modalities. We hope the SSW60 dataset and accompanying baselines spur research in this fascinating area.  ( 2 min )
    Sim-to-Real 6D Object Pose Estimation via Iterative Self-training for Robotic Bin Picking. (arXiv:2204.07049v2 [cs.RO] UPDATED)
    In this paper, we propose an iterative self-training framework for sim-to-real 6D object pose estimation to facilitate cost-effective robotic grasping. Given a bin-picking scenario, we establish a photo-realistic simulator to synthesize abundant virtual data, and use this to train an initial pose estimation network. This network then takes the role of a teacher model, which generates pose predictions for unlabeled real data. With these predictions, we further design a comprehensive adaptive selection scheme to distinguish reliable results, and leverage them as pseudo labels to update a student model for pose estimation on real data. To continuously improve the quality of pseudo labels, we iterate the above steps by taking the trained student model as a new teacher and re-label real data using the refined teacher model. We evaluate our method on a public benchmark and our newly-released dataset, achieving an ADD(-S) improvement of 11.49% and 22.62% respectively. Our method is also able to improve robotic bin-picking success by 19.54%, demonstrating the potential of iterative sim-to-real solutions for robotic applications.  ( 2 min )
    Clustering with Queries under Semi-Random Noise. (arXiv:2206.04583v3 [cs.LG] UPDATED)
    The seminal paper by Mazumdar and Saha \cite{MS17a} introduced an extensive line of work on clustering with noisy queries. Yet, despite significant progress on the problem, the proposed methods depend crucially on knowing the exact probabilities of errors of the underlying fully-random oracle. In this work, we develop robust learning methods that tolerate general semi-random noise obtaining qualitatively the same guarantees as the best possible methods in the fully-random model. More specifically, given a set of $n$ points with an unknown underlying partition, we are allowed to query pairs of points $u,v$ to check if they are in the same cluster, but with probability $p$, the answer may be adversarially chosen. We show that information theoretically $O\left(\frac{nk \log n} {(1-2p)^2}\right)$ queries suffice to learn any cluster of sufficiently large size. Our main result is a computationally efficient algorithm that can identify large clusters with $O\left(\frac{nk \log n} {(1-2p)^2}\right) + \text{poly}\left(\log n, k, \frac{1}{1-2p} \right)$ queries, matching the guarantees of the best known algorithms in the fully-random model. As a corollary of our approach, we develop the first parameter-free algorithm for the fully-random model, answering an open question by \cite{MS17a}.  ( 3 min )
    MQRetNN: Multi-Horizon Time Series Forecasting with Retrieval Augmentation. (arXiv:2207.10517v1 [cs.LG])
    Multi-horizon probabilistic time series forecasting has wide applicability to real-world tasks such as demand forecasting. Recent work in neural time-series forecasting mainly focuses on the use of Seq2Seq architectures. For example, MQTransformer - an improvement of MQCNN - has shown state-of-the-art performance in probabilistic demand forecasting. In this paper, we consider incorporating cross-entity information to enhance model performance by adding a cross-entity attention mechanism along with a retrieval mechanism to select which entities to attend over. We demonstrate how our new neural architecture, MQRetNN, leverages the encoded contexts from a pretrained baseline model on the entire population to improve forecasting accuracy. Using MQCNN as the baseline model (due to computational constraints, we do not use MQTransformer), we first show on a small demand forecasting dataset that it is possible to achieve ~3% improvement in test loss by adding a cross-entity attention mechanism where each entity attends to all others in the population. We then evaluate the model with our proposed retrieval methods - as a means of approximating an attention over a large population - on a large-scale demand forecasting application with over 2 million products and observe ~1% performance gain over the MQCNN baseline.  ( 2 min )
    A Primer on Topological Data Analysis to Support Image Analysis Tasks in Environmental Science. (arXiv:2207.10552v1 [cs.LG])
    Topological data analysis (TDA) is a tool from data science and mathematics that is beginning to make waves in environmental science. In this work, we seek to provide an intuitive and understandable introduction to a tool from TDA that is particularly useful for the analysis of imagery, namely persistent homology. We briefly discuss the theoretical background but focus primarily on understanding the output of this tool and discussing what information it can glean. To this end, we frame our discussion around a guiding example of classifying satellite images from the Sugar, Fish, Flower, and Gravel Dataset produced for the study of mesoscale organization of clouds by Rasp et al. in 2020 (arXiv:1906.01906). We demonstrate how persistent homology and its vectorization, persistence landscapes, can be used in a workflow with a simple machine learning algorithm to obtain good results, and explore in detail how we can explain this behavior in terms of image-level features. One of the core strengths of persistent homology is how interpretable it can be, so throughout this paper we discuss not just the patterns we find, but why those results are to be expected given what we know about the theory of persistent homology. Our goal is that a reader of this paper will leave with a better understanding of TDA and persistent homology, be able to identify problems and datasets of their own for which persistent homology could be helpful, and gain an understanding of results they obtain from applying the included GitHub example code.  ( 3 min )
    TANDEM: Learning Joint Exploration and Decision Making with Tactile Sensors. (arXiv:2203.00798v3 [cs.RO] UPDATED)
    Inspired by the human ability to perform complex manipulation in the complete absence of vision (like retrieving an object from a pocket), the robotic manipulation field is motivated to develop new methods for tactile-based object interaction. However, tactile sensing presents the challenge of being an active sensing modality: a touch sensor provides sparse, local data, and must be used in conjunction with effective exploration strategies in order to collect information. In this work, we focus on the process of guiding tactile exploration, and its interplay with task-related decision making. We propose TANDEM (TActile exploration aNd DEcision Making), an architecture to learn efficient exploration strategies in conjunction with decision making. Our approach is based on separate but co-trained modules for exploration and discrimination. We demonstrate this method on a tactile object recognition task, where a robot equipped with a touch sensor must explore and identify an object from a known set based on binary contact signals alone. TANDEM achieves higher accuracy with fewer actions than alternative methods and is also shown to be more robust to sensor noise.  ( 3 min )
    Optimal precision for GANs. (arXiv:2207.10541v1 [cs.LG])
    When learning disconnected distributions, Generative adversarial networks (GANs) are known to face model misspecification. Indeed, a continuous mapping from a unimodal latent distribution to a disconnected one is impossible, so GANs necessarily generate samples outside of the support of the target distribution. This raises a fundamental question: what is the latent space partition that minimizes the measure of these areas? Building on a recent result of geometric measure theory, we prove that an optimal GAN must structure its latent space as a 'simplicial cluster' - a Voronoi partition where cells are convex cones - when the dimension of the latent space is larger than the number of modes. In this configuration, each Voronoi cell maps to a distinct mode of the data. We derive both an upper and a lower bound on the optimal precision of GANs learning disconnected manifolds. Interestingly, these two bounds have the same order of decrease: $\sqrt{\log m}$, $m$ being the number of modes. Finally, we perform several experiments to exhibit the geometry of the latent space and experimentally show that GANs have a geometry with similar properties to the theoretical one.  ( 2 min )
    CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution. (arXiv:2207.01260v2 [cs.LG] UPDATED)
    Mobile devices run deep learning models for various purposes, such as image classification and speech recognition. Due to the resource constraints of mobile devices, researchers have focused on either making a lightweight deep neural network (DNN) model using model pruning or generating an efficient code using compiler optimization. Surprisingly, we found that the straightforward integration between model compression and compiler auto-tuning often does not produce the most efficient model for a target device. We propose CPrune, a compiler-informed model pruning for efficient target-aware DNN execution to support an application with a required target accuracy. CPrune makes a lightweight DNN model through informed pruning based on the structural information of subgraphs built during the compiler tuning process. Our experimental results show that CPrune increases the DNN execution speed up to 2.73x compared to the state-of-the-art TVM auto-tune while satisfying the accuracy requirement.  ( 2 min )
    Detecting and Preventing Shortcut Learning for Fair Medical AI using Shortcut Testing (ShorT). (arXiv:2207.10384v1 [cs.LG])
    Machine learning (ML) holds great promise for improving healthcare, but it is critical to ensure that its use will not propagate or amplify health disparities. An important step is to characterize the (un)fairness of ML models - their tendency to perform differently across subgroups of the population - and to understand its underlying mechanisms. One potential driver of algorithmic unfairness, shortcut learning, arises when ML models base predictions on improper correlations in the training data. However, diagnosing this phenomenon is difficult, especially when sensitive attributes are causally linked with disease. Using multi-task learning, we propose the first method to assess and mitigate shortcut learning as a part of the fairness assessment of clinical ML systems, and demonstrate its application to clinical tasks in radiology and dermatology. Finally, our approach reveals instances when shortcutting is not responsible for unfairness, highlighting the need for a holistic approach to fairness mitigation in medical AI.  ( 2 min )
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v5 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often includes more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions are with hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We link the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.  ( 3 min )
    Fast Data Driven Estimation of Cluster Number in Multiplex Images using Embedded Density Outliers. (arXiv:2207.10469v1 [cs.LG])
    The usage of chemical imaging technologies is becoming a routine accompaniment to traditional methods in pathology. Significant technological advances have developed these next generation techniques to provide rich, spatially resolved, multidimensional chemical images. The rise of digital pathology has significantly enhanced the synergy of these imaging modalities with optical microscopy and immunohistochemistry, enhancing our understanding of the biological mechanisms and progression of diseases. Techniques such as imaging mass cytometry provide labelled multidimensional (multiplex) images of specific components used in conjunction with digital pathology techniques. These powerful techniques generate a wealth of high dimensional data that create significant challenges in data analysis. Unsupervised methods such as clustering are an attractive way to analyse these data, however, they require the selection of parameters such as the number of clusters. Here we propose a methodology to estimate the number of clusters in an automatic data-driven manner using a deep sparse autoencoder to embed the data into a lower dimensional space. We compute the density of regions in the embedded space, the majority of which are empty, enabling the high density regions to be detected as outliers and provide an estimate for the number of clusters. This framework provides a fully unsupervised and data-driven method to analyse multidimensional data. In this work we demonstrate our method using 45 multiplex imaging mass cytometry datasets. Moreover, our model is trained using only one of the datasets and the learned embedding is applied to the remaining 44 images providing an efficient process for data analysis. Finally, we demonstrate the high computational efficiency of our method which is two orders of magnitude faster than estimating via computing the sum squared distances as a function of cluster number.  ( 3 min )
    Deep Reinforcement Learning for Constrained Field Development Optimization in Subsurface Two-phase Flow. (arXiv:2104.00527v1 [cs.LG] CROSS LISTED)
    We present a deep reinforcement learning-based artificial intelligence agent that could provide optimized development plans given a basic description of the reservoir and rock/fluid properties with minimal computational cost. This artificial intelligence agent, comprising a convolutional neural network, provides a mapping from a given state of the reservoir model, constraints, and economic condition to the optimal decision (drill/do not drill and well location) to be taken in the next stage of the defined sequential field development planning process. The state of the reservoir model is defined using parameters that appear in the governing equations of the two-phase flow. A feedback loop training process referred to as deep reinforcement learning is used to train an artificial intelligence agent with such a capability. The training entails millions of flow simulations with varying reservoir model descriptions (structural, rock and fluid properties), operational constraints, and economic conditions. The parameters that define the reservoir model, operational constraints, and economic conditions are randomly sampled from a defined range of applicability. Several algorithmic treatments are introduced to enhance the training of the artificial intelligence agent. After appropriate training, the artificial intelligence agent provides an optimized field development plan instantly for new scenarios within the defined range of applicability. This approach has advantages over traditional optimization algorithms (e.g., particle swarm optimization, genetic algorithm) that are generally used to find a solution for a specific field development scenario and are typically not generalizable to different scenarios.  ( 3 min )
    UniFed: A Benchmark for Federated Learning Frameworks. (arXiv:2207.10308v1 [cs.LG])
    Federated Learning (FL) has become a practical and popular paradigm in machine learning. However, currently, there is no systematic solution that covers diverse use cases. Practitioners often face the challenge of how to select a matching FL framework for their use case. In this work, we present UniFed, the first unified benchmark for standardized evaluation of the existing open-source FL frameworks. With 15 evaluation scenarios, we present both qualitative and quantitative evaluation results of nine existing popular open-sourced FL frameworks, from the perspectives of functionality, usability, and system performance. We also provide suggestions on framework selection based on the benchmark conclusions and point out future improvement directions.  ( 2 min )
    Deep reinforcement learning for optimal well control in subsurface systems with uncertain geology. (arXiv:2203.13375v1 [physics.comp-ph] CROSS LISTED)
    A general control policy framework based on deep reinforcement learning (DRL) is introduced for closed-loop decision making in subsurface flow settings. Traditional closed-loop modeling workflows in this context involve the repeated application of data assimilation/history matching and robust optimization steps. Data assimilation can be particularly challenging in cases where both the geological style (scenario) and individual model realizations are uncertain. The closed-loop reservoir management (CLRM) problem is formulated here as a partially observable Markov decision process, with the associated optimization problem solved using a proximal policy optimization algorithm. This provides a control policy that instantaneously maps flow data observed at wells (as are available in practice) to optimal well pressure settings. The policy is represented by a temporal convolution and gated transformer blocks. Training is performed in a preprocessing step with an ensemble of prior geological models, which can be drawn from multiple geological scenarios. Example cases involving the production of oil via water injection, with both 2D and 3D geological models, are presented. The DRL-based methodology is shown to result in an NPV increase of 15% (for the 2D cases) and 33% (3D cases) relative to robust optimization over prior models, and to an average improvement of 4% in NPV relative to traditional CLRM. The solutions from the control policy are found to be comparable to those from deterministic optimization, in which the geological model is assumed to be known, even when multiple geological scenarios are considered. The control policy approach results in a 76% decrease in computational cost relative to traditional CLRM with the algorithms and parameter settings considered in this work.  ( 3 min )
    High-Dimensional $L_2$Boosting: Rate of Convergence. (arXiv:1602.08927v3 [stat.ML] UPDATED)
    Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of $L_2$Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called "post-Boosting". This is a post-selection estimator which applies ordinary least squares to the variables selected in the first stage by $L_2$Boosting. Another variant is "Orthogonal Boosting", where after each step an orthogonal projection is conducted. We show that both post-$L_2$Boosting and the orthogonal boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting. We show that the rate of convergence of the classical $L_2$Boosting depends on the design matrix described by a sparse eigenvalue constant. To show the latter results, we derive new approximation results for the pure greedy algorithm, based on analyzing the revisiting behavior of $L_2$Boosting. We also introduce feasible rules for early stopping, which can be easily implemented and used in applied work. Our results also allow a direct comparison between LASSO and boosting which has been missing from the literature. Finally, we present simulation studies and applications to illustrate the relevance of our theoretical results and to provide insights into the practical aspects of boosting. In these simulation studies, post-$L_2$Boosting clearly outperforms LASSO.  ( 3 min )
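    The two-stage idea in this abstract can be illustrated with a minimal sketch: componentwise $L_2$Boosting (the pure greedy algorithm) followed by an ordinary least squares refit on the selected variables. This is not the authors' code. The function names `l2_boost` and `post_boost`, the fixed step count, and the `shrinkage` parameter are assumptions made for illustration, and the paper's early-stopping rules are not implemented.

```python
import numpy as np

def l2_boost(X, y, steps=50, shrinkage=1.0):
    """Componentwise L2Boosting sketch (hypothetical helper, not from the paper).

    At each step, regress the current residual on the single column that
    reduces the residual sum of squares the most, then update the residual.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = np.asarray(y, dtype=float).copy()
    col_norm2 = (X ** 2).sum(axis=0)
    for _ in range(steps):
        coefs = X.T @ resid / col_norm2      # per-column least-squares coefficient
        gains = coefs ** 2 * col_norm2       # drop in residual sum of squares
        j = int(np.argmax(gains))            # greedy choice; columns may be revisited
        beta[j] += shrinkage * coefs[j]
        resid -= shrinkage * coefs[j] * X[:, j]
    return beta

def post_boost(X, y, beta):
    """Post-Boosting sketch: OLS refit restricted to the columns boosting selected."""
    selected = np.flatnonzero(beta)
    refit = np.zeros_like(beta)
    if selected.size:
        solution = np.linalg.lstsq(X[:, selected], y, rcond=None)[0]
        refit[selected] = solution
    return refit
```

    A typical call would be `post_boost(X, y, l2_boost(X, y, steps=30))`. In a sparse setting, the refit removes the shrinkage that the greedy steps leave on the selected coefficients.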
    Efficient Search of Multiple Neural Architectures with Different Complexities via Importance Sampling. (arXiv:2207.10334v1 [cs.NE])
    Neural architecture search (NAS) aims to automate architecture design processes and improve the performance of deep neural networks. Platform-aware NAS methods consider both performance and complexity and can find well-performing architectures with low computational resources. Although ordinary NAS methods result in tremendous computational costs owing to the repetition of model training, one-shot NAS, which trains the weights of a supernetwork containing all candidate architectures only once during the search process, has been reported to result in a lower search cost. This study focuses on the architecture complexity-aware one-shot NAS that optimizes the objective function composed of the weighted sum of two metrics, such as the predictive performance and number of parameters. In existing methods, the architecture search process must be run multiple times with different coefficients of the weighted sum to obtain multiple architectures with different complexities. This study aims at reducing the search cost associated with finding multiple architectures. The proposed method uses multiple distributions to generate architectures with different complexities and updates each distribution using the samples obtained from multiple distributions based on importance sampling. The proposed method allows us to obtain multiple architectures with different complexities in a single architecture search, thereby reducing the search cost. The proposed method is applied to the architecture search of convolutional neural networks on the CIFAR-10 and ImageNet datasets. Consequently, compared with baseline methods, the proposed method finds multiple architectures with varying complexities while requiring less computational effort.  ( 3 min )
    Bayesian Recurrent Units and the Forward-Backward Algorithm. (arXiv:2207.10486v1 [stat.ML])
    Using Bayes's theorem, we derive a unit-wise recurrence as well as a backward recursion similar to the forward-backward algorithm. The resulting Bayesian recurrent units can be integrated as recurrent neural networks within deep learning frameworks, while retaining a probabilistic interpretation from the direct correspondence with hidden Markov models. Whilst the contribution is mainly theoretical, experiments on speech recognition indicate that adding the derived units at the end of state-of-the-art recurrent architectures can improve the performance at a very low cost in terms of trainable parameters.  ( 2 min )
    Brain-Aware Replacements for Supervised Contrastive Learning in Detection of Alzheimer's Disease. (arXiv:2207.04574v2 [cs.CV] UPDATED)
    We propose a novel framework for Alzheimer's disease (AD) detection using brain MRIs. The framework starts with a data augmentation method called Brain-Aware Replacements (BAR), which leverages a standard brain parcellation to replace medically-relevant 3D brain regions in an anchor MRI with the corresponding regions from a randomly picked MRI to create synthetic samples. Ground truth "hard" labels are also linearly mixed depending on the replacement ratio in order to create "soft" labels. BAR produces a great variety of realistic-looking synthetic MRIs with higher local variability compared to other mix-based methods, such as CutMix. On top of BAR, we propose using a soft-label-capable supervised contrastive loss, aiming to learn the relative similarity of representations that reflect how mixed the synthetic MRIs are using our soft labels. This way, we do not fully exhaust the entropic capacity of our hard labels, since we only use them to create soft labels and synthetic MRIs through BAR. We show that a model pre-trained using our framework can be further fine-tuned with a cross-entropy loss using the hard labels that were used to create the synthetic samples. We validated the performance of our framework in a binary AD detection task against both from-scratch supervised training and state-of-the-art self-supervised training plus fine-tuning approaches. Then we evaluated BAR's individual performance compared to another mix-based method, CutMix, by integrating it within our framework. We show that our framework yields superior results in both precision and recall for the AD detection task.  ( 3 min )
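    The region replacement and label mixing described in this abstract can be sketched on toy arrays as follows. This is an illustration under stated assumptions, not the authors' implementation. The function name `region_replace` is hypothetical, and the replacement ratio is assumed here to be the fraction of replaced voxels, which the abstract does not specify.

```python
import numpy as np

def region_replace(anchor, donor, parcellation, regions, y_anchor, y_donor):
    """Hypothetical BAR-style mixing sketch.

    Voxels whose parcellation label is in `regions` are copied from `donor`
    into `anchor`. The one-hot labels are mixed linearly by the replacement
    ratio (assumed here: fraction of replaced voxels).
    """
    mask = np.isin(parcellation, regions)      # voxels belonging to chosen regions
    mixed = np.where(mask, donor, anchor)      # donor inside the mask, anchor elsewhere
    ratio = mask.mean()                        # assumed definition of the ratio
    soft = (1.0 - ratio) * np.asarray(y_anchor, dtype=float) \
        + ratio * np.asarray(y_donor, dtype=float)
    return mixed, soft
```

    With an anchor labeled `[1, 0]` and a donor labeled `[0, 1]`, replacing a quarter of the voxels yields the soft label `[0.75, 0.25]`.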
    Machine Learning-Friendly Biomedical Datasets for Equivalence and Subsumption Ontology Matching. (arXiv:2205.03447v2 [cs.AI] UPDATED)
    Ontology Matching (OM) plays an important role in many domains such as bioinformatics and the Semantic Web, and its research is becoming increasingly popular, especially with the application of machine learning (ML) techniques. Although the Ontology Alignment Evaluation Initiative (OAEI) represents an impressive effort for the systematic evaluation of OM systems, it still suffers from several limitations including limited evaluation of subsumption mappings, suboptimal reference mappings, and limited support for the evaluation of ML-based systems. To tackle these limitations, we introduce five new biomedical OM tasks involving ontologies extracted from Mondo and UMLS. Each task includes both equivalence and subsumption matching; the quality of reference mappings is ensured by human curation, ontology pruning, etc.; and a comprehensive evaluation framework is proposed to measure OM performance from various perspectives for both ML-based and non-ML-based OM systems. We report evaluation results for OM systems of different types to demonstrate the usage of these resources, all of which are publicly available as part of the new BioML track at OAEI 2022.  ( 2 min )
    Symbolic Regression in Materials Science: Discovering Interatomic Potentials from Data. (arXiv:2206.06422v2 [cond-mat.mtrl-sci] UPDATED)
    Particle-based modeling of materials at atomic scale plays an important role in the development of new materials and understanding of their properties. The accuracy of particle simulations is determined by interatomic potentials, which allow the potential energy of an atomic system to be calculated as a function of atomic coordinates and potentially other properties. First-principles-based ab initio potentials can reach arbitrary levels of accuracy; however, their applicability is limited by their high computational cost. Machine learning (ML) has recently emerged as an effective way to offset the high computational costs of ab initio atomic potentials by replacing expensive models with highly efficient surrogates trained on electronic structure data. Among a plethora of current methods, symbolic regression (SR) is gaining traction as a powerful "white-box" approach for discovering functional forms of interatomic potentials. This contribution discusses the role of symbolic regression in Materials Science (MS) and offers a comprehensive overview of current methodological challenges and state-of-the-art results. A genetic programming-based approach for modeling atomic potentials from raw data (consisting of snapshots of atomic positions and associated potential energy) is presented and empirically validated on ab initio electronic structure data.  ( 3 min )
    Algorithmic encoding of protected characteristics in image-based models for disease detection. (arXiv:2110.14755v4 [cs.LG] UPDATED)
    It has been rightfully emphasized that the use of AI for clinical decision making could amplify health disparities. An algorithm may encode protected characteristics, and then use this information for making predictions due to undesirable correlations in the (historical) training data. It remains unclear how we can establish whether such information is actually used. Besides the scarcity of data from underserved populations, very little is known about how dataset biases manifest in predictive models and how this may result in disparate performance. This article aims to shed some light on these issues by exploring new methodology for subgroup analysis in image-based disease detection models. We utilize two publicly available chest X-ray datasets, CheXpert and MIMIC-CXR, to study performance disparities across race and biological sex in deep learning models. We explore test set resampling, transfer learning, multitask learning, and model inspection to assess the relationship between the encoding of protected characteristics and disease detection performance across subgroups. We confirm subgroup disparities in terms of shifted true and false positive rates which are partially removed after correcting for population and prevalence shifts in the test sets. We further find a previously used transfer learning method to be insufficient for establishing whether specific patient information is used for making predictions. The proposed combination of test-set resampling, multitask learning, and model inspection reveals valuable new insights about the way protected characteristics are encoded in the feature representations of deep neural networks.  ( 3 min )
    Learning to Split for Automatic Bias Detection. (arXiv:2204.13749v2 [cs.LG] UPDATED)
    Classifiers are biased when trained on biased datasets. As a remedy, we propose Learning to Split (ls), an algorithm for automatic bias detection. Given a dataset with input-label pairs, ls learns to split this dataset so that predictors trained on the training split cannot generalize to the testing split. This performance gap suggests that the testing split is under-represented in the dataset, which is a signal of potential bias. Identifying non-generalizable splits is challenging since we have no annotations about the bias. In this work, we show that the prediction correctness of each example in the testing split can be used as a source of weak supervision: generalization performance will drop if we move examples that are predicted correctly away from the testing split, leaving only those that are mis-predicted. ls is task-agnostic and can be applied to any supervised learning problem, ranging from natural language understanding and image classification to molecular property prediction. Empirical results show that ls is able to generate astonishingly challenging splits that correlate with human-identified biases. Moreover, we demonstrate that combining robust learning algorithms (such as group DRO) with splits identified by ls enables automatic de-biasing. Compared to previous state-of-the-art, we substantially improve the worst-group performance (23.4% on average) when the source of biases is unknown during training and validation.  ( 3 min )
    Novel Class Discovery without Forgetting. (arXiv:2207.10659v1 [cs.CV])
    Humans possess an innate ability to identify and differentiate instances that they are not familiar with, by leveraging and adapting the knowledge that they have acquired so far. Importantly, they achieve this without deteriorating the performance on their earlier learning. Inspired by this, we identify and formulate a new, pragmatic problem setting of NCDwF: Novel Class Discovery without Forgetting, which tasks a machine learning model to incrementally discover novel categories of instances from unlabeled data, while maintaining its performance on the previously seen categories. We propose 1) a method to generate pseudo-latent representations which act as a proxy for (no longer available) labeled data, thereby alleviating forgetting, 2) a mutual-information based regularizer which enhances unsupervised discovery of novel classes, and 3) a simple Known Class Identifier which aids generalized inference when the testing data contains instances from both seen and unseen categories. We introduce experimental protocols based on CIFAR-10, CIFAR-100 and ImageNet-1000 to measure the trade-off between knowledge retention and novel class discovery. Our extensive evaluations reveal that existing models catastrophically forget previously seen categories while identifying novel categories, while our method is able to effectively balance between the competing objectives. We hope our work will attract further research into this newly identified pragmatic problem setting.  ( 2 min )
    Deep Reinforcement Learning for Field Development Optimization. (arXiv:2008.12627v1 [eess.SP] CROSS LISTED)
    The field development optimization (FDO) problem represents a challenging mixed-integer nonlinear programming (MINLP) problem in which we seek to obtain the number of wells, their type, location, and drilling sequence that maximizes an economic metric. Evolutionary optimization algorithms have been effectively applied to solve the FDO problem; however, these methods provide only a single deterministic solution, which is generally not robust to small changes in the problem setup. In this work, the goal is to apply convolutional neural network-based (CNN) deep reinforcement learning (DRL) algorithms to the field development optimization problem in order to obtain a policy that maps from different states or representations of the underlying geological model to optimal decisions. The proximal policy optimization (PPO) algorithm is considered with two CNN architectures of varying number of layers and composition. Both networks obtained policies that provide satisfactory results when compared to a hybrid particle swarm optimization - mesh adaptive direct search (PSO-MADS) algorithm that has been shown to be effective at solving the FDO problem.  ( 2 min )
    DODA: Data-oriented Sim-to-Real Domain Adaptation for 3D Semantic Segmentation. (arXiv:2204.01599v2 [cs.CV] UPDATED)
    Deep learning approaches achieve prominent success in 3D semantic segmentation. However, collecting densely annotated real-world 3D datasets is extremely time-consuming and expensive. Training models on synthetic data and generalizing on real-world scenarios becomes an appealing alternative, but unfortunately suffers from notorious domain shifts. In this work, we propose a Data-Oriented Domain Adaptation (DODA) framework to mitigate pattern and context gaps caused by different sensing mechanisms and layout placements across domains. Our DODA encompasses virtual scan simulation to imitate real-world point cloud patterns and tail-aware cuboid mixing to alleviate the interior context gap with a cuboid-based intermediate domain. The first unsupervised sim-to-real adaptation benchmark on 3D indoor semantic segmentation is also built on 3D-FRONT, ScanNet and S3DIS along with 7 popular Unsupervised Domain Adaptation (UDA) methods. Our DODA surpasses existing UDA approaches by over 13% on both 3D-FRONT -> ScanNet and 3D-FRONT -> S3DIS. Code is available at https://github.com/CVMI-Lab/DODA.  ( 2 min )
    Long-term Spatio-temporal Forecasting via Dynamic Multiple-Graph Attention. (arXiv:2204.11008v4 [cs.LG] UPDATED)
    Many real-world ubiquitous applications, such as parking recommendations and air pollution monitoring, benefit significantly from accurate long-term spatio-temporal forecasting (LSTF). LSTF makes use of long-term dependency between spatial and temporal domains, contextual information, and inherent patterns in the data. Recent studies have revealed the potential of multi-graph neural networks (MGNNs) to improve prediction performance. However, existing MGNN methods cannot be directly applied to LSTF due to several issues: the low level of generality, insufficient use of contextual information, and the imbalanced graph fusion approach. To address these issues, we construct new graph models to represent the contextual information of each node and the long-term spatio-temporal data dependency structure. To fuse the information across multiple graphs, we propose a new dynamic multi-graph fusion module to characterize the correlations of nodes within a graph and the nodes across graphs via the spatial attention and graph attention mechanisms. Furthermore, we introduce a trainable weight tensor to indicate the importance of each node in different graphs. Extensive experiments on two large-scale datasets demonstrate that our proposed approaches significantly improve the performance of existing graph neural network models in LSTF prediction tasks.  ( 3 min )
    Locally Random P-adic Alloy Codes with Channel Coding Theorems for Distributed Coded Tensors. (arXiv:2202.03469v4 [cs.IT] UPDATED)
    Tensors, i.e., multi-linear functions, are a fundamental building block of machine learning algorithms. In order to train on large data-sets, it is common practice to distribute the computation amongst workers. However, stragglers and other faults can severely impact the performance and overall training time. A novel strategy to mitigate these failures is the use of coded computation. We introduce a new metric for analysis called the typical recovery threshold, which focuses on the most likely event and provide a novel construction of distributed coded tensor operations which are optimal with this measure. We show that our general framework encompasses many other computational schemes and metrics as a special case. In particular, we prove that the recovery threshold and the tensor rank can be recovered as a special case of the typical recovery threshold when the probability of noise, i.e., a fault, is equal to zero, thereby providing a noisy generalization of noiseless computation as a serendipitous result. Far from being a purely theoretical construction, these definitions lead us to practical random code constructions, i.e., locally random p-adic alloy codes, which are optimal with respect to the measures. We analyze experiments conducted on Amazon EC2 and establish that they are faster and more numerically stable than many other benchmark computation schemes in practice, as is predicted by theory.  ( 3 min )
    An Equivalence Between Data Poisoning and Byzantine Gradient Attacks. (arXiv:2202.08578v2 [cs.LG] UPDATED)
    To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience of any "robust" learning algorithm to data poisoning in highly heterogeneous applications, as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models.  ( 2 min )
    An Efficient and Adaptive Granular-ball Generation Method in Classification Problem. (arXiv:2201.04343v2 [cs.LG] UPDATED)
    Granular-ball computing is an efficient, robust, and scalable learning method for granular computing. The basis of granular-ball computing is the granular-ball generation method. This paper proposes a method that accelerates granular-ball generation by using division in place of $k$-means. It can greatly improve the efficiency of granular-ball generation while maintaining accuracy similar to that of the existing method. In addition, a new adaptive method for granular-ball generation is proposed that considers the elimination of overlap between granular-balls, among other factors. This makes the granular-ball generation process parameter-free and completely adaptive in the true sense. Furthermore, this paper provides, for the first time, mathematical models for granular-ball covering. Experimental results on some real data sets demonstrate that the two proposed granular-ball generation methods achieve accuracy similar to that of the existing method while realizing adaptiveness or acceleration.  ( 2 min )
    FELARE: Fair Scheduling of Machine Learning Tasks on Heterogeneous Edge Systems. (arXiv:2206.00065v3 [cs.DC] UPDATED)
    Edge computing enables smart IoT-based systems via concurrent and continuous execution of latency-sensitive machine learning (ML) applications. These edge-based machine learning systems are often battery-powered (i.e., energy-limited). They use heterogeneous resources with diverse computing performance (e.g., CPU, GPU, and/or FPGAs) to fulfill the latency constraints of ML applications. The challenge is to allocate user requests for different ML applications on the Heterogeneous Edge Computing Systems (HEC) with respect to both the energy and latency constraints of these systems. To this end, we study and analyze resource allocation solutions that can increase the on-time task completion rate while considering the energy constraint. Importantly, we investigate edge-friendly (lightweight) multi-objective mapping heuristics that do not become biased toward a particular application type to achieve the objectives; instead, the heuristics consider "fairness" across the concurrent ML applications in their mapping decisions. Performance evaluations demonstrate that the proposed heuristic outperforms widely-used heuristics in heterogeneous systems in terms of the latency and energy objectives, particularly, at low to moderate request arrival rates. We observed 8.9% improvement in on-time task completion rate and 12.6% in energy-saving without imposing any significant overhead on the edge system.  ( 3 min )
    Neural Architecture Search for Spiking Neural Networks. (arXiv:2201.10355v3 [cs.NE] UPDATED)
    Spiking Neural Networks (SNNs) have gained huge attention as a potential energy-efficient alternative to conventional Artificial Neural Networks (ANNs) due to their inherent high-sparsity activation. However, most prior SNN methods use ANN-like architectures (e.g., VGG-Net or ResNet), which could provide sub-optimal performance for temporal sequence processing of binary information in SNNs. To address this, in this paper, we introduce a novel Neural Architecture Search (NAS) approach for finding better SNN architectures. Inspired by recent NAS approaches that find the optimal architecture from activation patterns at initialization, we select the architecture that can represent diverse spike activation patterns across different data samples without training. Moreover, to further leverage the temporal information among the spikes, we search for feed forward connections as well as backward connections (i.e., temporal feedback connections) between layers. Interestingly, SNASNet found by our search algorithm achieves higher performance with backward connections, demonstrating the importance of designing SNN architecture for suitably using temporal information. We conduct extensive experiments on three image recognition benchmarks where we show that SNASNet achieves state-of-the-art performance with significantly fewer timesteps (5 timesteps). Code is available on GitHub.  ( 3 min )
    An Explanation of In-context Learning as Implicit Bayesian Inference. (arXiv:2111.02080v6 [cs.CL] UPDATED)
    Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these examples without being explicitly pretrained to learn. Thus, it is unclear what enables in-context learning. In this paper, we study how in-context learning can emerge when pretraining documents have long-range coherence. Here, the LM must infer a latent document-level concept to generate coherent next tokens during pretraining. At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt. We prove when this occurs despite a distribution mismatch between prompts and pretraining data in a setting where the pretraining distribution is a mixture of HMMs. In contrast to messy large-scale datasets used to train LMs capable of in-context learning, we generate a small-scale synthetic dataset (GINC) where Transformers and LSTMs both exhibit in-context learning. Beyond the theory, experiments on GINC exhibit large-scale real-world phenomena including improved in-context performance with model scaling (despite the same pretraining loss), sensitivity to example order, and instances where zero-shot is better than few-shot in-context learning.  ( 3 min )
    Inducing Causal Structure for Interpretable Neural Networks. (arXiv:2112.00826v2 [cs.LG] UPDATED)
    In many areas, we have well-founded insights about causal structure that would be useful to bring into our trained models while still allowing them to learn in a data-driven fashion. To achieve this, we present the new method of interchange intervention training (IIT). In IIT, we (1) align variables in a causal model (e.g., a deterministic program or Bayesian network) with representations in a neural model and (2) train the neural model to match the counterfactual behavior of the causal model on a base input when aligned representations in both models are set to be the value they would be for a source input. IIT is fully differentiable, flexibly combines with other objectives, and guarantees that the target causal model is a causal abstraction of the neural model when its loss is zero. We evaluate IIT on a structural vision task (MNIST-PVR), a navigational language task (ReaSCAN), and a natural language inference task (MQNLI). We compare IIT against multi-task training objectives and data augmentation. In all our experiments, IIT achieves the best results and produces neural models that are more interpretable in the sense that they more successfully realize the target causal model.  ( 3 min )
    OCR-free Document Understanding Transformer. (arXiv:2111.15664v2 [cs.LG] UPDATED)
    Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut.  ( 3 min )
    Neural Tangent Kernel Beyond the Infinite-Width Limit: Effects of Depth and Initialization. (arXiv:2202.00553v2 [cs.LG] UPDATED)
    Neural Tangent Kernel (NTK) is widely used to analyze overparametrized neural networks due to the famous result by Jacot et al. (2018): in the infinite-width limit, the NTK is deterministic and constant during training. However, this result cannot explain the behavior of deep networks, since it generally does not hold if depth and width tend to infinity simultaneously. In this paper, we study the NTK of fully-connected ReLU networks with depth comparable to width. We prove that the NTK properties depend significantly on the depth-to-width ratio and the distribution of parameters at initialization. In fact, our results indicate the importance of the three phases in the hyperparameter space identified in Poole et al. (2016): ordered, chaotic and the edge of chaos (EOC). We derive exact expressions for the NTK dispersion in the infinite-depth-and-width limit in all three phases and conclude that the NTK variability grows exponentially with depth at the EOC and in the chaotic phase but not in the ordered phase. We also show that the NTK of deep networks may stay constant during training only in the ordered phase and discuss how the structure of the NTK matrix changes during training.  ( 3 min )
    Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications. (arXiv:2109.02606v2 [cs.LG] UPDATED)
    Gaussian processes have become a promising tool for various safety-critical settings, since the posterior variance can be used to directly estimate the model error and quantify risk. However, state-of-the-art techniques for safety-critical settings hinge on the assumption that the kernel hyperparameters are known, which does not apply in general. To mitigate this, we introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters. Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error of a Gaussian process with arbitrary hyperparameters. We do not require any bounds for the hyperparameters to be known a priori, which is an assumption commonly found in related work. Instead, we are able to derive bounds from data in an intuitive fashion. We additionally employ the proposed technique to derive performance guarantees for a class of learning-based control problems. Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.  ( 2 min )
    Geometric Multimodal Contrastive Representation Learning. (arXiv:2202.03390v3 [cs.LG] UPDATED)
    Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture consisting of modality-specific base encoders, which map an arbitrary number of modalities to an intermediate representation of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems including prediction and reinforcement learning tasks.  ( 2 min )
    A Survey of Deep Learning Architectures for Intelligent Reflecting Surfaces. (arXiv:2009.02540v5 [eess.SP] UPDATED)
    Intelligent reflecting surfaces (IRSs) have recently received significant attention for wireless communications because they reduce the hardware complexity, physical size, weight, and cost of conventional large arrays. However, deployment of an IRS entails dealing with multiple channel links between the base station (BS) and the users. Further, the BS and IRS beamformers require a joint design, wherein the IRS elements must be rapidly reconfigured. Data-driven techniques, such as deep learning (DL), are critical in addressing these challenges. The lower computation time and model-free nature of DL make it robust against data imperfections and environmental changes. At the physical layer, DL has been shown to be effective for IRS signal detection, channel estimation and active/passive beamforming using architectures such as supervised, unsupervised and reinforcement learning. This article provides a synopsis of these techniques for designing DL-based IRS-assisted wireless systems.  ( 3 min )
    Multi Resolution Analysis (MRA) for Approximate Self-Attention. (arXiv:2207.10284v1 [cs.LG])
    Transformers have emerged as a preferred model for many tasks in natural language processing and vision. Recent efforts on training and deploying Transformers more efficiently have identified many strategies to approximate the self-attention matrix, a key module in a Transformer architecture. Effective ideas include various prespecified sparsity patterns, low-rank basis expansions and combinations thereof. In this paper, we revisit classical Multiresolution Analysis (MRA) concepts such as Wavelets, whose potential value in this setting remains underexplored thus far. We show that simple approximations based on empirical feedback and design choices informed by modern hardware and implementation challenges eventually yield an MRA-based approach for self-attention with an excellent performance profile across most criteria of interest. We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. Code is available at https://github.com/mlpen/mra-attention.  ( 2 min )
    Contrastive Learning with Complex Heterogeneity. (arXiv:2105.09401v2 [cs.LG] UPDATED)
    With the advent of big data across multiple high-impact applications, we are often facing the challenge of complex heterogeneity. The newly collected data usually consist of multiple modalities and are characterized with multiple labels, thus exhibiting the co-existence of multiple types of heterogeneity. Although state-of-the-art techniques are good at modeling complex heterogeneity with sufficient label information, such label information can be quite expensive to obtain in real applications. Recently, researchers have paid great attention to contrastive learning due to its prominent performance in utilizing rich unlabeled data. However, existing work on contrastive learning is not able to address the problem of false negative pairs, i.e., some 'negative' pairs may have similar representations if they have the same label. To overcome this issue, in this paper, we propose a unified heterogeneous learning framework, which combines both the weighted unsupervised contrastive loss and the weighted supervised contrastive loss to model multiple types of heterogeneity. We first provide a theoretical analysis showing that the vanilla contrastive learning loss easily leads to a sub-optimal solution in the presence of false negative pairs, whereas the proposed weighted loss could automatically adjust the weight based on the similarity of the learned representations to mitigate this issue. Experimental results on real-world data sets demonstrate the effectiveness and the efficiency of the proposed framework in modeling multiple types of heterogeneity.  ( 3 min )
    RADAMS: Resilient and Adaptive Alert and Attention Management Strategy against Informational Denial-of-Service (IDoS) Attacks. (arXiv:2111.03463v2 [cs.CR] UPDATED)
    Attacks exploiting human attentional vulnerability have posed severe threats to cybersecurity. In this work, we identify and formally define a new type of proactive attentional attacks called Informational Denial-of-Service (IDoS) attacks that generate a large volume of feint attacks to overload human operators and hide real attacks among feints. We incorporate human factors (e.g., levels of expertise, stress, and efficiency) and empirical psychological results (e.g., the Yerkes-Dodson law and the sunk cost fallacy) to model the operators' attention dynamics and their decision-making processes along with the real-time alert monitoring and inspection. To assist human operators in dismissing the feints and escalating the real attacks timely and accurately, we develop a Resilient and Adaptive Data-driven alert and Attention Management Strategy (RADAMS) that de-emphasizes alerts selectively based on the abstracted category labels of the alerts. RADAMS uses reinforcement learning to achieve a customized and transferable design for various human operators and evolving IDoS attacks. The integrated modeling and theoretical analysis lead to the Product Principle of Attention (PPoA), fundamental limits, and the tradeoff among crucial human and economic factors. Experimental results corroborate that the proposed strategy outperforms the default strategy and can reduce the IDoS risk by as much as 20%. Besides, the strategy is resilient to large variations of costs, attack frequencies, and human attention capacities. We have recognized interesting phenomena such as attentional risk equivalency, attacker's dilemma, and the half-truth optimal attack strategy.  ( 3 min )
    Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks. (arXiv:2002.03938v3 [cs.LG] UPDATED)
    Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning. Despite their remarkable empirical performance, there are limited theoretical studies on the statistical properties of GANs. This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions that have densities in a Hölder space. Our main result shows that, if the generator and discriminator network architectures are properly chosen, GANs are consistent estimators of data distributions under strong discrepancy metrics, such as the Wasserstein-1 distance. Furthermore, when the data distribution exhibits low-dimensional structures, we show that GANs are capable of capturing the unknown low-dimensional structures in data and enjoy a fast statistical convergence, which is free of the curse of the ambient dimensionality. Our analysis for low-dimensional data builds upon a universal approximation theory of neural networks with Lipschitz continuity guarantees, which may be of independent interest.  ( 2 min )
    Variational quantum algorithm for Gaussian discrete solitons and their boson sampling. (arXiv:2110.12379v4 [quant-ph] UPDATED)
    In the context of quantum information, highly nonlinear regimes, such as those supporting solitons, have been only marginally investigated. General methods for quantum solitons are lacking, although solitons can act as entanglement generators or as self-organized quantum processors. We develop a computational approach that uses a neural network as a variational ansatz for quantum solitons in an array of waveguides. By training the resulting phase-space quantum machine learning model, we find different soliton solutions by varying the number of particles and the interaction strength. We consider Gaussian states that enable measuring the degree of entanglement and sampling the probability distribution of many-particle events. We also determine the probability of generating particle pairs and reveal that soliton bound states emit correlated pairs. These results may have a role in boson sampling with nonlinear systems and in quantum processors for entangled nonlinear waves.  ( 2 min )
    Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition. (arXiv:2010.10504v2 [eess.AS] UPDATED)
    We employ a combination of recent developments in semi-supervised learning for automatic speech recognition to obtain state-of-the-art results on LibriSpeech utilizing the unlabeled audio of the Libri-Light dataset. More precisely, we carry out noisy student training with SpecAugment using giant Conformer models pre-trained using wav2vec 2.0 pre-training. By doing so, we are able to achieve word-error-rates (WERs) 1.4%/2.6% on the LibriSpeech test/test-other sets against the current state-of-the-art WERs 1.7%/3.3%.  ( 2 min )
    High-Dimensional Inference in Bayesian Networks. (arXiv:2112.09217v2 [stat.ML] UPDATED)
    Inference of the marginal probability distribution is defined as the calculation of the probability of a subset of the variables and is relevant for handling missing data and hidden variables. While inference of the marginal probability distribution is crucial for various problems in machine learning and statistics, its exact computation is generally not feasible for categorical variables in Bayesian networks due to the NP-hardness of this task. We develop a divide-and-conquer approach using the graphical properties of Bayesian networks to split the computation of the marginal probability distribution into sub-calculations of lower dimensionality, thus reducing the overall computational complexity. Exploiting this property, we present an efficient and scalable algorithm for calculating the marginal probability distribution for categorical variables. The novel method is compared against state-of-the-art approximate inference methods in a benchmarking study, where it displays superior performance. As an immediate application, we demonstrate how our method can be used to classify incomplete data against Bayesian networks and use this approach for identifying the cancer subtype of kidney cancer patient samples.  ( 2 min )
    Federated Learning with Non-IID Data. (arXiv:1806.00582v2 [cs.LG] UPDATED)
    Federated learning enables resource-constrained edge compute devices, such as mobile phones and IoT devices, to learn a shared model for prediction, while keeping the training data local. This decentralized approach to train models provides privacy, security, regulatory and economic benefits. In this work, we focus on the statistical challenge of federated learning when local data is non-IID. We first show that the accuracy of federated learning reduces significantly, by up to 55% for neural networks trained for highly skewed non-IID data, where each client device trains only on a single class of data. We further show that this accuracy reduction can be explained by the weight divergence, which can be quantified by the earth mover's distance (EMD) between the distribution over classes on each device and the population distribution. As a solution, we propose a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices. Experiments show that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.  ( 3 min )
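    The class-distribution distance mentioned above can be illustrated with a small sketch. It assumes a unit ground distance between any two distinct classes, in which case the earth mover's distance reduces to the total variation distance; the paper may define or normalize the quantity differently. The helper names `class_distribution` and `label_emd` and the toy label lists are hypothetical.

```python
import numpy as np

def class_distribution(labels, num_classes):
    """Empirical distribution over class labels (integers in [0, num_classes))."""
    counts = np.bincount(np.asarray(labels), minlength=num_classes)
    return counts / counts.sum()

def label_emd(p, q):
    """EMD between two class distributions, ASSUMING a unit ground distance
    between distinct classes. Under that assumption EMD equals the total
    variation distance, i.e. half the L1 distance."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * float(np.abs(p - q).sum())

# Toy data: a balanced 10-class population, one IID client, one single-class client.
population = class_distribution(list(range(10)) * 10, 10)
iid_client = class_distribution(list(range(10)) * 5, 10)
single_class_client = class_distribution([3] * 50, 10)

print(label_emd(iid_client, population))           # no skew
print(label_emd(single_class_client, population))  # highly skewed
```

    A client holding a single class sits far from the population distribution, which is the regime in which the abstract reports the largest accuracy loss.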
    Grounding Visual Representations with Texts for Domain Generalization. (arXiv:2207.10285v1 [cs.CV])
    Reducing the representational discrepancy between source and target domains is a key component to maximize the model generalization. In this work, we advocate for leveraging natural language supervision for the domain generalization task. We introduce two modules to ground visual representations with texts containing typical reasoning of humans: (1) Visual and Textual Joint Embedder and (2) Textual Explanation Generator. The former learns the image-text joint embedding space where we can ground high-level class-discriminative information into the model. The latter leverages an explainable model and generates explanations justifying the rationale behind its decision. To the best of our knowledge, this is the first work to leverage the vision-and-language cross-modality approach for the domain generalization task. Our experiments with a newly created CUB-DG benchmark dataset demonstrate that cross-modality supervision can be successfully used to ground domain-invariant visual representations and improve the model generalization. Furthermore, in the large-scale DomainBed benchmark, our proposed method achieves state-of-the-art results and ranks 1st in average performance for five multi-domain datasets. The dataset and code are available at https://github.com/mswzeus/GVRT.  ( 2 min )
    Deep Learning of Radiative Atmospheric Transfer with an Autoencoder. (arXiv:2207.10650v1 [physics.comp-ph])
    As electro-optical energy from the sun propagates through the atmosphere it is affected by radiative transfer effects including absorption, emission, and scattering. Modeling these effects is essential for scientific remote sensing measurements of the earth and atmosphere. For example, hyperspectral imagery is a form of digital imagery collected with many, often hundreds, of wavelengths of light in each pixel. The amount of light measured at the sensor is the result of emitted sunlight, atmospheric radiative transfer, and the reflectance off the materials on the ground, all of which vary per wavelength as a result of multiple physical phenomena. Therefore, measurements of the ground spectra or atmospheric constituents require separating these different contributions per wavelength. In this paper, we create an autoencoder similar to denoising autoencoders, treating the atmospheric effects as 'noise' and ground reflectance as truth per spectrum. We generate hundreds of thousands of training samples by taking random samples of spectra from laboratory measurements and adding atmospheric effects using physics-based modelling via MODTRAN by varying atmospheric inputs. This process ideally could create an autoencoder that would separate atmospheric effects and ground reflectance in hyperspectral imagery, a process called atmospheric compensation, which is difficult and time-consuming, requiring a combination of heuristic approximations, estimates of physical quantities, and physical modelling. While the accuracy of our method is not as good as other methods in the field, this is an important first step in applying the growing field of deep learning of physical principles to atmospheric compensation in hyperspectral imagery and remote sensing.  ( 3 min )
    Classification of Macromolecule Type Based on Sequences of Amino Acids Using Deep Learning. (arXiv:1907.03532v2 [q-bio.BM] UPDATED)
    The classification of amino acids and their sequence analysis plays a vital role in life sciences and is a challenging task. This article uses and compares state-of-the-art deep learning models like convolutional neural networks (CNN), long short-term memory (LSTM), and gated recurrent units (GRU) to solve macromolecule classification problems using amino acids. These models have efficient frameworks for solving a broad spectrum of complex learning problems compared to traditional machine learning techniques. We use word embedding to represent the amino acid sequences as vectors. The CNN extracts features from amino acid sequences, which are treated as vectors, then fed to the models mentioned above to train a robust classifier. Our results show that word2vec as embedding combined with VGG-16 performs better than LSTM and GRU. The proposed approach achieves an error rate of 1.5%.  ( 2 min )
    Switching One-Versus-the-Rest Loss to Increase the Margin of Logits for Adversarial Robustness. (arXiv:2207.10283v1 [cs.LG])
    Defending deep neural networks against adversarial examples is a key challenge for AI safety. To improve the robustness effectively, recent methods focus on important data points near the decision boundary in adversarial training. However, these methods are vulnerable to Auto-Attack, which is an ensemble of parameter-free attacks for reliable evaluation. In this paper, we experimentally investigate the causes of their vulnerability and find that existing methods reduce margins between logits for the true label and the other labels while keeping their gradient norms at non-small values. Reduced margins and non-small gradient norms cause their vulnerability since the largest logit can be easily flipped by the perturbation. Our experiments also show that the histogram of the logit margins has two peaks, i.e., small and large logit margins. From these observations, we propose switching one-versus-the-rest loss (SOVR), which uses one-versus-the-rest loss when data have small logit margins so that it increases the margins. We find that SOVR increases logit margins more than existing methods while keeping gradient norms small and outperforms them in terms of the robustness against Auto-Attack.  ( 2 min )
    Unsupervised pre-training of graph transformers on patient population graphs. (arXiv:2207.10603v1 [cs.LG])
    Pre-training has shown success in different areas of machine learning, such as Computer Vision, Natural Language Processing (NLP), and medical imaging. However, it has not been fully explored for clinical data analysis. An immense amount of clinical records are recorded, but still, data and labels can be scarce for data collected in small hospitals or dealing with rare diseases. In such scenarios, pre-training on a larger set of unlabelled clinical data could improve performance. In this paper, we propose novel unsupervised pre-training techniques designed for heterogeneous, multi-modal clinical data for patient outcome prediction inspired by masked language modeling (MLM), by leveraging graph deep learning over population graphs. To this end, we further propose a graph-transformer-based network, designed to handle heterogeneous clinical data. By combining masking-based pre-training with a transformer-based network, we translate the success of masking-based pre-training in other domains to heterogeneous clinical data. We show the benefit of our pre-training method in a self-supervised and a transfer learning setting, utilizing three medical datasets TADPOLE, MIMIC-III, and a Sepsis Prediction Dataset. We find that our proposed pre-training methods help in modeling the data at a patient and population level and improve performance in different fine-tuning tasks on all datasets.  ( 2 min )
    Neural Network Learning of Chemical Bond Representations in Spectral Indices and Features. (arXiv:2207.10530v1 [cs.CV])
    In this paper we investigate neural networks for classification in hyperspectral imaging with a focus on connecting the architecture of the network with the physics of the sensing and materials present. Spectroscopy is the process of measuring light reflected or emitted by a material as a function of wavelength. Molecular bonds present in the material have vibrational frequencies which affect the amount of light measured at each wavelength. Thus the measured spectrum contains information about the particular chemical constituents and types of bonds. For example, chlorophyll reflects more light in the near-IR range (800-900nm) than in the red (625-675nm) range, and this difference can be measured using a normalized difference vegetation index (NDVI), which is commonly used to detect vegetation presence, health, and type in imagery collected at these wavelengths. In this paper we show that the weights in a Neural Network trained on different vegetation classes learn to measure this difference in reflectance. We then show that a Neural Network trained on a more complex set of ten different polymer materials will learn spectral 'features' evident in the weights for the network, and these features can be used to reliably distinguish between the different types of polymers. Examination of the weights provides a human-interpretable understanding of the network.  ( 3 min )
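    As a concrete illustration of the index mentioned above, NDVI is conventionally computed as (NIR - Red) / (NIR + Red). The sketch below averages reflectance over the two wavelength ranges quoted in the abstract; the function names and the synthetic spectra are hypothetical and are not taken from the paper.

```python
import numpy as np

def band_mean(wavelengths_nm, reflectance, lo_nm, hi_nm):
    """Mean reflectance over the samples whose wavelength lies in [lo_nm, hi_nm]."""
    w = np.asarray(wavelengths_nm, dtype=float)
    r = np.asarray(reflectance, dtype=float)
    mask = (w >= lo_nm) & (w <= hi_nm)
    return float(r[mask].mean())

def ndvi(wavelengths_nm, reflectance):
    """NDVI = (NIR - Red) / (NIR + Red), using the ranges quoted in the abstract."""
    nir = band_mean(wavelengths_nm, reflectance, 800.0, 900.0)
    red = band_mean(wavelengths_nm, reflectance, 625.0, 675.0)
    return (nir - red) / (nir + red)

# Synthetic spectra sampled every 25 nm (illustrative values only).
wavelengths = np.arange(400, 1001, 25)
vegetation_like = np.where(wavelengths >= 700, 0.5, 0.05)  # bright in the near-IR
flat_spectrum = np.full(wavelengths.shape, 0.3)            # no red/NIR contrast

print(ndvi(wavelengths, vegetation_like))
print(ndvi(wavelengths, flat_spectrum))
```

    A single linear unit with a positive weight on near-IR bands and a negative weight on red bands computes the numerator of this index, which is one way to read the abstract's claim that trained weights learn the reflectance difference.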
    A Forgotten Danger in DNN Supervision Testing: Generating and Detecting True Ambiguity. (arXiv:2207.10495v1 [cs.SE])
    Deep Neural Networks (DNNs) are becoming a crucial component of modern software systems, but they are prone to fail under conditions that are different from the ones observed during training (out-of-distribution inputs) or on inputs that are truly ambiguous, i.e., inputs that admit multiple classes with nonzero probability in their ground truth labels. Recent work proposed DNN supervisors to detect high-uncertainty inputs before their possible misclassification leads to any harm. To test and compare the capabilities of DNN supervisors, researchers proposed test generation techniques to focus the testing effort on high-uncertainty inputs that should be recognized as anomalous by supervisors. However, existing test generators can only produce out-of-distribution inputs. No existing model- and supervisor-independent technique supports the generation of truly ambiguous test inputs. In this paper, we propose a novel way to generate ambiguous inputs to test DNN supervisors and use it to empirically compare several existing supervisor techniques. In particular, we propose AmbiGuess to generate ambiguous samples for image classification problems. AmbiGuess is based on gradient-guided sampling in the latent space of a regularized adversarial autoencoder. Moreover, we conducted what is, to the best of our knowledge, the most extensive comparative study of DNN supervisors, considering their capabilities to detect 4 distinct types of high-uncertainty inputs, including truly ambiguous ones.  ( 2 min )
    Error Compensation Framework for Flow-Guided Video Inpainting. (arXiv:2207.10391v1 [cs.CV])
    The key to video inpainting is to use correlation information from as many reference frames as possible. Existing flow-based propagation methods split the video synthesis process into multiple steps: flow completion -> pixel propagation -> synthesis. However, a significant drawback is that the errors in each step accumulate and are amplified in the next step. To this end, we propose an Error Compensation Framework for Flow-guided Video Inpainting (ECFVI), which takes advantage of the flow-based method and offsets its weaknesses. We address these weaknesses with a newly designed flow completion module and an error compensation network that exploits the error guidance map. Our approach greatly improves the temporal consistency and the visual quality of the completed videos. Experimental results show the superior performance of our proposed method, with a 6x speedup compared to the state-of-the-art methods. In addition, we present a new benchmark dataset for evaluation that addresses the weaknesses of existing test datasets.  ( 2 min )
    Metropolis Monte Carlo sampling: convergence, localization transition and optimality. (arXiv:2207.10488v1 [cond-mat.stat-mech])
    Among random sampling methods, Markov Chain Monte Carlo algorithms are foremost. Using a combination of analytical and numerical approaches, we study their convergence properties towards the steady state, within a random walk Metropolis scheme. We show that the deviations from the target steady-state distribution feature a localization transition as a function of the characteristic length of the attempted jumps defining the random walk. This transition drastically changes the error introduced by incomplete convergence, and discriminates between two regimes where the relaxation mechanism is limited by diffusion and by rejection, respectively.  ( 2 min )
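    For readers unfamiliar with the scheme, a random walk Metropolis sampler can be sketched in a few lines. This is a generic textbook version, not the authors' code: the uniform proposal, the standard normal target, and the names `random_walk_metropolis` and `jump_length` are assumptions made for illustration. It shows only how the jump length trades diffusion (small jumps, almost always accepted) against rejection (large jumps, rarely accepted).

```python
import math
import random

def random_walk_metropolis(log_density, x0, jump_length, n_steps, seed=0):
    """Random walk Metropolis in one dimension.

    Proposals are drawn uniformly from [x - jump_length, x + jump_length]
    (an assumed, symmetric proposal). Returns the chain and the fraction
    of accepted proposals."""
    rng = random.Random(seed)
    x = x0
    samples = []
    accepted = 0
    for _ in range(n_steps):
        proposal = x + rng.uniform(-jump_length, jump_length)
        log_ratio = log_density(proposal) - log_density(x)
        # Accept with probability min(1, p(proposal) / p(x)).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
            accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_steps

def standard_normal_log_density(x):
    """Log-density of N(0, 1) up to an additive constant."""
    return -0.5 * x * x

for jump_length in (0.5, 2.5, 20.0):
    _, rate = random_walk_metropolis(standard_normal_log_density, 0.0, jump_length, 20000)
    print(jump_length, rate)
```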
    A Dynamical Systems Algorithm for Clustering in Hyperspectral Imagery. (arXiv:2207.10625v1 [cs.CV])
    In this paper we present a new dynamical systems algorithm for clustering in hyperspectral images. The main idea of the algorithm is that data points are 'pushed' in the direction of increasing density and groups of pixels that end up in the same dense regions belong to the same class. This is essentially a numerical solution of the differential equation defined by the gradient of the density of data points on the data manifold. The number of classes is automated and the resulting clustering can be extremely accurate. In addition to providing an accurate clustering, this algorithm presents a new tool for understanding hyperspectral data in high dimensions. We evaluate the algorithm on the Urban (Available at www.tec.ary.mil/Hypercube/) scene comparing performance against the k-means algorithm using pre-identified classes of materials as ground truth.  ( 2 min )
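    The idea of pushing points up the density gradient can be illustrated with mean shift, a well-known technique in which each update moves a point towards the kernel-weighted average of the data, which ascends a kernel density estimate. The abstract does not say that the authors use mean shift, so the sketch below is a swapped-in stand-in under that assumption; the Gaussian kernel, the bandwidth, and the function names are hypothetical.

```python
import numpy as np

def mean_shift_modes(points, bandwidth, n_iters=50):
    """Move every point uphill on a Gaussian kernel density estimate.

    Each iteration replaces a point by the kernel-weighted mean of the
    original data around it."""
    pts = np.asarray(points, dtype=float)
    shifted = pts.copy()
    for _ in range(n_iters):
        diffs = shifted[:, None, :] - pts[None, :, :]
        sq_dist = (diffs ** 2).sum(axis=2)
        weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)
        shifted = weights @ pts / weights.sum(axis=1, keepdims=True)
    return shifted

def label_by_mode(modes, tol):
    """Give the same label to points whose final positions lie within tol.

    The number of clusters is not specified in advance; it is the number
    of distinct density peaks that the points reach."""
    labels = np.full(len(modes), -1, dtype=int)
    centers = []
    for i, mode in enumerate(modes):
        for k, center in enumerate(centers):
            if np.linalg.norm(mode - center) < tol:
                labels[i] = k
                break
        else:
            centers.append(mode)
            labels[i] = len(centers) - 1
    return labels

# Two well-separated synthetic blobs (illustrative data, not hyperspectral pixels).
rng = np.random.default_rng(0)
blob_a = rng.normal(0.0, 0.3, size=(30, 2))
blob_b = rng.normal(5.0, 0.3, size=(30, 2))
data = np.vstack([blob_a, blob_b])
labels = label_by_mode(mean_shift_modes(data, bandwidth=1.0), tol=0.5)
print(len(set(labels.tolist())))
```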
    CTL-MTNet: A Novel CapsNet and Transfer Learning-Based Mixed Task Net for the Single-Corpus and Cross-Corpus Speech Emotion Recognition. (arXiv:2207.10644v1 [cs.CL])
    Speech Emotion Recognition (SER) has become a growing focus of research in human-computer interaction. An essential challenge in SER is to extract common attributes from different speakers or languages, especially when a specific source corpus has to be trained to recognize the unknown data coming from another speech corpus. To address this challenge, a Capsule Network (CapsNet) and Transfer Learning based Mixed Task Net (CTL-MTNet) is proposed in this paper to deal with both the single-corpus and cross-corpus SER tasks simultaneously. For the single-corpus task, a combined Convolution-Pooling and Attention CapsNet module (CPAC) is designed by embedding the self-attention mechanism into the CapsNet, guiding the module to focus on the important features that can be fed into different capsules. The high-level features extracted by CPAC provide sufficient discriminative ability. Furthermore, to handle the cross-corpus task, CTL-MTNet employs a Corpus Adaptation Adversarial Module (CAAM) by combining CPAC with Margin Disparity Discrepancy (MDD), which can learn the domain-invariant emotion representations through extracting the strong emotion commonness. Experiments including ablation studies and visualizations on both single- and cross-corpus tasks using four well-known SER datasets in different languages are conducted for performance evaluation and comparison. The results indicate that in both tasks the CTL-MTNet showed better performance in all cases compared to a number of state-of-the-art methods. The source code and the supplementary materials are available at: https://github.com/MLDMXM2017/CTLMTNet  ( 3 min )
    RepFair-GAN: Mitigating Representation Bias in GANs Using Gradient Clipping. (arXiv:2207.10653v1 [cs.LG])
    Fairness has become an essential problem in many domains of Machine Learning (ML), such as classification, natural language processing, and Generative Adversarial Networks (GANs). In this research effort, we study the unfairness of GANs. We formally define a new fairness notion for generative models in terms of the distribution of generated samples sharing the same protected attributes (gender, race, etc.). The defined fairness notion (representational fairness) requires the distribution of the sensitive attributes at test time to be uniform, and, in particular for GAN models, we show that this fairness notion is violated even when the dataset contains equally represented groups, i.e., the generator favors generating one group of samples over the others at test time. In this work, we shed light on the source of this representation bias in GANs along with a straightforward method to overcome this problem. We first show on two widely used datasets (MNIST, SVHN) that when the norm of the gradient of one group is more important than the other during the discriminator's training, the generator favours sampling data from one group more than the other at test time. We then show that controlling the groups' gradient norm by performing group-wise gradient norm clipping in the discriminator during training leads to fairer data generation in terms of representational fairness compared to existing models while preserving the quality of generated samples.  ( 3 min )
    The MABe22 Benchmarks for Representation Learning of Multi-Agent Behavior. (arXiv:2207.10553v1 [cs.LG])
    Real-world behavior is often shaped by complex interactions between multiple agents. To scalably study multi-agent behavior, advances in unsupervised and self-supervised learning have enabled a variety of different behavioral representations to be learned from trajectory data. To date, there does not exist a unified set of benchmarks that can enable comparing methods quantitatively and systematically across a broad set of behavior analysis settings. We aim to address this by introducing a large-scale, multi-agent trajectory dataset from real-world behavioral neuroscience experiments that covers a range of behavior analysis tasks. Our dataset consists of trajectory data from common model organisms, with 9.6 million frames of mouse data and 4.4 million frames of fly data, in a variety of experimental settings, such as different strains, lengths of interaction, and optogenetic stimulation. A subset of the frames also includes expert-annotated behavior labels. Improvements on our dataset correspond to behavioral representations that work across multiple organisms and are able to capture differences for common behavior analysis tasks.  ( 2 min )
    Estimation of Non-Crossing Quantile Regression Process with Deep ReQU Neural Networks. (arXiv:2207.10442v1 [stat.ML])
    We propose a penalized nonparametric approach to estimating the quantile regression process (QRP) in a nonseparable model using rectifier quadratic unit (ReQU) activated deep neural networks and introduce a novel penalty function to enforce non-crossing of quantile regression curves. We establish the non-asymptotic excess risk bounds for the estimated QRP and derive the mean integrated squared error for the estimated QRP under mild smoothness and regularity conditions. To establish these non-asymptotic risk and estimation error bounds, we also develop a new error bound for approximating $C^s$ smooth functions with $s >0$ and their derivatives using ReQU activated neural networks. This is a new approximation result for ReQU networks and is of independent interest and may be useful in other problems. Our numerical experiments demonstrate that the proposed method is competitive with or outperforms two existing methods, including methods using reproducing kernels and random forests, for nonparametric quantile regression.  ( 2 min )
    Deep Learning Reveals Patterns of Diverse and Changing Sentiments Towards COVID-19 Vaccines Based on 11 Million Tweets. (arXiv:2207.10641v1 [cs.CL])
    Over 12 billion doses of COVID-19 vaccines have been administered at the time of writing. However, public perceptions of vaccines have been complex. We analyzed COVID-19 vaccine-related tweets to understand the evolving perceptions of COVID-19 vaccines. We finetuned a deep learning classifier using a state-of-the-art model, XLNet, to detect each tweet's sentiment automatically. We employed validated methods to extract the users' race or ethnicity, gender, age, and geographical locations from user profiles. Incorporating multiple data sources, we assessed the sentiment patterns among subpopulations and juxtaposed them against vaccine uptake data to unravel their interactive patterns. 11,211,672 COVID-19 vaccine-related tweets corresponding to 2,203,681 users over two years were analyzed. The finetuned model for sentiment classification yielded an accuracy of 0.92 on the test set. Users from various demographic groups demonstrated distinct patterns in sentiments towards COVID-19 vaccines. User sentiments became more positive over time, after which we observed a subsequent upswing in the population-level vaccine uptake. Around dates when positive sentiments crested, we detected encouraging news or events regarding vaccine development and distribution. Positive sentiments in pregnancy-related tweets demonstrated a delayed pattern compared with trends in the general population, with postponed vaccine uptake trends. Distinctive patterns across subpopulations suggest the need for tailored strategies. Global news and events were profoundly involved in shaping users' thoughts on social media. Populations with additional concerns, such as pregnancy, demonstrated more substantial hesitancy owing to a lack of timely recommendations. Feature analysis revealed that the hesitancy of various subpopulations stemmed from clinical trial logics, risks and complications, and urgency of scientific evidence.  ( 3 min )
    Deep Audio Waveform Prior. (arXiv:2207.10441v1 [cs.SD])
    Convolutional neural networks contain strong priors for generating natural looking images [1]. These priors enable image denoising, super resolution, and inpainting in an unsupervised manner. Previous attempts to demonstrate similar ideas in audio, namely deep audio priors, (i) use hand picked architectures such as harmonic convolutions, (ii) only work with spectrogram input, and (iii) have been used mostly for eliminating Gaussian noise [2]. In this work we show that existing SOTA architectures for audio source separation contain deep priors even when working with the raw waveform. Deep priors can be discovered by training a neural network to generate a single corrupted signal when given white noise as input. A network with relevant deep priors is likely to generate a cleaner version of the signal before converging on the corrupted signal. We demonstrate this restoration effect with several corruptions: background noise, reverberations, and a gap in the signal (audio inpainting).  ( 2 min )
    Continual-Learning-as-a-Service (CLaaS): On-Demand Efficient Adaptation of Predictive Models. (arXiv:2206.06957v2 [cs.LG] UPDATED)
    Predictive machine learning models nowadays are often updated in a stateless and expensive way. The two main future trends for companies that want to build machine learning-based applications and systems are real-time inference and continual updating. Unfortunately, both trends require a mature infrastructure that is hard and costly to realize on-premise. This paper defines a novel software service and model delivery infrastructure termed Continual Learning-as-a-Service (CLaaS) to address these issues. Specifically, it embraces continual machine learning and continuous integration techniques. It provides support for model updating and validation tools for data scientists without an on-premise solution and in an efficient, stateful and easy-to-use manner. Finally, this CL model service is easy to encapsulate in any machine learning infrastructure or cloud system. This paper presents the design and implementation of a CLaaS instantiation, called LiquidBrain, evaluated in two real-world scenarios. The former is a robotic object recognition setting using the CORe50 dataset while the latter is a named category and attribute prediction using the DeepFashion-C dataset in the fashion domain. Our preliminary results suggest the usability and efficiency of the Continual Learning model services and the effectiveness of the solution in addressing real-world use-cases regardless of where the computation happens in the continuum Edge-Cloud.  ( 3 min )
    Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning. (arXiv:2207.10415v1 [math.OC])
    Optimizing noisy functions online, when evaluating the objective requires experiments on a deployed system, is a crucial task arising in manufacturing, robotics and many other domains. Often, constraints on safe inputs are unknown ahead of time, and we only obtain noisy information indicating how close we are to violating the constraints. Yet, safety must be guaranteed at all times, not only for the final output of the algorithm. We introduce a general approach for seeking a stationary point in high dimensional non-linear stochastic optimization problems in which maintaining safety during learning is crucial. Our approach, called LB-SGD, is based on applying stochastic gradient descent (SGD) with a carefully chosen adaptive step size to a logarithmic barrier approximation of the original problem. We provide a complete convergence analysis of non-convex, convex, and strongly-convex smooth constrained problems, with first-order and zeroth-order feedback. Our approach yields efficient updates and scales better with dimensionality compared to existing approaches. We empirically compare the sample complexity and the computational cost of our method with existing safe learning approaches. Beyond synthetic benchmarks, we demonstrate the effectiveness of our approach on minimizing constraint violation in policy search tasks in safe reinforcement learning (RL).  ( 2 min )
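The log-barrier idea in this abstract can be illustrated with a small deterministic sketch. It is not the paper's LB-SGD: it assumes exact (noise-free) gradients, linear constraints, and a known smoothness constant for the objective, and the step-size rule below (a local curvature bound combined with a cap that spends at most half of each constraint's slack) is an assumption of this sketch that stands in for the paper's adaptive step size. All function and parameter names are hypothetical.

```python
def barrier_grad(grad_f, constraints, x, eta):
    """Gradient of B(x) = f(x) - eta * sum_i log(-g_i(x)) at a strictly feasible x."""
    grad = list(grad_f(x))
    for g_i, grad_g_i in constraints:
        slack = -g_i(x)  # positive while x is strictly feasible
        for j, d in enumerate(grad_g_i(x)):
            grad[j] += eta * d / slack
    return grad


def safe_step_size(constraints, x, grad, eta, smooth_f):
    """Step size that keeps a gradient step strictly feasible.

    Assumption of this sketch: constraints are linear, so the barrier's
    curvature is at most eta * ||grad g_i||^2 / slack_i^2 and a step that
    spends at most half of each slack stays feasible.
    """
    curvature = smooth_f
    cap = float("inf")
    for g_i, grad_g_i in constraints:
        slack = -g_i(x)
        dg = grad_g_i(x)
        curvature += eta * sum(d * d for d in dg) / slack ** 2
        rate = abs(sum(d * gj for d, gj in zip(dg, grad)))
        if rate > 0:
            cap = min(cap, slack / (2 * rate))
    return min(1.0 / curvature, cap)


def log_barrier_descent(grad_f, constraints, x0, eta, smooth_f, iters):
    """Gradient descent on the log-barrier surrogate, starting from a feasible x0."""
    x = list(x0)
    for _ in range(iters):
        grad = barrier_grad(grad_f, constraints, x, eta)
        t = safe_step_size(constraints, x, grad, eta, smooth_f)
        x = [xi - t * gi for xi, gi in zip(x, grad)]
    return x


# Toy problem: minimize (x0 - 2)^2 + (x1 - 2)^2 subject to x0 + x1 - 2 <= 0.
grad_f = lambda x: [2 * (x[0] - 2), 2 * (x[1] - 2)]
constraints = [(lambda x: x[0] + x[1] - 2, lambda x: [1.0, 1.0])]
x_final = log_barrier_descent(grad_f, constraints, [0.0, 0.0],
                              eta=0.01, smooth_f=2.0, iters=200)
```

In this toy problem the iterates approach the constrained optimum (1, 1) from inside the feasible set and settle slightly short of the boundary; shrinking `eta` moves the barrier solution closer to it.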
    Personalized Prediction of Future Lesion Activity and Treatment Effect in Multiple Sclerosis from Baseline MRI. (arXiv:2204.01702v3 [eess.IV] UPDATED)
    Precision medicine for chronic diseases such as multiple sclerosis (MS) involves choosing a treatment which best balances efficacy and side effects/preferences for individual patients. Making this choice as early as possible is important, as delays in finding an effective therapy can lead to irreversible disability accrual. To this end, we present the first deep neural network model for individualized treatment decisions from baseline magnetic resonance imaging (MRI) (with clinical information if available) for MS patients. Our model (a) predicts future new and enlarging T2 weighted (NE-T2) lesion counts on follow-up MRI on multiple treatments and (b) estimates the conditional average treatment effect (CATE), as defined by the predicted future suppression of NE-T2 lesions, between different treatment options relative to placebo. Our model is validated on a proprietary federated dataset of 1817 multi-sequence MRIs acquired from MS patients during four multi-centre randomized clinical trials. Our framework achieves high average precision in the binarized regression of future NE-T2 lesions on five different treatments, identifies heterogeneous treatment effects, and provides a personalized treatment recommendation that accounts for treatment-associated risk (e.g. side effects, patient preference, administration difficulties).  ( 3 min )
    Identifying partial mouse brain microscopy images from Allen reference atlas using a contrastively learned semantic space. (arXiv:2109.06662v3 [cs.CV] UPDATED)
    Precise identification of mouse brain microscopy images is a crucial first step when anatomical structures in the mouse brain are to be registered to a reference atlas. Practitioners usually rely on manual comparison of images or tools that assume the presence of complete images. This work explores Siamese Networks as the method for finding corresponding 2D reference atlas plates for given partial 2D mouse brain images. Siamese networks are a class of convolutional neural networks (CNNs) that use weight-shared paths to obtain low dimensional embeddings of pairs of input images. The correspondence between the partial mouse brain image and reference atlas plate is determined based on the distance between low dimensional embeddings of brain slices and atlas plates that are obtained from Siamese networks using contrastive learning. Experiments showed that Siamese CNNs can precisely identify brain slices using the Allen mouse brain atlas when training and testing images come from the same source. They achieved TOP-1 and TOP-5 accuracy of 25% and 100%, respectively, taking only 7.2 seconds to identify 29 images.  ( 3 min )
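The matching step described in this abstract (pick the atlas plate whose embedding is closest to the embedding of the partial image, then score with TOP-k accuracy) can be sketched as follows. The embeddings here are made-up 2D toy vectors, Euclidean distance is an assumption (the abstract only says "distance"), and the function names are hypothetical.

```python
import math


def rank_plates(query_emb, plate_embs):
    """Return atlas plate indices ordered from nearest to farthest embedding."""
    dists = [math.dist(query_emb, p) for p in plate_embs]
    return sorted(range(len(plate_embs)), key=dists.__getitem__)


def top_k_accuracy(query_embs, true_plates, plate_embs, k):
    """Fraction of queries whose true plate is among the k nearest plates."""
    hits = sum(
        true in rank_plates(q, plate_embs)[:k]
        for q, true in zip(query_embs, true_plates)
    )
    return hits / len(query_embs)


# Toy embeddings standing in for the outputs of a trained Siamese network.
plates = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
queries = [(0.1, 0.0), (0.6, 0.1)]
truth = [0, 0]  # both queries belong to plate 0

top1 = top_k_accuracy(queries, truth, plates, k=1)
top2 = top_k_accuracy(queries, truth, plates, k=2)
```

The second query sits closer to plate 1 than to its true plate 0, so it counts as a miss at k=1 and a hit at k=2.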
    Neural Network Guided Evolutionary Fuzzing for Finding Traffic Violations of Autonomous Vehicles. (arXiv:2109.06126v4 [cs.SE] UPDATED)
    Self-driving cars and trucks, autonomous vehicles (AVs), should not be accepted by regulatory bodies and the public until they have much higher confidence in their safety and reliability -- which can most practically and convincingly be achieved by testing. But existing testing methods are inadequate for checking the end-to-end behaviors of AV controllers against complex, real-world corner cases involving interactions with multiple independent agents such as pedestrians and human-driven vehicles. While test-driving AVs on streets and highways fails to capture many rare events, existing simulation-based testing methods mainly focus on simple scenarios and do not scale well for complex driving situations that require sophisticated awareness of the surroundings. To address these limitations, we propose a new fuzz testing technique, called AutoFuzz, which can leverage widely-used AV simulators' API grammars to generate semantically and temporally valid complex driving scenarios (sequences of scenes). To efficiently search for traffic-violation-inducing scenarios in a large search space, we propose a constrained neural network (NN) evolutionary search method to optimize AutoFuzz. Evaluation of our prototype on one state-of-the-art learning-based controller, two rule-based controllers, and one industrial-grade controller in five scenarios shows that AutoFuzz efficiently finds hundreds of traffic violations in high-fidelity simulation environments. For each scenario, AutoFuzz can find on average 10-39% more unique traffic violations than the best-performing baseline method. Further, fine-tuning the learning-based controller with the traffic violations found by AutoFuzz successfully reduced the traffic violations found in the new version of the AV controller software.  ( 3 min )
    How Well Does Self-Supervised Pre-Training Perform with Streaming Data?. (arXiv:2104.12081v3 [cs.LG] UPDATED)
    Prior works on self-supervised pre-training focus on the joint training scenario, where massive unlabeled data are assumed to be given as input all at once, and only then is a learner trained. Unfortunately, such a problem setting is often impractical if not infeasible since many real-world tasks rely on sequential learning, e.g., data are decentralized or collected in a streaming fashion. In this paper, we conduct the first thorough and dedicated investigation on self-supervised pre-training with streaming data, aiming to shed light on the model behavior under this overlooked setup. Specifically, we pre-train over 500 models on four categories of pre-training streaming data from ImageNet and DomainNet and evaluate them on three types of downstream tasks and 12 different downstream datasets. Our studies show that, somewhat beyond our expectations, with simple data replay or parameter regularization, sequential self-supervised pre-training turns out to be an efficient alternative to joint pre-training, as the performances of the former are mostly on par with those of the latter. Moreover, catastrophic forgetting, a common issue in sequential supervised learning, is much alleviated in sequential self-supervised learning (SSL), which is well justified through our comprehensive empirical analysis on representations and the sharpness of minima in the loss landscape. Our findings, therefore, suggest that, in practice, for SSL, the cumbersome joint training can largely be replaced by sequential learning, which in turn enables a much broader spectrum of potential application scenarios.  ( 3 min )
    AutoIP: A United Framework to Integrate Physics into Gaussian Processes. (arXiv:2202.12316v2 [cs.LG] UPDATED)
    Physical modeling is critical for many modern science and engineering applications. From a data science or machine learning perspective, where more domain-agnostic, data-driven models are pervasive, physical knowledge -- often expressed as differential equations -- is valuable in that it is complementary to data, and it can potentially help overcome issues such as data sparsity, noise, and inaccuracy. In this work, we propose a simple, yet powerful and general framework -- AutoIP, for Automatically Incorporating Physics -- that can integrate all kinds of differential equations into Gaussian Processes (GPs) to enhance prediction accuracy and uncertainty quantification. These equations can be linear or nonlinear, spatial, temporal, or spatio-temporal, complete or incomplete with unknown source terms, and so on. Based on kernel differentiation, we construct a GP prior to sample the values of the target function, equation-related derivatives, and latent source functions, which are all jointly from a multivariate Gaussian distribution. The sampled values are fed to two likelihoods: one to fit the observations, and the other to conform to the equation. We use the whitening method to evade the strong dependency between the sampled function values and kernel parameters, and we develop a stochastic variational learning algorithm. AutoIP shows improvement upon vanilla GPs in both simulation and several real-world applications, even using rough, incomplete equations.  ( 3 min )
    Provable concept learning for interpretable predictions using variational autoencoders. (arXiv:2204.00492v2 [cs.LG] UPDATED)
    In safety-critical applications, practitioners are reluctant to trust neural networks when no interpretable explanations are available. Many attempts to provide such explanations revolve around pixel-based attributions or use previously known concepts. In this paper we aim to provide explanations by provably identifying high-level, previously unknown ground-truth concepts. To this end, we propose a probabilistic modeling framework to derive (C)oncept (L)earning and (P)rediction (CLAP) -- a VAE-based classifier that uses visually interpretable concepts as predictors for a simple classifier. Assuming a generative model for the ground-truth concepts, we prove that CLAP is able to identify them while attaining optimal classification accuracy. Our experiments verify that CLAP identifies distinct ground-truth concepts on synthetic datasets and yields promising results on the medical Chest X-Ray dataset.  ( 2 min )
    Combining Intra-Risk and Contagion Risk for Enterprise Bankruptcy Prediction Using Graph Neural Networks. (arXiv:2202.03874v4 [q-fin.RM] UPDATED)
    Predicting the bankruptcy risk of small and medium-sized enterprises (SMEs) is an important step for financial institutions when making decisions about loans. Existing studies in both finance and AI research fields, however, tend to only consider either the intra-risk or contagion risk of enterprises, ignoring their interactions and combinatorial effects. This study for the first time considers both types of risk and their joint effects in bankruptcy prediction. Specifically, we first propose an enterprise intra-risk encoder based on statistically significant enterprise risk indicators for its intra-risk learning. Then, we propose an enterprise contagion risk encoder based on enterprise relation information from an enterprise knowledge graph for its contagion risk embedding. In particular, the contagion risk encoder includes both the newly proposed Hyper-Graph Neural Networks and Heterogeneous Graph Neural Networks, which can model contagion risk in two different aspects, i.e., common risk factors based on hyperedges and direct diffusion risk from neighbors, respectively. To evaluate the model, we collect real-world multi-source data on SMEs and build a novel benchmark dataset called SMEsD. We provide open access to the dataset, which is expected to further promote research on financial risk analysis. Experiments on SMEsD against twelve state-of-the-art baselines demonstrate the effectiveness of the proposed model for bankruptcy prediction.  ( 3 min )
    ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers. (arXiv:2203.10157v2 [cs.CV] UPDATED)
    Novel view synthesis is a long-standing problem. In this work, we consider a variant of the problem where we are given only a few context views sparsely covering a scene or an object. The goal is to predict novel viewpoints in the scene, which requires learning priors. The current state of the art is based on Neural Radiance Field (NeRF), and while achieving impressive results, the methods suffer from long training times as they require evaluating millions of 3D point samples via a neural network for each image. We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network. Our model uses a two-stage architecture consisting of a codebook and a transformer model. The codebook is used to embed individual images into a smaller latent space, and the transformer solves the view synthesis task in this more compact space. To train our model efficiently, we introduce a novel branching attention mechanism that allows us to use the same model not only for neural rendering but also for camera pose estimation. Experimental results on real-world scenes show that our approach is competitive compared to NeRF-based methods while not reasoning explicitly in 3D, and it is faster to train.  ( 3 min )
    Knee arthritis severity measurement using deep learning: a publicly available algorithm with a multi-institutional validation showing radiologist-level performance. (arXiv:2203.08914v2 [eess.IV] UPDATED)
    The assessment of knee osteoarthritis (KOA) severity on knee X-rays is a central criterion for the use of total knee arthroplasty. However, this assessment suffers from imprecise standards and a remarkably high inter-reader variability. An algorithmic, automated assessment of KOA severity could improve overall outcomes of knee replacement procedures by increasing the appropriateness of its use. We propose a novel deep learning-based five-step algorithm to automatically grade KOA from posterior-anterior (PA) views of radiographs: (1) image preprocessing, (2) localization of knee joints in the image using the YOLO v3-Tiny model, (3) initial assessment of the severity of osteoarthritis using a convolutional neural network-based classifier, (4) segmentation of the joints and calculation of the joint space narrowing (JSN), and (5) a combination of the JSN and the initial assessment to determine a final Kellgren-Lawrence (KL) score. Furthermore, by displaying the segmentation masks used to make the assessment, our algorithm demonstrates a higher degree of transparency compared to typical "black box" deep learning classifiers. We perform a comprehensive evaluation using two public datasets and one dataset from our institution, and show that our algorithm reaches state-of-the-art performance. Moreover, we also collected ratings from multiple radiologists at our institution and showed that our algorithm performs at the radiologist level. The software has been made publicly available at https://github.com/MaciejMazurowski/osteoarthritis-classification.  ( 3 min )
    Generative Adversarial Networks for Labeled Acceleration Data Augmentation for Structural Damage Detection. (arXiv:2112.03478v5 [cs.LG] UPDATED)
    There have been major advances in the field of Data Science in the last few decades, and these have been utilized in different engineering disciplines and applications. Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) algorithms have been utilized for civil Structural Health Monitoring (SHM), especially for damage detection applications using sensor data. Although ML and DL methods show superior learning skills for complex data structures, they require plenty of data for training. However, in SHM, data collection from civil structures can be expensive and time-consuming; in particular, getting useful data (damage-associated data) can be challenging. The objective of this study is to address the data scarcity problem for damage detection applications. This paper employs 1-D Wasserstein Deep Convolutional Generative Adversarial Networks using Gradient Penalty (1-D WDCGAN-GP) for synthetic labelled acceleration data generation. Then, the generated data is added in varying ratios to the training dataset of a 1-D Deep Convolutional Neural Network (1-D DCNN) for a damage detection application. The damage detection results show that the 1-D WDCGAN-GP can be successfully utilized to tackle data scarcity in vibration-based damage detection applications of civil structures. Keywords: Structural Health Monitoring (SHM), Structural Damage Detection, 1-D Deep Convolutional Neural Networks (1-D DCNN), 1-D Generative Adversarial Networks (1-D GAN), Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP)  ( 3 min )
    FLOWGEN: Fast and slow graph generation. (arXiv:2207.07656v2 [cs.LG] UPDATED)
    We present FLOWGEN, a graph-generation model inspired by the dual-process theory of mind that generates large graphs incrementally. Depending on the difficulty of completing the graph at the current step, graph generation is routed to either a fast (weaker) or a slow (stronger) model. The fast and slow models have identical architectures, but vary in the number of parameters and consequently in strength. Experiments on real-world graphs show that our model can successfully generate graphs similar to those generated by a single large model in a fraction of the time.  ( 2 min )
    The Surprising Effectiveness of PPO in Cooperative, Multi-Agent Games. (arXiv:2103.01955v3 [cs.LG] UPDATED)
    Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems. In this work, we carefully study the performance of PPO in cooperative multi-agent settings. We show that PPO-based multi-agent algorithms achieve surprisingly strong performance in four popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, the Hanabi challenge, and Google Research Football, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. Importantly, compared to strong off-policy methods, PPO often achieves competitive or superior results in both final rewards and sample efficiency. Finally, through ablation studies, we analyze implementation and hyperparameter factors that are critical to PPO's empirical performance, and give concrete practical suggestions regarding these factors. Our results show that when using these practices, simple PPO-based methods are a strong baseline in cooperative multi-agent reinforcement learning. Source code is released at https://github.com/marlbenchmark/on-policy.  ( 3 min )
    Multilingual Disinformation Detection for Digital Advertising. (arXiv:2207.10649v1 [cs.CL])
    In today's world, the presence of online disinformation and propaganda is more widespread than ever. Independent publishers are funded mostly via digital advertising, which is unfortunately also the case for those publishing disinformation content. The question of how to remove such publishers from advertising inventory has long been ignored, despite the negative impact on the open internet. In this work, we make the first step towards quickly detecting and red-flagging websites that potentially manipulate the public with disinformation. We build a machine learning model based on multilingual text embeddings that first determines whether the page mentions a topic of interest, then estimates the likelihood of the content being malicious, creating a shortlist of publishers that will be reviewed by human experts. Our system empowers internal teams to proactively, rather than defensively, blacklist unsafe content, thus protecting the reputation of the advertisement provider.  ( 2 min )
    Face-to-Face Co-Located Human-Human Social Interaction Analysis using Nonverbal Cues: A Survey. (arXiv:2207.10574v1 [cs.HC])
    This work presents a systematic review of recent efforts (since 2010) aimed at automatic analysis of nonverbal cues displayed in face-to-face co-located human-human social interactions. The main reason for focusing on nonverbal cues is that these are the physical, machine detectable traces of social and psychological phenomena. Therefore, detecting and understanding nonverbal cues means, at least to a certain extent, detecting and understanding social and psychological phenomena. The covered topics fall into three categories: a) modeling social traits, such as leadership, dominance, and personality traits, b) social role recognition and social relations detection, and c) interaction dynamics analysis in terms of group cohesion, empathy, rapport and so forth. We target the co-located interactions, in which the interactants are always humans. The survey covers a wide spectrum of settings and scenarios, including free-standing interactions, meetings, indoor and outdoor social exchanges, dyadic conversations, and crowd dynamics. For each of them, the survey considers the three main elements of nonverbal cues analysis, namely data, sensing approaches and computational methodologies. The goal is to highlight the main advances of the last decade, to point out existing limitations, and to outline future directions.  ( 2 min )
    Analysis of Regularized Learning for Generalized Data in Banach Spaces. (arXiv:2109.03159v5 [cs.LG] UPDATED)
    In this article, we study the whole theory of regularized learning for generalized data in Banach spaces including representer theorems, approximation theorems, and convergence theorems. The generalized input data are composed of linear functionals in the predual spaces of the Banach spaces to represent the discrete local information of different engineering and physics models. The generalized data and the multi-loss functions are used to compute the empirical risks, and the regularized learning is to minimize the regularized empirical risks over the Banach spaces. Even if the original problems are unknown or unformulated, the exact solutions of the original problems are approximated globally by the regularized learning. In the proof of the convergence theorems, the strong convergence condition is replaced by the weak convergence condition with an additional checkable condition which is independent of the original problems. The theorems of the regularized learning can be used to solve many problems of machine learning such as support vector machines and neural networks.  ( 3 min )
    Careful What You Wish For: on the Extraction of Adversarially Trained Models. (arXiv:2207.10561v1 [cs.LG])
    Recent attacks on Machine Learning (ML) models, such as evasion attacks with adversarial examples and model stealing through extraction attacks, pose several security and privacy threats. Prior work proposes to use adversarial training to secure models from adversarial examples that can evade the classification of a model and deteriorate its performance. However, this protection technique affects the model's decision boundary and its prediction probabilities, hence it might raise model privacy risks. In fact, a malicious user with only query access to the prediction output of a model can extract it and obtain a high-accuracy and high-fidelity surrogate model. To achieve better extraction, these attacks leverage the prediction probabilities of the victim model. Indeed, no previous work on extraction attacks takes into consideration changes made to the training process for security purposes. In this paper, we propose a framework to assess extraction attacks on adversarially trained models with vision datasets. To the best of our knowledge, our work is the first to perform such an evaluation. Through an extensive empirical study, we demonstrate that adversarially trained models are more vulnerable to extraction attacks than models obtained under natural training circumstances. Attacks on them can achieve up to 1.2x higher accuracy and agreement with less than 0.75x of the queries. We additionally find that the adversarial robustness capability is transferable through extraction attacks, i.e., extracted Deep Neural Networks (DNNs) from robust models show an enhanced accuracy on adversarial examples compared to extracted DNNs from naturally trained (i.e., standard) models.  ( 3 min )
    AXM-Net: Implicit Cross-Modal Feature Alignment for Person Re-identification. (arXiv:2101.08238v3 [cs.CV] UPDATED)
    Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align cross-modality representations induced by the semantic information present for a person and ignore background information. This work presents a novel convolutional neural network (CNN) based architecture designed to learn semantically aligned cross-modal visual and textual representations. The underlying building block, named AXM-Block, is a unified multi-layer network that dynamically exploits the multi-scale knowledge from both modalities and re-calibrates each modality according to shared semantics. To complement the convolutional design, contextual attention is applied in the text branch to manipulate long-term dependencies. Moreover, we propose a unique design to enhance visual part-based feature coherence and locality information. Our framework is novel in its ability to implicitly learn aligned semantics between modalities during the feature learning stage. The unified feature learning effectively utilizes textual data as a super-annotation signal for visual representation learning and automatically rejects irrelevant information. The entire AXM-Net is trained end-to-end on CUHK-PEDES data. We report results on two tasks, person search and cross-modal Re-ID. The AXM-Net outperforms the current state-of-the-art (SOTA) methods and achieves 64.44% Rank@1 on the CUHK-PEDES test set. It also outperforms its competitors by >10% in cross-viewpoint text-to-image Re-ID scenarios on CrossRe-ID and CUHK-SYSU datasets.  ( 3 min )
    Encrypted Internet traffic classification using a supervised Spiking Neural Network. (arXiv:2101.09818v2 [cs.LG] UPDATED)
    Internet traffic recognition is an essential tool for access providers, since recognizing the traffic categories of the data packets transmitted on a network helps them define adapted priorities. That means, for instance, high priority requirements for an audio conference and low ones for a file transfer, to enhance user experience. As internet traffic becomes increasingly encrypted, the mainstream classic traffic recognition technique, payload inspection, is rendered ineffective. This paper uses machine learning techniques for encrypted traffic classification, looking only at packet size and time of arrival. Spiking neural networks (SNN), largely inspired by how biological neurons operate, were used for two reasons. Firstly, they are able to recognize time-related data packet features. Secondly, they can be implemented efficiently on neuromorphic hardware with a low energy footprint. Here we used a very simple feedforward SNN, with only one fully-connected hidden layer, trained in a supervised manner using the newly introduced method known as Surrogate Gradient Learning. Surprisingly, such a simple SNN reached an accuracy of 95.9% on ISCX datasets, outperforming previous approaches. Besides better accuracy, there is also a very significant improvement in simplicity: input size, number of neurons, and trainable parameters are all reduced by one to four orders of magnitude. Next, we analyzed the reasons for this good accuracy. It turns out that, beyond spatial (i.e., packet size) features, the SNN also exploits temporal ones, mostly the nearly synchronous (within a 200ms range) arrival times of packets with certain sizes. Taken together, these results show that SNNs are an excellent fit for encrypted internet traffic classification: they can be more accurate than conventional artificial neural networks (ANN), and they could be implemented efficiently on low power embedded systems.  ( 3 min )
    Subgraph Matching via Query-Conditioned Subgraph Matching Neural Networks and Bi-Level Tree Search. (arXiv:2207.10305v1 [cs.LG])
    Recent advances have shown the success of using reinforcement learning and search to solve NP-hard graph-related tasks, such as Traveling Salesman Optimization, Graph Edit Distance computation, etc. However, it remains unclear how one can efficiently and accurately detect the occurrences of a small query graph in a large target graph, which is a core operation in graph database search, biomedical analysis, social group finding, etc. This task is called Subgraph Matching, which essentially performs a subgraph isomorphism check between a query graph and a large target graph. One promising approach to this classical problem is the "learning-to-search" paradigm, where a reinforcement learning (RL) agent is designed with a learned policy to guide a search algorithm to quickly find the solution without any solved instances for supervision. However, for the specific task of Subgraph Matching, although the query graph given by the user as input is usually small, the target graph is often orders of magnitude larger. This poses challenges to the neural network design and can lead to solution and reward sparsity. In this paper, we propose N-BLS with two innovations to tackle the challenges: (1) A novel encoder-decoder neural network architecture to dynamically compute the matching information between the query and the target graphs at each search state; (2) A Monte Carlo Tree Search enhanced bi-level search framework for training the policy and value networks. Experiments on five large real-world target graphs show that N-BLS can significantly improve the subgraph matching performance.  ( 3 min )
    A comprehensive study of non-adaptive and residual-based adaptive sampling for physics-informed neural networks. (arXiv:2207.10289v1 [physics.comp-ph])
    Physics-informed neural networks (PINNs) have been shown to be an effective tool for solving forward and inverse problems of partial differential equations (PDEs). PINNs embed the PDEs into the loss of the neural network, and this PDE loss is evaluated at a set of scattered residual points. The distribution of these points is highly important to the performance of PINNs. However, existing studies on PINNs have mainly used only a few simple residual point sampling methods. Here, we present a comprehensive study of two categories of sampling: non-adaptive uniform sampling and adaptive nonuniform sampling. We consider six uniform sampling methods: (1) equispaced uniform grid, (2) uniformly random sampling, (3) Latin hypercube sampling, (4) Halton sequence, (5) Hammersley sequence, and (6) Sobol sequence. We also consider a resampling strategy for uniform sampling. To improve the sampling efficiency and the accuracy of PINNs, we propose two new residual-based adaptive sampling methods: residual-based adaptive distribution (RAD) and residual-based adaptive refinement with distribution (RAR-D), which dynamically improve the distribution of residual points based on the PDE residuals during training. Hence, we have considered a total of 10 different sampling methods, including six non-adaptive uniform sampling methods, uniform sampling with resampling, two proposed adaptive sampling methods, and an existing adaptive sampling method. We extensively tested the performance of these sampling methods for four forward problems and two inverse problems in many setups. Our numerical results presented in this study are summarized from more than 6000 simulations of PINNs. We show that the proposed adaptive sampling methods of RAD and RAR-D significantly improve the accuracy of PINNs with fewer residual points. The results obtained in this study can also be used as a practical guideline in choosing sampling methods.  ( 3 min )
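As a rough illustration of the two sampling categories named in the abstract above, the sketch below generates a Halton sequence (one of the listed non-adaptive options) and draws points with probability proportional to the absolute residual. The function names and the plain `|residual|` weighting are assumptions made for illustration; this is not the paper's RAD or RAR-D density, whose exact form the abstract does not give.

```python
import random


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton(n_points, bases=(2, 3)):
    """First n_points of a Halton sequence in [0, 1)^d, one prime base per dimension."""
    return [tuple(radical_inverse(i, b) for b in bases)
            for i in range(1, n_points + 1)]


def residual_weighted_sample(candidates, residuals, n_points, rng):
    """Draw points with probability proportional to |residual|.

    Hypothetical weighting chosen for illustration only.
    """
    weights = [abs(r) for r in residuals]
    return rng.choices(candidates, weights=weights, k=n_points)
```

In use, a dense candidate set (for example from `halton`) would be scored with the current PDE residual, and `residual_weighted_sample` would then pick the next set of residual points so that high-residual regions are visited more often.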
    ProMix: Combating Label Noise via Maximizing Clean Sample Utility. (arXiv:2207.10276v1 [cs.LG])
    The ability to train deep neural networks under label noise is appealing, as imperfectly annotated data are relatively cheaper to obtain. State-of-the-art approaches are based on semi-supervised learning (SSL), which selects small-loss examples as clean and then applies SSL techniques for boosted performance. However, the selection step mostly provides a medium-sized and decent-enough clean subset, which overlooks a rich set of clean samples. In this work, we propose a novel noisy label learning framework ProMix that attempts to maximize the utility of clean samples for boosted performance. Key to our method, we propose a matched high-confidence selection technique that selects those examples having high confidence and predictions that match their given labels. Combined with the small-loss selection, our method is able to achieve a precision of 99.27 and a recall of 98.22 in detecting clean samples on the CIFAR-10N dataset. Based on such a large set of clean data, ProMix improves the best baseline method by +2.67% on CIFAR-10N and +1.61% on CIFAR-100N datasets. The code and data are available at https://github.com/Justherozen/ProMix  ( 2 min )
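The matched high-confidence selection described above can be sketched as follows. This is a minimal reading of the abstract's one-sentence description, not the ProMix implementation: the function name, the input layout (a list of per-class probability lists), and the `threshold` value are all assumptions.

```python
def matched_high_confidence(probs, labels, threshold=0.9):
    """Return indices whose predicted class equals the given label
    and whose predicted probability is at least `threshold`.

    `threshold=0.9` is an illustrative value, not taken from the paper.
    """
    selected = []
    for i, (p, y) in enumerate(zip(probs, labels)):
        pred = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if pred == y and p[pred] >= threshold:
            selected.append(i)
    return selected
```

A sample with a confident prediction that disagrees with its (possibly noisy) label is not selected, and neither is a sample whose prediction agrees with the label but has low confidence.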
    Improved Generative Model for Weakly Supervised Chest Anomaly Localization via Pseudo-paired Registration with Bilaterally Symmetrical Data Augmentation. (arXiv:2207.10324v1 [eess.IV])
    Image translation based on a generative adversarial network (GAN-IT) is a promising method for precise localization of abnormal regions in chest X-ray images (AL-CXR). However, heterogeneous unpaired datasets undermine existing methods to extract key features and distinguish normal from abnormal cases, resulting in inaccurate and unstable AL-CXR. To address this problem, we propose an improved two-stage GAN-IT involving registration and data augmentation. For the first stage, we introduce an invertible deep-learning-based registration technique that virtually and reasonably converts unpaired data into paired data for learning registration maps. This novel approach achieves high registration performance. For the second stage, we apply data augmentation to diversify anomaly locations by swapping the left and right lung regions on the uniform registered frames, further improving the performance by alleviating imbalance in data distribution showing left and right lung lesions. Our method is intended for application to existing GAN-IT models, allowing existing architecture to benefit from key features for translation. By showing that the AL-CXR performance is uniformly improved when applying the proposed method, we believe that GAN-IT for AL-CXR can be deployed in clinical environments, even if learning data are scarce.  ( 3 min )
    Action2Score: An Embedding Approach To Score Player Action. (arXiv:2207.10297v1 [cs.LG])
    Multiplayer Online Battle Arena (MOBA) is one of the most successful game genres. MOBA games such as League of Legends have competitive environments where players race for their rank. In most MOBA games, a player's rank is determined by the match result (win or lose). This seems natural given the nature of team play, but in some sense it is unfair, because players who put in a lot of effort still lose rank when the team loses, and some players even free-ride on teammates' efforts when the team wins. To reduce the side-effects of the team-based ranking system and evaluate a player's performance impartially, we propose a novel embedding model that converts a player's actions into quantitative scores based on the actions' respective contribution to the team's victory. Our model is built using a sequence-based deep learning model with a novel loss function working on the team match. The sequence-based deep learning model processes a player's action sequence from the start to the end of a team game using a GRU unit that selectively takes a hidden state from the previous step and the current input. The loss function is designed to help the action score reflect the final score and the success of the team. We showed that our model can evaluate a player's individual performance fairly and analyze the contributions of the player's respective actions.  ( 3 min )
    Multi-Asset Closed-Loop Reservoir Management Using Deep Reinforcement Learning. (arXiv:2207.10376v1 [cs.LG])
    Closed-loop reservoir management (CLRM), in which history matching and production optimization are performed multiple times over the life of an asset, can provide significant improvement in the specified objective. These procedures are computationally expensive due to the large number of flow simulations required for data assimilation and optimization. Existing CLRM procedures are applied asset by asset, without utilizing information that could be useful over a range of assets. Here, we develop a CLRM framework for multiple assets with varying numbers of wells. We use deep reinforcement learning to train a single global control policy that is applicable for all assets considered. The new framework is an extension of a recently introduced control policy methodology for individual assets. Embedding layers are incorporated into the representation to handle the different numbers of decision variables that arise for the different assets. Because the global control policy learns a unified representation of useful features from multiple assets, it is less expensive to construct than asset-by-asset training (we observe about 3x speedup in our examples). The production optimization problem includes a relative-change constraint on the well settings, which renders the results suitable for practical use. We apply the multi-asset CLRM framework to 2D and 3D water-flooding examples. In both cases, four assets with different well counts, well configurations, and geostatistical descriptions are considered. Numerical experiments demonstrate that the global control policy provides objective function values, for both the 2D and 3D cases, that are nearly identical to those from control policies trained individually for each asset. This promising finding suggests that multi-asset CLRM may indeed represent a viable practical strategy.  ( 3 min )
    Comparative Study on Supervised versus Semi-supervised Machine Learning for Anomaly Detection of In-vehicle CAN Network. (arXiv:2207.10286v1 [cs.LG])
    As the central nerve of the intelligent vehicle control system, the in-vehicle network bus is crucial to the security of vehicle driving. One of the best standards for the in-vehicle network is the Controller Area Network (CAN bus) protocol. However, the CAN bus is designed to be vulnerable to various attacks due to its lack of security mechanisms. To enhance the security of in-vehicle networks and promote the research in this area, based upon a large scale of CAN network traffic data with the extracted valuable features, this study comprehensively compared fully-supervised machine learning with semi-supervised machine learning methods for CAN message anomaly detection. Both traditional machine learning models (including single classifier and ensemble models) and neural network based deep learning models are evaluated. Furthermore, this study proposed a deep autoencoder based semi-supervised learning method applied for CAN message anomaly detection and verified its superiority over other semi-supervised methods. Extensive experiments show that the fully-supervised methods generally outperform semi-supervised ones as they are using more information as inputs. Typically the developed XGBoost based model obtained state-of-the-art performance with the best accuracy (98.65%), precision (0.9853), and ROC AUC (0.9585) beating other methods reported in the literature.  ( 3 min )
    Sequence Models for Drone vs Bird Classification. (arXiv:2207.10409v1 [cs.CV])
    Drone detection has become an essential task in object detection as drone costs have decreased and drone technology has improved. It is, however, difficult to detect distant drones when there is weak contrast, long range, and low visibility. In this work, we propose several sequence classification architectures to reduce the detected false-positive ratio of drone tracks. Moreover, we propose a new drone vs. bird sequence classification dataset to train and evaluate the proposed architectures. 3D CNN, LSTM, and Transformer based sequence classification architectures have been trained on the proposed dataset to show the effectiveness of the proposed idea. As experiments show, using sequence information, bird classification and overall F1 scores can be increased by up to 73% and 35%, respectively. Among all sequence classification models, R(2+1)D-based fully convolutional model yields the best transfer learning and fine-tuning results.  ( 2 min )
    Unimodal vs. Multimodal Siamese Networks for Outfit Completion. (arXiv:2207.10355v1 [cs.IR])
    The popularity of online fashion shopping continues to grow. The ability to offer an effective recommendation to customers is becoming increasingly important. In this work, we focus on the Fashion Outfits Challenge, part of the SIGIR 2022 Workshop on eCommerce. The challenge is centered around the Fill in the Blank (FITB) task, which involves predicting the missing outfit, given an incomplete outfit and a list of candidates. In this paper, we focus on applying siamese networks to the task. More specifically, we explore how combining information from multiple modalities (textual and visual modality) impacts the performance of the model on the task. We evaluate our model on the test split provided by the challenge organizers and the test split with gold assignments that we created during the development phase. We discover that using visual data alone, as well as visual and textual data together, demonstrates promising results on the task. We conclude by suggesting directions for further improvement of our method.  ( 2 min )
    GBDF: Gender Balanced DeepFake Dataset Towards Fair DeepFake Detection. (arXiv:2207.10246v1 [cs.CV])
    Facial forgery by deepfakes has raised severe societal concerns. Several solutions have been proposed by the vision community to effectively combat the misinformation on the internet via automated deepfake detection systems. Recent studies have demonstrated that facial analysis-based deep learning models can discriminate based on protected attributes. For the commercial adoption and massive roll-out of the deepfake detection technology, it is vital to evaluate and understand the fairness (the absence of any prejudice or favoritism) of deepfake detectors across demographic variations such as gender and race, as the performance differential of deepfake detectors between demographic subgroups would impact millions of people of the deprived sub-group. This paper aims to evaluate the fairness of the deepfake detectors across males and females. However, existing deepfake datasets are not annotated with demographic labels to facilitate fairness analysis. To this end, we manually annotated existing popular deepfake datasets with gender labels and evaluated the performance differential of current deepfake detectors across gender. Our analysis on the gender-labeled version of the datasets suggests (a) current deepfake datasets have skewed distribution across gender, and (b) commonly adopted deepfake detectors obtain unequal performance across gender with mostly males outperforming females. Finally, we contributed a gender-balanced and annotated deepfake dataset, GBDF, to mitigate the performance differential and to promote research and development towards fairness-aware deepfake detectors. The GBDF dataset is publicly available at: https://github.com/aakash4305/GBDF  ( 3 min )
    FOCUS: Fairness via Agent-Awareness for Federated Learning on Heterogeneous Data. (arXiv:2207.10265v1 [cs.LG])
    Federated learning (FL) provides an effective paradigm to train machine learning models over distributed data with privacy protection. However, recent studies show that FL is subject to various security, privacy, and fairness threats due to the potentially malicious and heterogeneous local agents. For instance, it is vulnerable to local adversarial agents who only contribute low-quality data, with the goal of harming the performance of those with high-quality data. This kind of attack hence breaks existing definitions of fairness in FL that mainly focus on a certain notion of performance parity. In this work, we aim to address this limitation and propose a formal definition of fairness via agent-awareness for FL (FAA), which takes the heterogeneous data contributions of local agents into account. In addition, we propose a fair FL training algorithm based on agent clustering (FOCUS) to achieve FAA. Theoretically, we prove the convergence and optimality of FOCUS under mild conditions for linear models and general convex loss functions with bounded smoothness. We also prove that FOCUS always achieves higher fairness measured by FAA compared with standard FedAvg protocol under both linear models and general convex loss functions. Empirically, we evaluate FOCUS on four datasets, including synthetic data, images, and texts under different settings, and we show that FOCUS achieves significantly higher fairness based on FAA while maintaining similar or even higher prediction accuracy compared with FedAvg.  ( 3 min )
    World Robot Challenge 2020 -- Partner Robot: A Data-Driven Approach for Room Tidying with Mobile Manipulator. (arXiv:2207.10106v1 [cs.RO])
    Tidying up a household environment using a mobile manipulator poses various challenges in robotics, such as adaptation to large real-world environmental variations, and safe and robust deployment in the presence of humans. The Partner Robot Challenge in World Robot Challenge (WRC) 2020, a global competition held in September 2021, benchmarked tidying tasks in the real home environments, and importantly, tested for full system performances. For this challenge, we developed an entire household service robot system, which leverages a data-driven approach to adapt to numerous edge cases that occur during the execution, instead of classical manual pre-programmed solutions. In this paper, we describe the core ingredients of the proposed robot system, including visual recognition, object manipulation, and motion planning. Our robot system won the second prize, verifying the effectiveness and potential of data-driven robot systems for mobile manipulation in home environments.  ( 2 min )
    On the Implementation of a Reinforcement Learning-based Capacity Sharing Algorithm in O-RAN. (arXiv:2207.10390v1 [cs.NI])
    The capacity sharing problem in Radio Access Network (RAN) slicing deals with the distribution of the capacity available in each RAN node among various RAN slices to satisfy their traffic demands and efficiently use the radio resources. While several capacity sharing algorithmic solutions have been proposed in the literature, their practical implementation still remains as a gap. In this paper, the implementation of a Reinforcement Learning-based capacity sharing algorithm over the O-RAN architecture is discussed, providing insights into the operation of the involved interfaces and the containerization of the solution. Moreover, the description of the testbed implemented to validate the solution is included and some performance and validation results are presented.  ( 2 min )
    Mixed-Precision Inference Quantization: Radically Towards Faster inference speed, Lower Storage requirement, and Lower Loss. (arXiv:2207.10083v1 [cs.LG])
    Based on the model's resilience to computational noise, model quantization is important for compressing models and improving computing speed. Existing quantization techniques rely heavily on experience and "fine-tuning" skills. In the majority of instances, the quantization model has a larger loss than a full precision model. This study provides a methodology for acquiring a mixed-precise quantization model with a lower loss than the full precision model. In addition, the analysis demonstrates that, throughout the inference process, the loss function is mostly affected by the noise of the layer inputs. In particular, we will demonstrate that neural networks with massive identity mappings are resistant to the quantization method. It is also difficult to improve the performance of these networks using quantization.  ( 2 min )
    Learning Deformable Object Manipulation from Expert Demonstrations. (arXiv:2207.10148v1 [cs.RO])
    We present a novel Learning from Demonstration (LfD) method, Deformable Manipulation from Demonstrations (DMfD), to solve deformable manipulation tasks using states or images as inputs, given expert demonstrations. Our method uses demonstrations in three different ways, and balances the trade-off between exploring the environment online and using guidance from experts to explore high dimensional spaces effectively. We test DMfD on a set of representative manipulation tasks for a 1-dimensional rope and a 2-dimensional cloth from the SoftGym suite of tasks, each with state and image observations. Our method exceeds baseline performance by up to 12.9% for state-based tasks and up to 33.44% on image-based tasks, with comparable or better robustness to randomness. Additionally, we create two challenging environments for folding a 2D cloth using image-based observations, and set a performance benchmark for them. We deploy DMfD on a real robot with a minimal loss in normalized performance during real-world execution compared to simulation (~6%). Source code is on github.com/uscresl/dmfd  ( 2 min )
    Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning. (arXiv:2207.10295v1 [cs.LG])
    Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to disentangle the effects of the policy and world dynamics on the return. Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.  ( 2 min )
    SplitMixer: Fat Trimmed From MLP-like Models. (arXiv:2207.10255v1 [cs.CV])
    We present SplitMixer, a simple and lightweight isotropic MLP-like architecture, for visual recognition. It contains two types of interleaving convolutional operations to mix information across spatial locations (spatial mixing) and channels (channel mixing). The first one includes sequentially applying two depthwise 1D kernels, instead of a 2D kernel, to mix spatial information. The second one is splitting the channels into overlapping or non-overlapping segments, with or without shared parameters, and applying our proposed channel mixing approaches or 3D convolution to mix channel information. Depending on design choices, a number of SplitMixer variants can be constructed to balance accuracy, the number of parameters, and speed. We show, both theoretically and experimentally, that SplitMixer performs on par with the state-of-the-art MLP-like models while having a significantly lower number of parameters and FLOPS. For example, without strong data augmentation and optimization, SplitMixer achieves around 94% accuracy on CIFAR-10 with only 0.28M parameters, while ConvMixer achieves the same accuracy with about 0.6M parameters. The well-known MLP-Mixer achieves 85.45% with 17.1M parameters. On CIFAR-100 dataset, SplitMixer achieves around 73% accuracy, on par with ConvMixer, but with about 52% fewer parameters and FLOPS. We hope that our results spark further research towards finding more efficient vision architectures and facilitate the development of MLP-like models. Code is available at https://github.com/aliborji/splitmixer.  ( 3 min )
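To see why replacing a 2D depthwise kernel with two sequential 1D depthwise kernels saves parameters, the following back-of-the-envelope sketch counts weights per layer. It ignores biases and normalization, and the helper names are hypothetical; it illustrates the spatial-mixing idea in general rather than SplitMixer's exact layer definition.

```python
def depthwise_2d_params(channels, kernel_size):
    """Weights in one depthwise k x k convolution (one k*k filter per channel)."""
    return channels * kernel_size * kernel_size


def depthwise_1d_pair_params(channels, kernel_size):
    """Weights in a depthwise 1 x k convolution followed by a depthwise k x 1 one."""
    return channels * 2 * kernel_size


def saving_ratio(channels, kernel_size):
    """Fraction of spatial-mixing weights removed by the 1D factorization."""
    full = depthwise_2d_params(channels, kernel_size)
    pair = depthwise_1d_pair_params(channels, kernel_size)
    return 1.0 - pair / full
```

For a kernel size k the count drops from k*k to 2k weights per channel, so the saving grows with k (and disappears at k = 2). The channel width and kernel size passed in are whatever the reader chooses; none are taken from the paper.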
    Improving Privacy-Preserving Vertical Federated Learning by Efficient Communication with ADMM. (arXiv:2207.10226v1 [cs.LG])
    Federated learning (FL) enables distributed devices to jointly train a shared model while keeping the training data local. Different from the horizontal FL (HFL) setting where each client has partial data samples, vertical FL (VFL), which allows each client to collect partial features, has attracted intensive research efforts recently. In this paper, we identified two challenges that state-of-the-art VFL frameworks are facing: (1) some works directly average the learned feature embeddings and therefore might lose the unique properties of each local feature set; (2) server needs to communicate gradients with the clients for each training step, incurring high communication cost that leads to rapid consumption of privacy budgets. In this paper, we aim to address the above challenges and propose an efficient VFL with multiple linear heads (VIM) framework, where each head corresponds to local clients by taking the separate contribution of each client into account. In addition, we propose an Alternating Direction Method of Multipliers (ADMM)-based method to solve our optimization problem, which reduces the communication cost by allowing multiple local updates in each step, and thus leads to better performance under differential privacy. We consider various settings including VFL with model splitting and without model splitting. For both settings, we carefully analyze the differential privacy mechanism for our framework. Moreover, we show that a byproduct of our framework is that the weights of learned linear heads reflect the importance of local clients. We conduct extensive evaluations and show that on four real-world datasets, VIM achieves significantly higher performance and faster convergence compared with state-of-the-arts. We also explicitly evaluate the importance of local clients and show that VIM enables functionalities such as client-level explanation and client denoising.  ( 3 min )
    Bitwidth-Adaptive Quantization-Aware Neural Network Training: A Meta-Learning Approach. (arXiv:2207.10188v1 [cs.LG])
    Deep neural network quantization with adaptive bitwidths has gained increasing attention due to the ease of model deployment on various platforms with different resource budgets. In this paper, we propose a meta-learning approach to achieve this goal. Specifically, we propose MEBQAT, a simple yet effective way of bitwidth-adaptive quantization aware training (QAT) where meta-learning is effectively combined with QAT by redefining meta-learning tasks to incorporate bitwidths. After being deployed on a platform, MEBQAT allows the (meta-)trained model to be quantized to any candidate bitwidth then helps to conduct inference without much accuracy drop from quantization. Moreover, with a few-shot learning scenario, MEBQAT can also adapt a model to any bitwidth as well as any unseen target classes by adding conventional optimization or metric-based meta-learning. We design variants of MEBQAT to support both (1) a bitwidth-adaptive quantization scenario and (2) a new few-shot learning scenario where both quantization bitwidths and target classes are jointly adapted. We experimentally demonstrate their validity in multiple QAT schemes. By comparing their performance to (bitwidth-dedicated) QAT, existing bitwidth adaptive QAT and vanilla meta-learning, we find that merging bitwidths into meta-learning tasks achieves a higher level of robustness.  ( 2 min )
    Unsupervised Legendre-Galerkin Neural Network for Stiff Partial Differential Equations. (arXiv:2207.10241v1 [cs.LG])
    Machine learning methods have been lately used to solve differential equations and dynamical systems. These approaches have been developed into a novel research field known as scientific machine learning in which techniques such as deep neural networks and statistical learning are applied to classical problems of applied mathematics. Because neural networks provide an approximation capability, computational parameterization through machine learning and optimization methods achieve noticeable performance when solving various partial differential equations (PDEs). In this paper, we develop a novel numerical algorithm that incorporates machine learning and artificial intelligence to solve PDEs. In particular, we propose an unsupervised machine learning algorithm based on the Legendre-Galerkin neural network to find an accurate approximation to the solution of different types of PDEs. The proposed neural network is applied to the general 1D and 2D PDEs as well as singularly perturbed PDEs that possess boundary layer behavior.  ( 2 min )
    On Label Granularity and Object Localization. (arXiv:2207.10225v1 [cs.CV])
    Weakly supervised object localization (WSOL) aims to learn representations that encode object location using only image-level category labels. However, many objects can be labeled at different levels of granularity. Is it an animal, a bird, or a great horned owl? Which image-level labels should we use? In this paper we study the role of label granularity in WSOL. To facilitate this investigation we introduce iNatLoc500, a new large-scale fine-grained benchmark dataset for WSOL. Surprisingly, we find that choosing the right training label granularity provides a much larger performance boost than choosing the best WSOL algorithm. We also show that changing the label granularity can significantly improve data efficiency.  ( 2 min )
    Direct Localization in Underwater Acoustics via Convolutional Neural Networks: A Data-Driven Approach. (arXiv:2207.10222v1 [cs.LG])
    Direct localization (DLOC) methods, which use the observed data to localize a source at an unknown position in a one-step procedure, generally outperform their indirect two-step counterparts (e.g., using time-difference of arrivals). However, underwater acoustic DLOC methods require prior knowledge of the environment, and are computationally costly, hence slow. We propose, what is to the best of our knowledge, the first data-driven DLOC method. Inspired by classical and contemporary optimal model-based DLOC solutions, and leveraging the capabilities of convolutional neural networks (CNNs), we devise a holistic CNN-based solution. Our method includes a specifically-tailored input structure, architecture, loss function, and a progressive training procedure, which are of independent interest in the broader context of machine learning. We demonstrate that our method outperforms attractive alternatives, and asymptotically matches the performance of an oracle optimal model-based solution.  ( 2 min )
    The Game of Hidden Rules: A New Kind of Benchmark Challenge for Machine Learning. (arXiv:2207.10218v1 [cs.LG])
    As machine learning (ML) is more tightly woven into society, it is imperative that we better characterize ML's strengths and limitations if we are to employ it responsibly. Existing benchmark environments for ML, such as board and video games, offer well-defined benchmarks for progress, but constituent tasks are often complex, and it is frequently unclear how task characteristics contribute to overall difficulty for the machine learner. Likewise, without a systematic assessment of how task characteristics influence difficulty, it is challenging to draw meaningful connections between performance in different benchmark environments. We introduce a novel benchmark environment that offers an enormous range of ML challenges and enables precise examination of how task elements influence practical difficulty. The tool frames learning tasks as a "board-clearing game," which we call the Game of Hidden Rules (GOHR). The environment comprises an expressive rule language and a captive server environment that can be installed locally. We propose a set of benchmark rule-learning tasks and plan to support a performance leader-board for researchers interested in attempting to learn our rules. GOHR complements existing environments by allowing fine, controlled modifications to tasks, enabling experimenters to better understand how each facet of a given learning task contributes to its practical difficulty for an arbitrary ML algorithm.  ( 3 min )
    Slimmable Quantum Federated Learning. (arXiv:2207.10221v1 [cs.LG])
    Quantum federated learning (QFL) has recently received increasing attention, where quantum neural networks (QNNs) are integrated into federated learning (FL). In contrast to the existing static QFL methods, we propose slimmable QFL (SlimQFL) in this article, which is a dynamic QFL framework that can cope with time-varying communication channels and computing energy limitations. This is made viable by leveraging the unique nature of a QNN where its angle parameters and pole parameters can be separately trained and dynamically exploited. Simulation results corroborate that SlimQFL achieves higher classification accuracy than Vanilla QFL, particularly under poor channel conditions on average.  ( 2 min )
    What Do We Maximize in Self-Supervised Learning?. (arXiv:2207.10081v1 [cs.LG])
    In this paper, we examine self-supervised learning methods, particularly VICReg, to provide an information-theoretical understanding of their construction. As a first step, we demonstrate how information-theoretic quantities can be obtained for a deterministic network, offering a possible alternative to prior work that relies on stochastic models. This enables us to demonstrate how VICReg can be (re)discovered from first principles and its assumptions about data distribution. Furthermore, we empirically demonstrate the validity of our assumptions, confirming our novel understanding of VICReg. Finally, we believe that the derivation and insights we obtain can be generalized to many other SSL methods, opening new avenues for theoretical and practical understanding of SSL and transfer learning.  ( 2 min )
    Flow-based Visual Quality Enhancer for Super-resolution Magnetic Resonance Spectroscopic Imaging. (arXiv:2207.10181v1 [eess.IV])
    Magnetic Resonance Spectroscopic Imaging (MRSI) is an essential tool for quantifying metabolites in the body, but the low spatial resolution limits its clinical applications. Deep learning-based super-resolution methods provided promising results for improving the spatial resolution of MRSI, but the super-resolved images are often blurry compared to the experimentally-acquired high-resolution images. Attempts have been made with the generative adversarial networks to improve the image visual quality. In this work, we consider another type of generative model, the flow-based model, of which the training is more stable and interpretable compared to the adversarial networks. Specifically, we propose a flow-based enhancer network to improve the visual quality of super-resolution MRSI. Different from previous flow-based models, our enhancer network incorporates anatomical information from additional image modalities (MRI) and uses a learnable base distribution. In addition, we impose a guide loss and a data-consistency loss to encourage the network to generate images with high visual quality while maintaining high fidelity. Experiments on a 1H-MRSI dataset acquired from 25 high-grade glioma patients indicate that our enhancer network outperforms the adversarial networks and the baseline flow-based methods. Our method also allows visual quality adjustment and uncertainty estimation.  ( 3 min )
    Liver Segmentation using Turbolift Learning for CT and Cone-beam C-arm Perfusion Imaging. (arXiv:2207.10167v1 [eess.IV])
    Model-based reconstruction employing the time separation technique (TST) was found to improve dynamic perfusion imaging of the liver using C-arm cone-beam computed tomography (CBCT). To apply TST using prior knowledge extracted from CT perfusion data, the liver should be accurately segmented from the CT scans. Reconstructions of primary and model-based CBCT data need to be segmented for proper visualisation and interpretation of perfusion maps. This research proposes Turbolift learning, which trains a modified version of the multi-scale Attention UNet on different liver segmentation tasks serially, in the order CT, CBCT, CBCT TST, so that each training stage acts as pre-training for the next, which addresses the limited number of datasets available for training. For the final task of liver segmentation from CBCT TST, the proposed method achieved overall Dice scores of 0.874±0.031 and 0.905±0.007 in 6-fold and 4-fold cross-validation experiments, respectively, a statistically significant improvement over the model trained only for that task. Experiments revealed that Turbolift not only improves the overall performance of the model but also makes it robust against artefacts originating from the embolisation materials and against truncation artefacts. Additionally, in-depth analyses confirmed the order of the segmentation tasks. This paper shows the potential of segmenting the liver from CT, CBCT, and CBCT TST by learning from the limited training data available, which could be used in the future for the visualisation and evaluation of perfusion maps in the treatment evaluation of liver diseases.  ( 3 min )
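    For readers unfamiliar with the metric reported above, the Dice score between a predicted and a reference binary mask is twice the overlap divided by the total number of foreground pixels in both masks. The sketch below is only an illustration on flattened 0/1 lists; the function name and the convention of returning 1.0 when both masks are empty are assumptions, not details taken from the paper.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2*|A∩B| / (|A| + |B|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of elements")
    # Count positions where both masks mark foreground.
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


# Toy example: one overlapping foreground pixel, two foreground pixels per mask.
example_pred = [1, 1, 0, 0]
example_target = [1, 0, 1, 0]
example_dice = dice_score(example_pred, example_target)
```

    In the toy example the masks share one foreground pixel out of two each, so the score is 2·1/(2+2).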
    Constrained Prescriptive Trees via Column Generation. (arXiv:2207.10163v1 [math.OC])
    With the abundance of available data, many enterprises seek to implement data-driven prescriptive analytics to help them make informed decisions. These prescriptive policies need to satisfy operational constraints, and proactively eliminate rule conflicts, both of which are ubiquitous in practice. It is also desirable for them to be simple and interpretable, so they can be easily verified and implemented. Existing approaches from the literature center around constructing variants of prescriptive decision trees to generate interpretable policies. However, none of the existing methods are able to handle constraints. In this paper, we propose a scalable method that solves the constrained prescriptive policy generation problem. We introduce a novel path-based mixed-integer program (MIP) formulation which identifies a (near) optimal policy efficiently via column generation. The policy generated can be represented as a multiway-split tree which is more interpretable and informative than a binary-split tree due to its shorter rules. We demonstrate the efficacy of our method with extensive experiments on both synthetic and real datasets.  ( 2 min )
    Hydra: Hybrid Server Power Model. (arXiv:2207.10217v1 [cs.DC])
    With the growing complexity of big data workloads that require abundant data and computation, data centers consume a tremendous amount of power daily. In an effort to minimize data center power consumption, several studies developed power models that can be used for job scheduling either reducing the number of active servers or balancing workloads across servers at their peak energy efficiency points. Due to increasing software and hardware heterogeneity, we observed that there is no single power model that works the best for all server conditions. Some complicated machine learning models themselves incur performance and power overheads and hence it is not desirable to use them frequently. There are no power models that consider containerized workload execution. In this paper, we propose a hybrid server power model, Hydra, that considers both prediction accuracy and performance overhead. Hydra dynamically chooses the best power model for the given server conditions. Compared with state-of-the-art solutions, Hydra outperforms across all compute-intensity levels on heterogeneous servers.  ( 2 min )
    Model Compression for Resource-Constrained Mobile Robots. (arXiv:2207.10082v1 [cs.LG])
    The number of mobile robots with constrained computing resources that need to execute complex machine learning models has been increasing during the past decade. Commonly, these robots rely on edge infrastructure accessible over wireless communication to execute computationally heavy, complex tasks. However, the edge might become unavailable and, consequently, force the tasks to be executed on the robot. This work focuses on making it possible to execute the tasks on the robots by reducing the complexity and the total number of parameters of pre-trained computer vision models. This is achieved by using model compression techniques such as Pruning and Knowledge Distillation. These compression techniques have strong theoretical and practical foundations, but their combined usage has not been widely explored in the literature. Therefore, this work focuses in particular on investigating the effects of combining these two compression techniques. The results of this work reveal that up to 90% of the total number of parameters of a computer vision model can be removed without any considerable reduction in the model's accuracy.  ( 3 min )
    An Introduction to Modern Statistical Learning. (arXiv:2207.10185v1 [cs.LG])
    This work in progress aims to provide a unified introduction to statistical learning, building up slowly from classical models like the GMM and HMM to modern neural networks like the VAE and diffusion models. There are today many internet resources that explain this or that new machine-learning algorithm in isolation, but they do not (and cannot, in so brief a space) connect these algorithms with each other or with the classical literature on statistical models, out of which the modern algorithms emerged. Also conspicuously lacking is a single notational system, an absence which, although unfazing to those already familiar with the material (like the authors of these posts), raises a significant barrier to the novice's entry. Likewise, I have aimed to assimilate the various models, wherever possible, to a single framework for inference and learning, showing how (and why) to change one model into another with minimal alteration (some of them novel, others from the literature). Some background is of course necessary. I have assumed the reader is familiar with basic multivariable calculus, probability and statistics, and linear algebra. The goal of this book is certainly not completeness, but rather to draw a more or less straight-line path from the basics to the extremely powerful new models of the last decade. The goal then is to complement, not replace, such comprehensive texts as Bishop's Pattern Recognition and Machine Learning, which is now 15 years old.  ( 3 min )
    Towards Better Evaluation for Dynamic Link Prediction. (arXiv:2207.10128v1 [cs.LG])
    There has been recent success in learning from static graphs, but despite their prevalence, learning from time-evolving graphs remains challenging. We design new, more stringent evaluation procedures for link prediction specific to dynamic graphs, which reflect real-world considerations and can better compare different methods' strengths and weaknesses. In particular, we create two visualization techniques to understand the recurring patterns of edges over time. They show that many edges reoccur at later time steps. Therefore, we propose a pure memorization baseline called EdgeBank. It achieves surprisingly strong performance across multiple settings, partly due to the easy negative edges used in the current evaluation setting. Hence, we introduce two more challenging negative sampling strategies that improve robustness and can better match real-world applications. Lastly, we introduce five new dynamic graph datasets from a diverse set of domains missing from current benchmarks, providing new challenges and opportunities for future research.  ( 2 min )
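    The idea of a pure memorization baseline can be illustrated with a few lines of code. The sketch below is not the authors' EdgeBank implementation; it assumes directed edges, an unbounded memory, and hypothetical class and method names, and simply predicts a link whenever the same (source, destination) pair has been observed earlier.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


class EdgeMemoryBaseline:
    """Predicts a link if and only if the exact edge was observed before."""

    def __init__(self) -> None:
        self._seen: Set[Edge] = set()

    def update(self, edges: Iterable[Edge]) -> None:
        # Memorize every edge observed so far (unbounded memory is assumed).
        self._seen.update(edges)

    def predict(self, edge: Edge) -> int:
        # 1 means "edge predicted to occur", 0 means "predicted absent".
        return 1 if edge in self._seen else 0


bank = EdgeMemoryBaseline()
bank.update([("a", "b"), ("b", "c")])
predictions = [bank.predict(e) for e in [("a", "b"), ("c", "a")]]
```

    A baseline of this kind scores well whenever negative test edges are sampled at random, since such edges have rarely been seen before; this is the weakness in the evaluation setting that the abstract points to.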
    Pediatric Bone Age Assessment using Deep Learning Models. (arXiv:2207.10169v1 [cs.CV])
    Bone age assessment (BAA) is a standard method for determining the difference between skeletal and chronological age. Manual processes are complicated and require expert knowledge. This is where deep learning comes into play. In this study, pre-trained models like VGG-16, InceptionV3, XceptionNet, and MobileNet are used to assess the bone age of the input data, and their mean average errors are compared and evaluated to see which model predicts the best.  ( 2 min )
    Digraphwave: Scalable Extraction of Structural Node Embeddings via Diffusion on Directed Graphs. (arXiv:2207.10149v1 [cs.SI])
    Structural node embeddings, vectors capturing local connectivity information for each node in a graph, have many applications in data mining and machine learning, e.g., network alignment and node classification, clustering and anomaly detection. For the analysis of directed graphs, e.g., transaction graphs, communication networks and social networks, the capability to capture directional information in the structural node embeddings is highly desirable, as is scalability of the embedding extraction method. Most existing methods are nevertheless designed only for undirected graphs. Therefore, we present Digraphwave -- a scalable algorithm for extracting structural node embeddings on directed graphs. The Digraphwave embeddings consist of compressed diffusion pattern signatures, which are twice enhanced to increase their discriminative capacity. By proving a lower bound on the heat contained in the local vicinity of a diffusion initialization node, theoretically justified diffusion timescale values are established, and Digraphwave is left with only two easy-to-interpret hyperparameters: the embedding dimension and a neighbourhood resolution specifier. In our experiments, the two embedding enhancements, named transposition and aggregation, are shown to lead to a significant increase in macro F1 score for classifying automorphic identities, with Digraphwave outperforming all other structural embedding baselines. Moreover, Digraphwave either outperforms or matches the performance of all baselines on real graph datasets, displaying a particularly large performance gain in a network alignment task, while also being scalable to graphs with millions of nodes and edges, running up to 30x faster than a previous diffusion pattern based method and with a fraction of the memory consumption.  ( 3 min )
    On the Robustness of 3D Object Detectors. (arXiv:2207.10205v1 [cs.CV])
    In recent years, significant progress has been achieved for 3D object detection on point clouds thanks to the advances in 3D data collection and deep learning techniques. Nevertheless, 3D scenes exhibit a lot of variations and are prone to sensor inaccuracies as well as information loss during pre-processing. Thus, it is crucial to design techniques that are robust against these variations. This requires a detailed analysis and understanding of the effect of such variations. This work aims to analyze and benchmark popular point-based 3D object detectors against several data corruptions. To the best of our knowledge, we are the first to investigate the robustness of point-based 3D object detectors. To this end, we design and evaluate corruptions that involve data addition, reduction, and alteration. We further study the robustness of different modules against local and global variations. Our experimental results reveal several intriguing findings. For instance, we show that methods that integrate Transformers at a patch or object level lead to increased robustness, compared to using Transformers at the point level.  ( 2 min )
    Provably tuning the ElasticNet across instances. (arXiv:2207.10199v1 [cs.LG])
    An important unresolved challenge in the theory of regularization is to set the regularization coefficients of popular techniques like the ElasticNet with general provable guarantees. We consider the problem of tuning the regularization parameters of Ridge regression, LASSO, and the ElasticNet across multiple problem instances, a setting that encompasses both cross-validation and multi-task hyperparameter optimization. We obtain a novel structural result for the ElasticNet which characterizes the loss as a function of the tuning parameters as a piecewise-rational function with algebraic boundaries. We use this to bound the structural complexity of the regularized loss functions and show generalization guarantees for tuning the ElasticNet regression coefficients in the statistical setting. We also consider the more challenging online learning setting, where we show vanishing average expected regret relative to the optimal parameter pair. We further extend our results to tuning classification algorithms obtained by thresholding regression fits regularized by Ridge, LASSO, or ElasticNet. Our results are the first general learning-theoretic guarantees for this important class of problems that avoid strong assumptions on the data distribution. Furthermore, our guarantees hold for both validation and popular information criterion objectives.  ( 2 min )
    Latent Discriminant deterministic Uncertainty. (arXiv:2207.10130v1 [cs.CV])
    Predictive uncertainty estimation is essential for deploying Deep Neural Networks in real-world autonomous systems. However, most successful approaches are computationally intensive. In this work, we attempt to address these challenges in the context of autonomous driving perception tasks. Recently proposed Deterministic Uncertainty Methods (DUM) can only partially meet such requirements as their scalability to complex computer vision tasks is not obvious. In this work we advance a scalable and effective DUM for high-resolution semantic segmentation, that relaxes the Lipschitz constraint typically hindering practicality of such architectures. We learn a discriminant latent space by leveraging a distinction maximization layer over an arbitrarily-sized set of trainable prototypes. Our approach achieves competitive results over Deep Ensembles, the state-of-the-art for uncertainty prediction, on image classification, segmentation and monocular depth estimation tasks. Our code is available at https://github.com/ENSTA-U2IS/LDU  ( 2 min )
    Structural Causal 3D Reconstruction. (arXiv:2207.10156v1 [cs.LG])
    This paper considers the problem of unsupervised 3D object reconstruction from in-the-wild single-view images. Due to ambiguity and intrinsic ill-posedness, this problem is inherently difficult to solve and therefore requires strong regularization to achieve disentanglement of different latent factors. Unlike existing works that introduce explicit regularizations into objective functions, we look into a different space for implicit regularization -- the structure of latent space. Specifically, we restrict the structure of latent space to capture a topological causal ordering of latent factors (i.e., representing causal dependency as a directed acyclic graph). We first show that different causal orderings matter for 3D reconstruction, and then explore several approaches to find a task-dependent causal factor ordering. Our experiments demonstrate that the latent space structure indeed serves as an implicit regularization and introduces an inductive bias beneficial for reconstruction.  ( 2 min )
    Continual Variational Autoencoder Learning via Online Cooperative Memorization. (arXiv:2207.10131v1 [cs.LG])
    Due to their inference, data representation and reconstruction properties, Variational Autoencoders (VAE) have been successfully used in continual learning classification tasks. However, their ability to generate images with specifications corresponding to the classes and databases learned during Continual Learning (CL) is not well understood, and catastrophic forgetting remains a significant challenge. In this paper, we first analyze the forgetting behaviour of VAEs by developing a new theoretical framework that formulates CL as a dynamic optimal transport problem. This framework proves approximate bounds to the data likelihood without requiring the task information and explains how the prior knowledge is lost during the training process. We then propose a novel memory buffering approach, namely the Online Cooperative Memorization (OCM) framework, which consists of a Short-Term Memory (STM) that continually stores recent samples to provide future information for the model, and a Long-Term Memory (LTM) aiming to preserve a wide diversity of samples. The proposed OCM transfers certain samples from STM to LTM according to the information diversity selection criterion without requiring any supervised signals. The OCM framework is then combined with a dynamic VAE expansion mixture network to further enhance its performance.  ( 2 min )
    Learning Underspecified Models. (arXiv:2207.10140v1 [econ.TH])
    This paper examines whether one can learn to play an optimal action while only knowing part of the true specification of the environment. We choose the optimal pricing problem as our laboratory, where the monopolist is endowed with an underspecified model of the market demand, but can observe market outcomes. In contrast to conventional learning models where the model specification is complete and exogenously fixed, the monopolist has to learn the specification and the parameters of the demand curve from the data. We formulate the learning dynamics as an algorithm that forecasts the optimal price based on the data, following the machine learning literature (Shalev-Shwartz and Ben-David (2014)). Inspired by PAC learnability, we develop a new notion of learnability by requiring that the algorithm must produce an accurate forecast with a reasonable amount of data uniformly over the class of models consistent with the part of the true specification. In addition, we assume that the monopolist has a lexicographic preference over the payoff and the complexity cost of the algorithm, seeking an algorithm with a minimum number of parameters subject to PAC-guaranteeing the optimal solution (Rubinstein (1986)). We show that for the set of demand curves with a strictly decreasing, uniformly Lipschitz continuous marginal revenue curve, the optimal algorithm recursively estimates the slope and the intercept of the linear demand curve, even if the actual demand curve is not linear. The monopolist chooses a misspecified model to save computational cost, while learning the true optimal decision uniformly over the set of underspecified demand curves.  ( 3 min )
  • Open

    Imitation of Manipulation Skills Using Multiple Geometries. (arXiv:2203.01171v3 [cs.RO] UPDATED)
    Daily manipulation tasks are characterized by geometric primitives related to actions and object shapes. Such geometric descriptors are poorly represented by only using Cartesian coordinate systems. In this paper, we propose a learning approach to extract the optimal representation from a dictionary of coordinate systems to encode an observed movement/behavior. This is achieved by using an extension of Gaussian distributions on Riemannian manifolds, which is used to analyse a set of user demonstrations statistically, by considering multiple geometries as candidate representations of the task. We formulate the reproduction problem as a general optimal control problem based on an iterative linear quadratic regulator (iLQR), where the Gaussian distributions in the extracted coordinate systems are used to define the cost function. We apply our approach to object grasping and box opening tasks in simulation and on a 7-axis Franka Emika robot. The results show that the robot can exploit several geometries to execute the manipulation task and generalize it to new situations, by maintaining the invariant characteristics of the task in the coordinate system(s) of interest.  ( 2 min )
    Generative Adversarial Networks for Labeled Acceleration Data Augmentation for Structural Damage Detection. (arXiv:2112.03478v5 [cs.LG] UPDATED)
    There have been major advances in the field of Data Science in the last few decades, and these have been utilized in different engineering disciplines and applications. Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) algorithms have been utilized for civil Structural Health Monitoring (SHM), especially for damage detection applications using sensor data. Although ML and DL methods show superior learning skills for complex data structures, they require plenty of data for training. However, in SHM, data collection from civil structures can be expensive and time-consuming; getting useful data (damage-associated data) in particular can be challenging. The objective of this study is to address the data scarcity problem for damage detection applications. This paper employs 1-D Wasserstein Deep Convolutional Generative Adversarial Networks using Gradient Penalty (1-D WDCGAN-GP) for synthetic labelled acceleration data generation. Then, the generated data is added in varying ratios to the training dataset of a 1-D Deep Convolutional Neural Network (1-D DCNN) for damage detection. The damage detection results show that the 1-D WDCGAN-GP can be successfully utilized to tackle data scarcity in vibration-based damage detection applications of civil structures. Keywords: Structural Health Monitoring (SHM), Structural Damage Detection, 1-D Deep Convolutional Neural Networks (1-D DCNN), 1-D Generative Adversarial Networks (1-D GAN), Wasserstein Generative Adversarial Networks with Gradient Penalty (WGAN-GP)  ( 3 min )
    Federated Learning with Non-IID Data. (arXiv:1806.00582v2 [cs.LG] UPDATED)
    Federated learning enables resource-constrained edge compute devices, such as mobile phones and IoT devices, to learn a shared model for prediction, while keeping the training data local. This decentralized approach to training models provides privacy, security, regulatory and economic benefits. In this work, we focus on the statistical challenge of federated learning when local data is non-IID. We first show that the accuracy of federated learning reduces significantly, by up to 55%, for neural networks trained on highly skewed non-IID data, where each client device trains only on a single class of data. We further show that this accuracy reduction can be explained by the weight divergence, which can be quantified by the earth mover's distance (EMD) between the distribution over classes on each device and the population distribution. As a solution, we propose a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices. Experiments show that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.  ( 3 min )
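    To make the skew measure concrete, the sketch below compares a client's class distribution with the population distribution. It rests on an assumption that should be checked against the paper: for unordered class labels, the distance is taken here as the sum of absolute differences between the two probability vectors. The function and variable names are illustrative only.

```python
from typing import Sequence


def class_distribution_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Sum of absolute differences between two class-probability vectors.

    Assumption: this L1 form stands in for the earth mover's distance
    between distributions over unordered class labels.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same classes")
    return sum(abs(a - b) for a, b in zip(p, q))


# Ten balanced classes in the population (as in CIFAR-10).
population = [0.1] * 10
# Extreme non-IID client that holds samples of a single class only.
single_class_client = [1.0] + [0.0] * 9
skew = class_distribution_distance(single_class_client, population)
```

    Under this assumed definition an IID client has distance 0, while the single-class client above reaches |1 − 0.1| + 9·0.1 = 1.8, the most skewed case the abstract describes.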
    Distribution Approximation and Statistical Estimation Guarantees of Generative Adversarial Networks. (arXiv:2002.03938v3 [cs.LG] UPDATED)
    Generative Adversarial Networks (GANs) have achieved great success in unsupervised learning. Despite their remarkable empirical performance, there are limited theoretical studies on the statistical properties of GANs. This paper provides approximation and statistical guarantees of GANs for the estimation of data distributions that have densities in a Hölder space. Our main result shows that, if the generator and discriminator network architectures are properly chosen, GANs are consistent estimators of data distributions under strong discrepancy metrics, such as the Wasserstein-1 distance. Furthermore, when the data distribution exhibits low-dimensional structures, we show that GANs are capable of capturing the unknown low-dimensional structures in data and enjoy a fast statistical convergence, which is free of the curse of ambient dimensionality. Our analysis for low-dimensional data builds upon a universal approximation theory of neural networks with Lipschitz continuity guarantees, which may be of independent interest.  ( 2 min )
    An Equivalence Between Data Poisoning and Byzantine Gradient Attacks. (arXiv:2202.08578v2 [cs.LG] UPDATED)
    To study the resilience of distributed learning, the "Byzantine" literature considers a strong threat model where workers can report arbitrary gradients to the parameter server. Whereas this model helped obtain several fundamental results, it has sometimes been considered unrealistic, when the workers are mostly trustworthy machines. In this paper, we show a surprising equivalence between this model and data poisoning, a threat considered much more realistic. More specifically, we prove that every gradient attack can be reduced to data poisoning, in any personalized federated learning system with PAC guarantees (which we show are both desirable and realistic). This equivalence makes it possible to obtain new impossibility results on the resilience of any "robust" learning algorithm to data poisoning in highly heterogeneous applications, as corollaries of existing impossibility theorems on Byzantine machine learning. Moreover, using our equivalence, we derive a practical attack that we show (theoretically and empirically) can be very effective against classical personalized federated learning models.  ( 2 min )
    High-Dimensional Inference in Bayesian Networks. (arXiv:2112.09217v2 [stat.ML] UPDATED)
    Inference of the marginal probability distribution is defined as the calculation of the probability of a subset of the variables and is relevant for handling missing data and hidden variables. While inference of the marginal probability distribution is crucial for various problems in machine learning and statistics, its exact computation is generally not feasible for categorical variables in Bayesian networks due to the NP-hardness of this task. We develop a divide-and-conquer approach using the graphical properties of Bayesian networks to split the computation of the marginal probability distribution into sub-calculations of lower dimensionality, thus reducing the overall computational complexity. Exploiting this property, we present an efficient and scalable algorithm for calculating the marginal probability distribution for categorical variables. The novel method is compared against state-of-the-art approximate inference methods in a benchmarking study, where it displays superior performance. As an immediate application, we demonstrate how our method can be used to classify incomplete data against Bayesian networks and use this approach for identifying the cancer subtype of kidney cancer patient samples.  ( 2 min )
    Identifying partial mouse brain microscopy images from Allen reference atlas using a contrastively learned semantic space. (arXiv:2109.06662v3 [cs.CV] UPDATED)
    Precise identification of mouse brain microscopy images is a crucial first step when anatomical structures in the mouse brain are to be registered to a reference atlas. Practitioners usually rely on manual comparison of images or tools that assume the presence of complete images. This work explores Siamese Networks as the method for finding corresponding 2D reference atlas plates for given partial 2D mouse brain images. Siamese networks are a class of convolutional neural networks (CNNs) that use weight-shared paths to obtain low dimensional embeddings of pairs of input images. The correspondence between the partial mouse brain image and reference atlas plate is determined based on the distance between low dimensional embeddings of brain slices and atlas plates that are obtained from Siamese networks using contrastive learning. Experiments showed that Siamese CNNs can precisely identify brain slices using the Allen mouse brain atlas when training and testing images come from the same source. They achieved TOP-1 and TOP-5 accuracy of 25% and 100%, respectively, taking only 7.2 seconds to identify 29 images.  ( 3 min )
    Conditional Hierarchical Bayesian Tucker Decomposition for Genetic Data Analysis. (arXiv:1911.12426v5 [cs.LG] UPDATED)
    We develop methods for reducing the dimensionality of large data sets, common in biomedical applications. Learning about patients using genetic data often involves more features than observations, which makes direct supervised learning difficult. One method of reducing the feature space is to use latent Dirichlet allocation to group genetic variants in an unsupervised manner. Latent Dirichlet allocation describes a patient as a mixture of topics corresponding to genetic variants. This can be generalized as a Bayesian tensor decomposition to account for multiple feature variables. Our most significant contributions are with hierarchical topic modeling. We design distinct methods of incorporating hierarchical topic modeling, based on nested Chinese restaurant processes and Pachinko Allocation Machine, into Bayesian tensor decomposition. We apply these models to examine patients with one of four common types of cancer (breast, lung, prostate, and colorectal) and siblings with and without autism spectrum disorder. We link the genes with their biological pathways and combine this information into a tensor of patients, counts of their genetic variants, and the genes' membership in pathways. We find that our trained models outperform baseline models, with respect to coherence, by up to 40%.  ( 3 min )
    Switching One-Versus-the-Rest Loss to Increase the Margin of Logits for Adversarial Robustness. (arXiv:2207.10283v1 [cs.LG])
    Defending deep neural networks against adversarial examples is a key challenge for AI safety. To improve the robustness effectively, recent methods focus on important data points near the decision boundary in adversarial training. However, these methods are vulnerable to Auto-Attack, which is an ensemble of parameter-free attacks for reliable evaluation. In this paper, we experimentally investigate the causes of their vulnerability and find that existing methods reduce margins between logits for the true label and the other labels while keeping their gradient norms at non-small values. Reduced margins and non-small gradient norms cause their vulnerability since the largest logit can be easily flipped by the perturbation. Our experiments also show that the histogram of the logit margins has two peaks, i.e., small and large logit margins. Based on these observations, we propose switching one-versus-the-rest loss (SOVR), which uses one-versus-the-rest loss when data have small logit margins so that it increases the margins. We find that SOVR increases logit margins more than existing methods while keeping gradient norms small and outperforms them in terms of the robustness against Auto-Attack.  ( 2 min )
    High-Dimensional $L_2$Boosting: Rate of Convergence. (arXiv:1602.08927v3 [stat.ML] UPDATED)
    Boosting is one of the most significant developments in machine learning. This paper studies the rate of convergence of $L_2$Boosting, which is tailored for regression, in a high-dimensional setting. Moreover, we introduce so-called "post-Boosting". This is a post-selection estimator which applies ordinary least squares to the variables selected in the first stage by $L_2$Boosting. Another variant is "Orthogonal Boosting", where after each step an orthogonal projection is conducted. We show that both post-$L_2$Boosting and the orthogonal boosting achieve the same rate of convergence as LASSO in a sparse, high-dimensional setting. We show that the rate of convergence of the classical $L_2$Boosting depends on the design matrix described by a sparse eigenvalue constant. To show the latter results, we derive new approximation results for the pure greedy algorithm, based on analyzing the revisiting behavior of $L_2$Boosting. We also introduce feasible rules for early stopping, which can be easily implemented and used in applied work. Our results also allow a direct comparison between LASSO and boosting which has been missing from the literature. Finally, we present simulation studies and applications to illustrate the relevance of our theoretical results and to provide insights into the practical aspects of boosting. In these simulation studies, post-$L_2$Boosting clearly outperforms LASSO.  ( 3 min )
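To make the first-stage procedure concrete, here is a small sketch of componentwise $L_2$Boosting in plain Python. It is an illustration under simplifying assumptions (no intercept, non-zero columns, a fixed number of steps instead of the paper's early-stopping rules), and the function name `l2_boost` is hypothetical.

```python
def l2_boost(X, y, steps=50, step_size=1.0):
    """Componentwise L2Boosting (pure greedy algorithm when step_size = 1).

    X is a list of rows, y a list of responses; columns are assumed to be
    non-zero and no intercept is fitted (both simplifying assumptions).
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    norms = [sum(v * v for v in col) for col in cols]
    beta = [0.0] * p
    resid = list(y)
    selected = set()
    for _ in range(steps):
        # Univariate least-squares fit of the current residual on each column.
        fits = [sum(c * r for c, r in zip(col, resid)) / nrm
                for col, nrm in zip(cols, norms)]
        # Pick the column that reduces the residual sum of squares the most.
        j = max(range(p), key=lambda k: fits[k] ** 2 * norms[k])
        beta[j] += step_size * fits[j]
        resid = [r - step_size * fits[j] * c for r, c in zip(resid, cols[j])]
        selected.add(j)
    return beta, selected
```

The returned `selected` set is what a post-Boosting step, as described in the abstract, would pass to an ordinary least squares refit. That refit is left out here to keep the sketch short.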
    Estimating value at risk: LSTM vs. GARCH. (arXiv:2207.10539v1 [q-fin.RM])
    Estimating value-at-risk on time series data with possibly heteroscedastic dynamics is a highly challenging task. Typically, we face a small data problem in combination with a high degree of non-linearity, causing difficulties for both classical and machine-learning estimation algorithms. In this paper, we propose a novel value-at-risk estimator using a long short-term memory (LSTM) neural network and compare its performance to benchmark GARCH estimators. Our results indicate that even for a relatively short time series, the LSTM could be used to refine or monitor risk estimation processes and correctly identify the underlying risk dynamics in a non-parametric fashion. We evaluate the estimator on both simulated and market data with a focus on heteroscedasticity, finding that LSTM exhibits a similar performance to GARCH estimators on simulated data, whereas on real market data it is more sensitive towards increasing or decreasing volatility and outperforms all existing estimators of value-at-risk in terms of exception rate and mean quantile score.  ( 2 min )
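The two evaluation criteria named in the abstract, exception rate and mean quantile score, can be sketched in a few lines. This is a generic illustration rather than the paper's evaluation code. It assumes that VaR forecasts are expressed on the return scale (as a lower quantile of the return) and that the quantile score is the usual pinball loss.

```python
def exception_rate(returns, var_forecasts):
    """Share of periods in which the realized return falls below the VaR forecast.

    Both inputs are on the return scale, so a 5% VaR forecast is a (usually
    negative) return quantile; this sign convention is an assumption.
    """
    hits = sum(1 for r, q in zip(returns, var_forecasts) if r < q)
    return hits / len(returns)

def mean_quantile_score(returns, var_forecasts, alpha):
    """Average pinball (quantile) loss of the forecasts at level alpha."""
    total = 0.0
    for r, q in zip(returns, var_forecasts):
        indicator = 1.0 if r < q else 0.0
        total += (alpha - indicator) * (r - q)
    return total / len(returns)
```

A well-calibrated estimator at level `alpha` should have an exception rate close to `alpha`, and a lower mean quantile score indicates a better quantile forecast.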
    On minimax density estimation via measure transport. (arXiv:2207.10231v1 [math.ST])
    We study the convergence properties, in Hellinger and related distances, of nonparametric density estimators based on measure transport. These estimators represent the measure of interest as the pushforward of a chosen reference distribution under a transport map, where the map is chosen via a maximum likelihood objective (equivalently, minimizing an empirical Kullback-Leibler loss) or a penalized version thereof. We establish concentration inequalities for a general class of penalized measure transport estimators, by combining techniques from M-estimation with analytical properties of the transport-based density representation. We then demonstrate the implications of our theory for the case of triangular Knothe-Rosenblatt (KR) transports on the $d$-dimensional unit cube, and show that both penalized and unpenalized versions of such estimators achieve minimax optimal convergence rates over Hölder classes of densities. Specifically, we establish optimal rates for unpenalized nonparametric maximum likelihood estimation over bounded Hölder-type balls, and then for certain Sobolev-penalized estimators and sieved wavelet estimators.  ( 2 min )
    Estimation of Non-Crossing Quantile Regression Process with Deep ReQU Neural Networks. (arXiv:2207.10442v1 [stat.ML])
    We propose a penalized nonparametric approach to estimating the quantile regression process (QRP) in a nonseparable model using rectifier quadratic unit (ReQU) activated deep neural networks and introduce a novel penalty function to enforce non-crossing of quantile regression curves. We establish the non-asymptotic excess risk bounds for the estimated QRP and derive the mean integrated squared error for the estimated QRP under mild smoothness and regularity conditions. To establish these non-asymptotic risk and estimation error bounds, we also develop a new error bound for approximating $C^s$ smooth functions with $s >0$ and their derivatives using ReQU activated neural networks. This is a new approximation result for ReQU networks and is of independent interest and may be useful in other problems. Our numerical experiments demonstrate that the proposed method is competitive with or outperforms two existing methods, including methods using reproducing kernels and random forests, for nonparametric quantile regression.  ( 2 min )
    Bayesian Recurrent Units and the Forward-Backward Algorithm. (arXiv:2207.10486v1 [stat.ML])
    Using Bayes's theorem, we derive a unit-wise recurrence as well as a backward recursion similar to the forward-backward algorithm. The resulting Bayesian recurrent units can be integrated as recurrent neural networks within deep learning frameworks, while retaining a probabilistic interpretation from the direct correspondence with hidden Markov models. Whilst the contribution is mainly theoretical, experiments on speech recognition indicate that adding the derived units at the end of state-of-the-art recurrent architectures can improve the performance at a very low cost in terms of trainable parameters.  ( 2 min )
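For readers unfamiliar with the recursion the abstract refers to, the classical forward pass of a discrete hidden Markov model is sketched below. This is the textbook algorithm, not the Bayesian recurrent unit derived in the paper. All argument names are illustrative.

```python
def forward(init, trans, emit, observations):
    """Normalized forward recursion for a discrete hidden Markov model.

    init[i]      prior probability of state i
    trans[i][j]  probability of moving from state i to state j
    emit[i][o]   probability of emitting symbol o in state i
    Returns the filtered state posteriors p(state_t | obs_1..t) for each t.
    """
    n = len(init)
    posteriors = []
    belief = list(init)
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the previous posterior through the transition matrix.
            belief = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # Update: Bayes's theorem with the emission likelihood, then normalize.
        joint = [belief[j] * emit[j][obs] for j in range(n)]
        total = sum(joint)
        belief = [v / total for v in joint]
        posteriors.append(belief)
    return posteriors
```

Each step is a predict-then-update application of Bayes's theorem, which is the structure a recurrent unit with a probabilistic interpretation would need to reproduce.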
    Efficient Search of Multiple Neural Architectures with Different Complexities via Importance Sampling. (arXiv:2207.10334v1 [cs.NE])
    Neural architecture search (NAS) aims to automate architecture design processes and improve the performance of deep neural networks. Platform-aware NAS methods consider both performance and complexity and can find well-performing architectures with low computational resources. Although ordinary NAS methods result in tremendous computational costs owing to the repetition of model training, one-shot NAS, which trains the weights of a supernetwork containing all candidate architectures only once during the search process, has been reported to result in a lower search cost. This study focuses on the architecture complexity-aware one-shot NAS that optimizes the objective function composed of the weighted sum of two metrics, such as the predictive performance and number of parameters. In existing methods, the architecture search process must be run multiple times with different coefficients of the weighted sum to obtain multiple architectures with different complexities. This study aims at reducing the search cost associated with finding multiple architectures. The proposed method uses multiple distributions to generate architectures with different complexities and updates each distribution using the samples obtained from multiple distributions based on importance sampling. The proposed method allows us to obtain multiple architectures with different complexities in a single architecture search, which reduces the search cost. The proposed method is applied to the architecture search of convolutional neural networks on the CIFAR-10 and ImageNet datasets. Consequently, compared with baseline methods, the proposed method finds multiple architectures with varying complexities while requiring less computational effort.  ( 3 min )
    Optimal precision for GANs. (arXiv:2207.10541v1 [cs.LG])
    When learning disconnected distributions, generative adversarial networks (GANs) are known to face model misspecification. Indeed, a continuous mapping from a unimodal latent distribution to a disconnected one is impossible, so GANs necessarily generate samples outside of the support of the target distribution. This raises a fundamental question: what is the latent space partition that minimizes the measure of these areas? Building on a recent result of geometric measure theory, we prove that an optimal GAN must structure its latent space as a 'simplicial cluster' - a Voronoi partition where cells are convex cones - when the dimension of the latent space is larger than the number of modes. In this configuration, each Voronoi cell maps to a distinct mode of the data. We derive both an upper and a lower bound on the optimal precision of GANs learning disconnected manifolds. Interestingly, these two bounds have the same order of decrease: $\sqrt{\log m}$, $m$ being the number of modes. Finally, we perform several experiments to exhibit the geometry of the latent space and experimentally show that GANs have a geometry with similar properties to the theoretical one.  ( 2 min )
    Provably tuning the ElasticNet across instances. (arXiv:2207.10199v1 [cs.LG])
    An important unresolved challenge in the theory of regularization is to set the regularization coefficients of popular techniques like the ElasticNet with general provable guarantees. We consider the problem of tuning the regularization parameters of Ridge regression, LASSO, and the ElasticNet across multiple problem instances, a setting that encompasses both cross-validation and multi-task hyperparameter optimization. We obtain a novel structural result for the ElasticNet which characterizes the loss as a function of the tuning parameters as a piecewise-rational function with algebraic boundaries. We use this to bound the structural complexity of the regularized loss functions and show generalization guarantees for tuning the ElasticNet regression coefficients in the statistical setting. We also consider the more challenging online learning setting, where we show vanishing average expected regret relative to the optimal parameter pair. We further extend our results to tuning classification algorithms obtained by thresholding regression fits regularized by Ridge, LASSO, or ElasticNet. Our results are the first general learning-theoretic guarantees for this important class of problems that avoid strong assumptions on the data distribution. Furthermore, our guarantees hold for both validation and popular information criterion objectives.  ( 2 min )
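For reference, the estimator whose two tuning parameters the abstract discusses can be written as follows. This is the standard ElasticNet objective in one common scaling. The exact normalization of the loss and penalty terms used in the paper is not given in the abstract, so the constants here are an assumption.

```latex
\hat{\beta}(\lambda_1, \lambda_2)
  = \arg\min_{\beta \in \mathbb{R}^p}
    \; \lVert y - X\beta \rVert_2^2
    + \lambda_1 \lVert \beta \rVert_1
    + \lambda_2 \lVert \beta \rVert_2^2,
  \qquad \lambda_1, \lambda_2 \ge 0 .
```

Setting $\lambda_1 = 0$ recovers Ridge regression and setting $\lambda_2 = 0$ recovers LASSO, which is why tuning the pair $(\lambda_1, \lambda_2)$ covers all three methods named in the abstract.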

  • Open

    [D] Which GPU cloud do you use and recommend?
    I'm looking to migrate all my local experiments to a GPU cloud, but I found many options and I know few people who have used any of them and can give me useful feedback. I have two contexts: (1) experiments performed in Jupyter Notebook; (2) DRL experiments using the StarCraft II Learning Environment. For the first context I'm thinking about using Google Colab Pro, because I already have experience with Google Colab, so it would not be difficult to migrate to the Pro version. For the second I have used my local machine, but I'm out of GPU and using my university's supercomputer is absurdly problematic. My monthly budget is $50.00, because I'll do the most massive processing on the university's supercomputer. This budget will be used to run experiments of a few hours; experiments of one or more days will use the supercomputer. GPU clouds I found: Lambda, Linode, Paperspace, RunPod. Obviously there are the big tech clouds (AWS, Google Cloud and Azure), but from what I've seen these other GPU clouds are usually cheaper and less difficult to use. Could those of you reading this post recommend a cloud GPU that you have already used? (Clouds with student discounts are welcome.) TL;DR: please recommend a cloud GPU that you have already used. submitted by /u/barash-616 [link] [comments]  ( 89 min )
    [D]Consumer Forecast
    Anyone successfully use the US CPI Index as a feature/external regressor for a time series model? Considering running some tests with this but curious if others have some experience with this. submitted by /u/datajunky624 [link] [comments]  ( 87 min )
    [D] How do collaborations materialize in your research group?
    Typically, I would come up with a problem ("Introduction"), research about it ("Previous work"), and develop a solution ("Methods", "Experiments"). Then, my PI would point out mistakes, raise some questions, polish the paper, etc. With this workflow, I ended up with various two-author papers (me and my PI). There are obvious benefits of including more people in the work/papers, such as having more points of views and having more time to do other things given that the work has been split. But how to do that in practice? I would like to hear how this is done in other groups. For instance, say you come up with a method and need to conduct other experiments to compare it with other methods. You could share your code and let your colleagues write the experiments following the same style and write and execute those experiments. Another potentially useful idea is to have regular meetings to tell each other what you're doing and what problems you are facing. submitted by /u/kuonlp [link] [comments]  ( 92 min )
    [D] ICML 2022 Outstanding Paper Awards 🔥
    It seems that ML twitter is under fire this week 🔥 Two of the recent ICML outstanding paper awards have received major criticisms on Twitter: Paper 1: Bayesian Model Selection, the Marginal Likelihood, and Generalization Paper link: https://proceedings.mlr.press/v162/lotfi22a.html Twitter discussion: https://twitter.com/BlackHC/status/1549832198152683520 https://twitter.com/LotfiSanae/status/1549842925328257025 https://twitter.com/andrewgwils/status/1550120752099180548 Blog containing the critical review: https://blog.blackhc.net/2022/06/bayesian-model-selection-marginal-likehood-generalization/ Paper 2: Privacy for Free: How does Dataset Condensation Help Privacy? Paper link: https://proceedings.mlr.press/v162/dong22c.html Twitter discussion: https://twit…  ( 116 min )
    [D] Super-resolution / image reconstruction aided by reference images
    Are there any models that can say, restore or upscale an image given other references as input? For example, say you have a portrait photo that needs to be improved. You may also have 10 other portraits of the same person, which should be useful information to the model in solving its task accurately. submitted by /u/thegreatjoke [link] [comments]  ( 87 min )
    [D] Hey Reddit! We're a bunch of research scientists and software engineers and we just open sourced a new state-of-the-art AI model that can translate between 200 different languages. We're excited to hear your thoughts so we're hosting an AMA on 07/21/2022 @ 9:00AM PT. Ask Us Anything!
    PROOF: https://i.redd.it/2z42nlnbssc91.jpg We’re part of the team behind Meta AI’s latest AI breakthrough in machine translation with our No Language Left Behind (NLLB) project. It’s a translation system that can support over 200 languages, even if there isn't a lot of text available to learn from. The reality is that a handful of languages dominate the web meaning only a fraction of the world can access content and contribute to the web in their own language. We want to change this by creating more inclusive machine translations systems – ones that unlock access to the web for the more than 4B people around the world that are currently excluded because they do not speak one of the few languages content is available in. Here are a few things about NLLB we’re excited for: Latest breakth…  ( 131 min )
    [N] Diffusers: Introducing Hugging Face's new library for diffusion models.
    Diffusion models have recently gained a lot of interest from the machine learning community. This is partly because diffusion models play an important role in enabling models like DALL-E or Imagen to generate previously unparalleled photorealistic images when prompted with some text. The computer vision community isn't the only one to enjoy the success of diffusion models, as they have also achieved remarkable results in other domains, such as: - video generation - audio synthesis - reinforcement learning. However, most recent research on diffusion models, namely DALL-E 2 and Imagen, has not been made accessible to the machine learning community and often remains behind the closed doors of large tech companies. This is why we decided to build and open-source 🧨 Diffusers. The objective is twofold: - Centralize the most important open-sourced research on diffusion models and make it more accessible and easier to use for the community. - Provide the community with simple yet powerful training utilities to build powerful systems, such as Imagen and DALL-E, in a transparent, open-sourced fashion so that everybody profits from the new technology. 🧨 Diffusers aims to be a modular toolbox for diffusion techniques, with a focus on: - Inference pipelines - Schedulers - Models - Training examples. Check out the library here: https://github.com/huggingface/diffusers Check out a walkthrough colab here: https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb Using a DDPM model and scheduler to generate a church image from noise submitted by /u/jikkii [link] [comments]  ( 89 min )
    [D] Why don't we see faster adoption of FNet: Mixing Tokens with Fourier Transforms?
    When the paper came out last year, I thought that the innovation would quickly be adopted across the board, instead it would seem to be all but forgotten. https://arxiv.org/abs/2105.03824v3 Just for a quick tl;dr, the architecture replaces the self-attention block with a Fourier transform, so no learnable parameters. The enormously lower computation costs greatly outweigh the performance loss. So you can then make the network deeper and/or wider, thus for the same computation cost, have a great overall increase of performance. If I remember correctly, they also tried to replace the self-attention block with some kind of linear transformation, and this performed really poorly, so the key here seems to be the nonlinearity of the Fourier transform. The conclusion here being that the awesomeness of the transformer is in fact related to the greater architecture, and benefit of self-attention is mixing everything together in an interdependent and nonlinear fashion, which is exactly what the Fourier transform is doing. So now, back to my original question, why is this not being adopted more quickly? What am I missing? submitted by /u/MercuriusExMachina [link] [comments]  ( 113 min )
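The token-mixing step described in the post can be sketched with the standard library alone. This is a naive illustration of the idea (real part of a 2D discrete Fourier transform over the hidden and token dimensions), not the paper's implementation. The function names are hypothetical and an O(n^2) DFT stands in for an FFT.

```python
import cmath

def dft(values):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]

def fourier_mix(x):
    """Real part of a 2D DFT over a (tokens x hidden) matrix; no learnable weights."""
    n_tokens, n_hidden = len(x), len(x[0])
    rows = [dft(row) for row in x]  # transform along the hidden dimension
    cols = [dft([row[j] for row in rows]) for j in range(n_hidden)]  # along tokens
    return [[cols[j][i].real for j in range(n_hidden)] for i in range(n_tokens)]
```

Every output entry depends on every input entry, which is the mixing behaviour the post describes. Note that the DFT is linear in its input (doubling `x` doubles the output), so in this sketch any nonlinearity would have to come from the layers around the mixing step.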
    [D] Why is my proceedings paper not shown in google scholar, or even google search?
    Hello guys, I’m a fresh PhD student. Last month we had a paper accepted by a conference, and it can be found in the proceedings. However, it doesn't show up in my Google Scholar profile, while other papers in the same proceedings do. I wonder why this is and how I should fix it? submitted by /u/Successful_Paper1542 [link] [comments]  ( 88 min )
    [R] Robust SDE-Based Variational Formulations for Solving Linear PDEs via Deep Learning
    Published at ICML 2022 (If you are also at the conference, feel free to reach out and we can talk about our work.) PDF on ResearchGate / Poster Abstract: The combination of Monte Carlo methods and deep learning has recently led to efficient algorithms for solving partial differential equations (PDEs) in high dimensions. Related learning problems are often stated as variational formulations based on associated stochastic differential equations (SDEs), which allow the minimization of corresponding losses using gradient-based optimization methods. In respective numerical implementations it is therefore crucial to rely on adequate gradient estimators that exhibit low variance in order to reach convergence accurately and swiftly. In this article, we rigorously investigate corresponding numerical aspects that appear in the context of linear Kolmogorov PDEs. In particular, we systematically compare existing deep learning approaches and provide theoretical explanations for their performances. Subsequently, we suggest novel methods that can be shown to be more robust both theoretically and numerically, leading to substantial performance improvements. submitted by /u/julbern [link] [comments]  ( 113 min )
    [D] In an MLP model, if I disable the gradient in some random dense layers, is it normal for the training time to remain the same?
    For context, I'm training a model with LoRA layers in Embedding and Linear layers but the training time does not decrease, although I am using way less trainable weights. submitted by /u/JClub [link] [comments]  ( 89 min )
    [News] Theseus: Meta AI open sourced a library for encoding domain knowledge in end to end AI models
    Domain knowledge can sometimes boost model performance significantly. I have used knowledge graphs in some of my projects as a pre/post step to improve model performance, but that adds to deployment complexity. Theseus is a good step towards end-to-end AI models that incorporate domain knowledge. It is a library for differentiable nonlinear least squares (NLS) that is particularly useful for applications like robotics and computer vision. Read more: https://ai.facebook.com/blog/theseus-a-library-for-encoding-domain-knowledge-in-end-to-end-ai-models/ submitted by /u/ashwan1 [link] [comments]  ( 87 min )
    [P] Reduce your data labeling needs 70% by using Active Learning
    Hey everyone. Massive news from Hasty! Today, we launch the newest Active Learning for labeling feature. In short, when you annotate pictures, our algorithms review all possible images you can label next and sort them according to which ones will bring the most significant potential impact on your model. Independent research showed that such an Active Learning pipeline reduces the amount of data a neural network needs to perform by 70%. Nowadays, vision AI models tend to use hundreds of thousands of images before they start producing excellent results. Most teams spend many hours labeling data for their needs or waiting for an outsource team to annotate the data for them. Our solution for such a conventional, time-consuming, and expensive procedure is the advanced technology in the Data Science field - Active Learning. With it, in combination with our existing Hasty product, we deliver a true pain reliever solution for all your vision AI needs. Learn more by checking out the release update. https://preview.redd.it/crn3czgw9wc91.png?width=1009&format=png&auto=webp&s=6f3331c58fdc2a0bc81e1413940c25a3b0c2db90 submitted by /u/Ierihon_hasty_ai [link] [comments]  ( 93 min )
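The post does not say how images are ranked, so the following is only a generic illustration of one common active-learning heuristic, uncertainty sampling by predictive entropy. It is not Hasty's algorithm. The function names and the shape of the `predictions` mapping are assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (in nats)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def rank_for_labeling(predictions):
    """Order unlabeled image ids from most to least uncertain.

    predictions maps an image id to the model's class probabilities.
    """
    return sorted(
        predictions,
        key=lambda image_id: entropy(predictions[image_id]),
        reverse=True,
    )
```

Images the current model is least sure about come first, on the assumption that labeling them changes the model the most.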
    [D] How useful are torchvision augmentations? is there a strategy to using some over others?
    There's a lot to try, but also quite time consuming to brute-force compare them all, so I was wondering if there's pros and cons to each transformation for certain contexts. And if they're useful for generating images as well. submitted by /u/ethansmith2000 [link] [comments]  ( 90 min )
    [R] High-Resolution Virtual Try-On with Misalignment and Occlusion-Handled Conditions
    ​ https://preview.redd.it/vqxmxahu7uc91.png?width=5400&format=png&auto=webp&s=bea0ff23fc42d13b1d73d04c8b4c3236fab16a35 submitted by /u/Impressive-Mirror430 [link] [comments]  ( 87 min )
  • Open

    Hi, everyone I'm doing a statistic study about Artificial Intelligence, so I will be forever grateful with you if I can steal 1 minute of your time to complete this survey. Thank you. Hope you enjoy it.
    submitted by /u/KatCelest [link] [comments]  ( 86 min )
    Have any of you applied the python SHAP library to models trained with stablebaselines? I'm looking through their docs and I can't find much about RL models.
    submitted by /u/elonmusk12345_ [link] [comments]  ( 86 min )
    "DayDreamer: World Models for Physical Robot Learning", Wu et al 2022 (world models)
    submitted by /u/gwern [link] [comments]  ( 119 min )
    Deep Learning vs Reinforcement Learning vs Deep Reinforcement Learning
    Hi everyone! Can anyone explain to me in simple words the difference between the terms mentioned above, or point me to resources that explain it? I know that DL algorithms use training data sets to learn, and then we can use them on new data for prediction or whatever. RL, by contrast, doesn't require training data; it learns through trial and error... but I don't understand how exactly the two approaches, RL and DL, can work together. Is DL used after a few iterations of RL, once the agent has collected some experience? What RL limitations does DL tackle? Thanks in advance and have a good day! submitted by /u/Affectionate_Worth43 [link] [comments]  ( 88 min )
    How can I create observation space with different size of observations for my custom agent?
    Hi everyone, Recently, I was working on creating a custom reinforcement learning agent for my problem. The observation includes 4 observations with different upper and lower bounds that I created with spaces.Box. However, now I want to give a vector for one of my observations (instead of only the current measurement, I want to include future forecast steps); the rest are still scalars. I don't know if that is possible or what the best way to do it is. Any comments would be appreciated. Does anyone in the field have any suggestions? submitted by /u/ayska_ [link] [comments]  ( 87 min )
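One way to approach the question above is to flatten everything into a single vector and build matching bound vectors. The sketch below does this in plain Python. The helper names, the choice of three scalars plus one forecast vector, and the shared forecast bounds are all assumptions for illustration.

```python
def flat_bounds(scalar_bounds, forecast_bound, horizon):
    """Build low/high bound vectors for a single flat observation.

    scalar_bounds   list of (low, high) pairs, one per scalar measurement
    forecast_bound  (low, high) pair shared by every forecast step
    horizon         number of forecast steps appended to the observation
    """
    low = [lo for lo, _ in scalar_bounds] + [forecast_bound[0]] * horizon
    high = [hi for _, hi in scalar_bounds] + [forecast_bound[1]] * horizon
    return low, high

def flatten_observation(scalars, forecast):
    """Concatenate the scalar measurements and the forecast vector."""
    return list(scalars) + list(forecast)
```

The resulting `low` and `high` lists could then be converted to arrays and passed to a single `Box` space. Gym's `spaces.Dict` is an alternative that keeps the components separate, at the cost of needing an algorithm that accepts dictionary observations.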
    My challenging Gym Environment (Gym decision trees) !
    Hello everyone. This is my first post on this subreddit. I am doing a PhD in interpretability for Reinforcement Learning. I am currently designing RL algorithms that learn decision tree policies at train time. For that purpose I made a small gym benchmark : https://github.com/KohlerHECTOR/gym-decision-trees . It is a simple continuous-2D-states-4-discrete-actions MDP for which the optimal policy is a decision tree of given depth ! The maximum episodic cumulative reward is 500. Let me know if you can retrieve the optimal policy ;) submitted by /u/Hkohler98 [link] [comments]  ( 119 min )
    Confusion with GAIL
    Hello! In my understanding, in the original GAIL paper the discriminator D is defined the following way: D(a,s) = P(student) and 1-D(a,s) = P(expert). This is how I understand equation 18: we want to decrease the probabilities of state-action pairs that are generated by the student, so that state-action pairs generated by the expert become more likely. Thus minimization of equation 18 makes sense. However, practical implementations maximize equation 18 with the discriminator defined as in the original GAIL paper. My question: why would we want to increase the probabilities of state-action pairs that are generated by the student? https://arxiv.org/pdf/1606.03476.pdf https://preview.redd.it/npk3xe7s6vc91.png?width=720&format=png&auto=webp&s=024ff946d0633c8d9e91914472475e65378a5a20 submitted by /u/SigmaEpsilonDelta [link] [comments]  ( 86 min )
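One possible reading of the discrepancy raised in the post is a difference in labeling convention rather than in the objective. As a sketch, the saddle-point objective of the GAIL paper, with $D$ close to 1 on policy (student) samples, and the same objective after relabeling with $D' = 1 - D$ (an assumed convention in which $D'$ is close to 1 on expert samples) can be written as:

```latex
\min_{\pi} \max_{D} \;
  \mathbb{E}_{\pi}\bigl[\log D(s,a)\bigr]
  + \mathbb{E}_{\pi_E}\bigl[\log\bigl(1 - D(s,a)\bigr)\bigr]
  - \lambda H(\pi)

% Relabel D' = 1 - D, so that D' is near 1 on expert samples:
\min_{\pi} \max_{D'} \;
  \mathbb{E}_{\pi}\bigl[\log\bigl(1 - D'(s,a)\bigr)\bigr]
  + \mathbb{E}_{\pi_E}\bigl[\log D'(s,a)\bigr]
  - \lambda H(\pi)
```

Under the second convention, minimizing the policy cost $\log(1 - D'(s,a))$ is the same as maximizing the reward $-\log(1 - D'(s,a))$. An implementation that appears to maximize may therefore be optimizing the same objective with flipped labels. Whether a given codebase does this has to be checked against its discriminator targets.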
  • Open

    Training Generalist Agents with Multi-Game Decision Transformers
    Posted by Winnie Xu, Student Researcher and Kuang-Huei Lee, Software Engineer, Google Research, Brain Team Current deep reinforcement learning (RL) methods can train specialist artificial agents that excel at decision-making on various individual tasks in specific environments, such as Go or StarCraft. However, little progress has been made to extend these results to generalist agents that would not only be capable of performing many different tasks, but also upon a variety of environments with potentially distinct embodiments. Looking across recent progress in the fields of natural language processing, vision, and generative models (such as PaLM, Imagen, and Flamingo), we see that breakthroughs in making general-purpose models are often achieved by scaling up Transformer-based models an…  ( 25 min )
  • Open

    An API for evaluating the quality of any synthetic dataset
    submitted by /u/Repeat-or [link] [comments]  ( 86 min )
    hmmm
    submitted by /u/TheSilverHound [link] [comments]  ( 89 min )
    Alibaba AI Research Team Introduces ‘DCT-Net’; A Novel Image Translation Architecture For Few-Shot Portrait Stylization
    submitted by /u/ai-lover [link] [comments]  ( 87 min )
    A.I. Digital Graffiti Art || Starryai
    submitted by /u/widgia [link] [comments]  ( 86 min )
    I Created an Automated Finance News Channel with Python and AI
    submitted by /u/kbf_ [link] [comments]  ( 86 min )
    Artificial Intelligence for Business Leaders Webinar
    Artificial Intelligence for Business Leaders Webinar ​ Join Professor Pedram Mokrian to learn how business leaders should think about developing AI solutions. Learn key AI terms, trends, and concepts that inform business strategy. Register for webinar. submitted by /u/Stanford_Online [link] [comments]  ( 86 min )
    Extraterrestrial Emergence | Cinematic Encounter | 4K UHD | 24 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 86 min )
    A short exploration of how AI might become evil
    submitted by /u/Eth_ai [link] [comments]  ( 86 min )
    AI-designed graphic printed tees inspired by NBA Posters - Side project I've been working on called GraphicAI, thoughts?
    submitted by /u/cityofgoul [link] [comments]  ( 90 min )
    There's a Leak Somewhere made by @rufusdinosaur
    submitted by /u/widgia [link] [comments]  ( 90 min )
    Dimensional DALLE Dude (218 prompt lipsync)
    submitted by /u/Lozmosis [link] [comments]  ( 90 min )
    Free stickers! Dm me and I’ll send it. It’s a high quality 11.5 x 3 bumper sticker. A friend and I made them and are sharing them with the community.
    submitted by /u/A-V-8 [link] [comments]  ( 86 min )
  • Open

    The Role of AI in Ops Management of the Water Industry
    Water is the most abundant asset that humans have. We all use water to survive; it’s also vital for keeping our environment clean. Water provides life and substance. In cities with plenty of water, it’s vital to maintain the quality of life and the economy. This means that water has a great influence on human… Read More »The Role of AI in Ops Management of the Water Industry The post The Role of AI in Ops Management of the Water Industry appeared first on Data Science Central.  ( 21 min )
  • Open

    Kaggle Titanic Survival prediction using NN
    Hey guys! I'm new to deep learning and I'm trying to build a model to predict Titanic survivors based on Kaggle's Titanic dataset - https://www.kaggle.com/c/titanic I am trying to improve the accuracy of the model but I'm stuck at around 83% on the training data. I have created the model using the TensorFlow Sequential API. I would greatly appreciate any advice on improving the performance of this model! The final training data after removing nulls and one-hot encoding categorical values is of shape (889x12). My model architecture is - Input Layer: 12 neurons, Hidden Layer 1: 12 neurons, Hidden Layer 2: 6 neurons, Output Layer: 1 neuron. All the layers use ReLU activation except the output, which uses Sigmoid. The optimizer selected is Adam with the default lr, and the loss is binary cross-entropy. I have trained for 200 epochs with a batch size of 32. I have been experimenting with different layers/architectures but I haven't been able to get past 83%. Do you guys know any methods to improve this? P.S. Sorry if I am missing something as I'm a beginner in building NNs, thanks in advance! submitted by /u/cheap_wizard [link] [comments]  ( 88 min )
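To put the architecture in the post into perspective, the number of trainable parameters can be counted directly. The sketch below assumes that the "input layer" is simply the 12 input features feeding the first hidden layer. If it is instead an extra dense layer with 12 units, the second call applies.

```python
def dense_param_count(layer_sizes):
    """Number of weights and biases in a fully connected network."""
    return sum(
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )

# 12 features -> 12 -> 6 -> 1 (input layer read as the feature vector)
small = dense_param_count([12, 12, 6, 1])
# 12 features -> 12 -> 12 -> 6 -> 1 (input layer read as an extra dense layer)
large = dense_param_count([12, 12, 12, 6, 1])
```

Under the first reading the model has 241 parameters for 889 training rows, and under the second it has 397.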
    DataCamp is offering free access to their platform all week! Try it out now! https://bit.ly/3Q1tTO3
    submitted by /u/joanna58 [link] [comments]  ( 86 min )
  • Open

    Organize your machine learning journey with Amazon SageMaker Experiments and Amazon SageMaker Pipelines
    The process of building a machine learning (ML) model is iterative until you find the candidate model that is performing well and is ready to be deployed. As data scientists iterate through that process, they need a reliable method to easily track experiments to understand how each model version was built and how it performed. […]  ( 10 min )
  • Open

    Shifting Into High Gear: Lunit, Maker of FDA-Cleared AI for Cancer Analysis, Goes Public in Seoul
    South Korean startup Lunit, developer of two FDA-cleared AI models for healthcare, went public this week on the country’s Kosdaq stock market. The move marks the maturity of the Seoul-based company — which was founded in 2013 and has for years been part of the NVIDIA Inception program that nurtures cutting-edge startups. Lunit’s AI software Read article > The post Shifting Into High Gear: Lunit, Maker of FDA-Cleared AI for Cancer Analysis, Goes Public in Seoul appeared first on NVIDIA Blog.  ( 6 min )
    Get Battle Ready With New GeForce NOW Fortnite Reward
    Epic Games is bringing a new Fortnite reward to GeForce NOW, available to all members. Drop from the Battle Bus in Fortnite on GeForce NOW between today and Thursday, Aug. 4, to earn “The Dish-stroyer Pickaxe” in game for free. Members can earn this item by streaming Fortnite on GeForce NOW Read article > The post Get Battle Ready With New GeForce NOW Fortnite Reward appeared first on NVIDIA Blog.  ( 5 min )
    Researchers Use GPUs to Give Earbud Users a ‘Mute Button’ for Background Noise
    Thanks to earbuds you can have calls anywhere while doing anything. The problem: those on the other end of the call hear it all, too, from your roommate’s vacuum cleaner to background conversations at the cafe you’re working from. Now, work by a trio of graduate students at the University of Washington who spent the Read article > The post Researchers Use GPUs to Give Earbud Users a ‘Mute Button’ for Background Noise appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    How AI & ML lens can assist in defence and private security?
    Business approach is changing constantly to all creative possibilities provided by the digital revolution. The big bang of the digital…  ( 11 min )
  • Open

    Detecting Textual Adversarial Examples through Randomized Substitution and Vote. (arXiv:2109.05698v2 [cs.CL] UPDATED)
    A line of work has shown that natural text processing models are vulnerable to adversarial examples. Correspondingly, various defense methods are proposed to mitigate the threat of textual adversarial examples, e.g., adversarial training, input transformations, detection, etc. In this work, we treat the optimization process for synonym substitution based textual adversarial attacks as a specific sequence of word replacement, in which each word mutually influences other words. We identify that we could destroy such mutual interaction and eliminate the adversarial perturbation by randomly substituting a word with its synonyms. Based on this observation, we propose a novel textual adversarial example detection method, termed Randomized Substitution and Vote (RS&V), which votes the prediction label by accumulating the logits of k samples generated by randomly substituting the words in the input text with synonyms. The proposed RS&V is generally applicable to any existing neural networks without modification on the architecture or extra training, and it is orthogonal to prior work on making the classification network itself more robust. Empirical evaluations on three benchmark datasets demonstrate that our RS&V could detect the textual adversarial examples more successfully than the existing detection methods while maintaining the high classification accuracy on benign samples.  ( 3 min )
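    The voting step described in the abstract can be sketched roughly as below. This is not the authors' implementation: `classify_logits` and `synonyms` are hypothetical stand-ins for a real model and a synonym source, the per-word substitution rate `sub_rate` is an assumed parameter, and the decision rule in `is_adversarial` (flag the input when the vote disagrees with the prediction on the unmodified text) is an assumption rather than something the abstract states.

```python
import random


def rsv_vote(tokens, classify_logits, synonyms, k=25, sub_rate=0.5, seed=0):
    """Accumulate logits over k randomly synonym-substituted copies of the input.

    classify_logits: hypothetical callable, list of tokens -> list of logits.
    synonyms: hypothetical dict mapping a word to a non-empty list of synonyms.
    """
    rng = random.Random(seed)
    total = None
    for _ in range(k):
        sample = [
            rng.choice(synonyms[t]) if t in synonyms and rng.random() < sub_rate else t
            for t in tokens
        ]
        logits = classify_logits(sample)
        total = logits if total is None else [a + b for a, b in zip(total, logits)]
    # The voted label is the class with the largest accumulated logit.
    return max(range(len(total)), key=total.__getitem__)


def is_adversarial(tokens, classify_logits, synonyms, **kwargs):
    """Assumed rule: flag the input if the vote disagrees with the model's
    prediction on the unmodified text."""
    original = classify_logits(tokens)
    original_label = max(range(len(original)), key=original.__getitem__)
    return rsv_vote(tokens, classify_logits, synonyms, **kwargs) != original_label


# Toy usage with a deliberately brittle, hypothetical classifier.
def toy_logits(tokens):
    return [0.0, 1.0] if "dreadful" in tokens else [1.0, 0.0]


toy_synonyms = {"dreadful": ["bad", "poor"], "film": ["movie"]}
print(is_adversarial(["a", "dreadful", "film"], toy_logits, toy_synonyms, sub_rate=1.0))
```

    The toy classifier depends on one exact word, so substituting that word changes the vote. This mimics how random substitution can break the word-level interaction an attack relies on.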
    DataPerf: Benchmarks for Data-Centric AI Development. (arXiv:2207.10062v1 [cs.LG])
    Machine learning (ML) research has generally focused on models, while the most prominent datasets have been employed for everyday ML tasks without regard for the breadth, difficulty, and faithfulness of these datasets to the underlying problem. Neglecting the fundamental importance of datasets has caused major problems involving data cascades in real-world applications and saturation of dataset-driven criteria for model quality, hindering research growth. To solve this problem, we present DataPerf, a benchmark package for evaluating ML datasets and dataset-working algorithms. We intend it to enable the "data ratchet," in which training sets will aid in evaluating test sets on the same problems, and vice versa. Such a feedback-driven strategy will generate a virtuous loop that will accelerate development of data-centric AI. The MLCommons Association will maintain DataPerf.  ( 2 min )
    LSCALE: Latent Space Clustering-Based Active Learning for Node Classification. (arXiv:2012.07065v2 [cs.LG] UPDATED)
    Node classification on graphs is an important task in many practical domains. It usually requires labels for training, which can be difficult or expensive to obtain in practice. Given a budget for labelling, active learning aims to improve performance by carefully choosing which nodes to label. Previous graph active learning methods learn representations using labelled nodes and select some unlabelled nodes for label acquisition. However, they do not fully utilize the representation power present in unlabelled nodes. We argue that the representation power in unlabelled nodes can be useful for active learning and for further improving performance of active learning for node classification. In this paper, we propose a latent space clustering-based active learning framework for node classification (LSCALE), where we fully utilize the representation power in both labelled and unlabelled nodes. Specifically, to select nodes for labelling, our framework uses the K-Medoids clustering algorithm on a latent space based on a dynamic combination of both unsupervised features and supervised features. In addition, we design an incremental clustering module to avoid redundancy between nodes selected at different steps. Extensive experiments on five datasets show that our proposed framework LSCALE consistently and significantly outperforms the state-of-the-art approaches by a large margin.  ( 3 min )
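    To illustrate the selection step, here is a plain alternating K-Medoids sketch in which the returned medoids play the role of the nodes chosen for labelling. It assumes the latent vectors have already been computed. The dynamic feature combination and the incremental clustering module from the abstract are not reproduced, and initialising with the first k points is a simplification made for this sketch.

```python
import math


def k_medoids(points, k, dist=math.dist, max_iter=100):
    """Plain alternating K-Medoids; returns indices of the medoid points.

    Initialising with the first k points is a simplification for this sketch.
    """
    medoids = list(range(k))
    for _ in range(max_iter):
        # Assign every point to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Move each medoid to the member minimising total in-cluster distance
        # (an empty cluster keeps its current medoid).
        new_medoids = [
            min(members or [m],
                key=lambda c: sum(dist(points[c], points[j]) for j in members))
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids


# Hypothetical 1-D latent vectors forming two well-separated groups.
latent = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
to_label = k_medoids(latent, k=2)
print(to_label)
```

    Medoids are actual data points, so each selected index corresponds to a real node that can be sent for labelling.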
    Measuring and signing fairness as performance under multiple stakeholder distributions. (arXiv:2207.09960v1 [stat.ML])
    As learning machines increase their influence on decisions concerning human lives, analyzing their fairness properties becomes a subject of central importance. Yet, our best tools for measuring the fairness of learning systems are rigid fairness metrics encapsulated as mathematical one-liners, offer limited power to the stakeholders involved in the prediction task, and are easy to manipulate when we exert excessive pressure to optimize them. To address these issues, we propose to shift focus from shaping fairness metrics to curating the distributions of examples under which these are computed. In particular, we posit that every claim about fairness should be immediately followed by the tagline "Fair under what examples, and collected by whom?". By highlighting connections to the literature in domain generalization, we propose to measure fairness as the ability of the system to generalize under multiple stress tests -- distributions of examples with social relevance. We encourage each stakeholder to curate one or multiple stress tests containing examples reflecting their (possibly conflicting) interests. The machine passes or fails each stress test by falling short of or exceeding a pre-defined metric value. The test results involve all stakeholders in a discussion about how to improve the learning system, and provide flexible assessments of fairness dependent on context and based on interpretable data. We provide full implementation guidelines for stress testing, illustrate both the benefits and shortcomings of this framework, and introduce a cryptographic scheme to enable a degree of prediction accountability from system providers.  ( 3 min )
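    The pass/fail logic of stakeholder stress tests might look like the sketch below. Using accuracy as the metric is an assumption made here for concreteness, and the names `run_stress_tests`, `predict`, and the example stakeholders are hypothetical.

```python
def run_stress_tests(predict, stress_tests):
    """Return {test name: passed?} for every stakeholder-curated stress test.

    stress_tests maps a name to (examples, labels, threshold), where threshold
    is the pre-defined accuracy the system must reach to pass.
    """
    results = {}
    for name, (examples, labels, threshold) in stress_tests.items():
        correct = sum(predict(x) == y for x, y in zip(examples, labels))
        results[name] = correct / len(labels) >= threshold
    return results


# Hypothetical system: approve whenever the score is at least 0.5.
def predict(score):
    return score >= 0.5


tests = {
    "applicants": ([0.2, 0.7, 0.9], [False, True, True], 0.9),
    "advocates": ([0.4, 0.6], [True, True], 0.9),
}
print(run_stress_tests(predict, tests))
```

    Each stakeholder supplies their own examples and threshold, so the same system can pass one test and fail another.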
    Fixed Points of Cone Mapping with the Application to Neural Networks. (arXiv:2207.09947v1 [math.DS])
    We derive conditions for the existence of fixed points of cone mappings without assuming scalability of functions. Monotonicity and scalability are often inseparable in the literature in the context of searching for fixed points of interference mappings. In applications, such mappings are approximated by non-negative neural networks. It turns out, however, that the process of training non-negative networks requires imposing an artificial constraint on the weights of the model. Yet, in the case of specific non-negative data, it cannot be said that if the mapping is non-negative, it has only non-negative weights. Therefore, we considered the problem of the existence of fixed points for general neural networks, assuming tangency conditions with respect to specific cones. This does not relax the physical assumptions, because even assuming that the input and output are to be non-negative, the weights can have (small, but) less than zero values. Such properties (often found in papers on the interpretability of weights of neural networks) lead to the weakening of the assumptions about the monotonicity or scalability of the mapping associated with the neural network. To the best of our knowledge, this paper is the first to study this phenomenon.  ( 2 min )
    Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance. (arXiv:2202.12387v2 [cs.LG] UPDATED)
    In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods that rely on either a large batch size or a large dictionary of feature vectors. We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point. From the optimization perspective, we explain why existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result. To remove this requirement, we propose a memory-efficient Stochastic Optimization algorithm for solving the Global objective of Contrastive Learning of Representations, named SogCLR. We show that its optimization error is negligible under a reasonable condition after a sufficient number of iterations or is diminishing for a slightly different global contrastive objective. Empirically, we demonstrate that SogCLR with small batch size (e.g., 256) can achieve performance similar to SimCLR with large batch size (e.g., 8192) on a self-supervised learning task on ImageNet-1K. We also attempt to show that the proposed optimization technique is generic and can be applied to solving other contrastive losses, e.g., two-way contrastive losses for bimodal contrastive learning. The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org).  ( 3 min )
    Kernel Thinning. (arXiv:2105.05842v8 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.  ( 3 min )
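    The gap between the two rates quoted above can be made concrete with a little arithmetic. The sketch below evaluates only the orders of growth for the compactly supported case. It ignores all constants and the dimension dependence hidden in $\mathcal{O}_d$, so the numbers are indicative rather than actual error values. The function names are hypothetical.

```python
import math


def iid_rate(n):
    """Order of the integration error of an i.i.d. sample of size sqrt(n)."""
    return n ** -0.25


def kt_rate_compact(n):
    """Order of the kernel thinning error for compactly supported P."""
    return n ** -0.5 * math.sqrt(math.log(n))


# Columns: input size n, output size sqrt(n), i.i.d. order, kernel thinning order.
for n in (10**2, 10**4, 10**6):
    print(n, math.isqrt(n), iid_rate(n), kt_rate_compact(n))
```

    Both procedures return $\sqrt{n}$ points, so the comparison is at equal output size. The kernel thinning order shrinks faster as $n$ grows.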
    Network Clustering by Embedding of Attribute-augmented Graphs. (arXiv:2109.09367v3 [cs.LG] UPDATED)
    In this paper we propose a new approach to detect clusters in undirected graphs with attributed vertices. The aim is to group vertices which are similar not only in terms of structural connectivity but also in terms of attribute values. We incorporate structural and attribute similarities between the vertices in an augmented graph by creating additional vertices and edges as proposed in [6,38]. The augmented graph is then embedded in a Euclidean space associated with its Laplacian, where a modified K-means algorithm is applied to identify clusters. The modified K-means relies on a vector distance measure where to each original vertex we assign a suitable vector-valued set of coordinates depending on both structural connectivity and attribute similarities, so that each original graph vertex is thought of as representative of $m+1$ vertices of the augmented graph, if $m$ is the number of vertex attributes. To define the coordinate vectors we employ our recently proposed algorithm based on an adaptive AMG (Algebraic MultiGrid) method, which identifies the coordinate directions in the embedding Euclidean space in terms of algebraically smooth vectors with respect to the augmented graph Laplacian, thus extending our previous result for graphs without attributes. We analyze the effectiveness of our proposed clustering method by comparison with some well-known methods, whose software implementation is freely available, and also with results reported in the literature, on two different types of widely used synthetic graphs and on some real-world attributed graphs.  ( 3 min )
    AI-based Data Preparation and Data Analytics in Healthcare: The Case of Diabetes. (arXiv:2206.06182v2 [cs.LG] UPDATED)
    The Associazione Medici Diabetologi (AMD) collects and manages one of the largest worldwide-available collections of diabetic patient records, also known as the AMD database. This paper presents the initial results of an ongoing project whose focus is the application of Artificial Intelligence and Machine Learning techniques for conceptualizing, cleaning, and analyzing such an important and valuable dataset, with the goal of providing predictive insights to better support diabetologists in their diagnostic and therapeutic choices.
    Policy Optimization for Markov Games: Unified Framework and Faster Convergence. (arXiv:2206.02640v3 [cs.LG] UPDATED)
    This paper studies policy optimization algorithms for multi-agent reinforcement learning. We begin by proposing an algorithm framework for two-player zero-sum Markov Games in the full-information setting, where each iteration consists of a policy update step at each state using a certain matrix game algorithm, and a value update step with a certain learning rate. This framework unifies many existing and new policy optimization algorithms. We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game, as long as the matrix game algorithms achieve low weighted regret at each state, with respect to weights determined by the speed of the value updates. Next, we show that this framework instantiated with the Optimistic Follow-The-Regularized-Leader (OFTRL) algorithm at each state (and smooth value updates) can find an $\widetilde{\mathcal{O}}(T^{-5/6})$ approximate NE in $T$ iterations, and a similar algorithm with slightly modified value update rule achieves a faster $\widetilde{\mathcal{O}}(T^{-1})$ convergence rate. These improve over the current best $\widetilde{\mathcal{O}}(T^{-1/2})$ rate of symmetric policy optimization type algorithms. We also extend this algorithm to multi-player general-sum Markov Games and show an $\widetilde{\mathcal{O}}(T^{-3/4})$ convergence rate to Coarse Correlated Equilibria (CCE). Finally, we provide a numerical example to verify our theory and investigate the importance of smooth value updates, and find that using "eager" value updates instead (equivalent to the independent natural policy gradient algorithm) may significantly slow down the convergence, even on a simple game with $H=2$ layers.
    Federated Self-supervised Speech Representations: Are We There Yet?. (arXiv:2204.02804v2 [cs.SD] UPDATED)
    The ubiquity of microphone-enabled devices has led to large amounts of unlabelled audio data being produced at the edge. The integration of self-supervised learning (SSL) and federated learning (FL) into one coherent system can potentially offer data privacy guarantees while also advancing the quality and robustness of speech representations. In this paper, we provide a first-of-its-kind systematic study of the feasibility and complexities for training speech SSL models under FL scenarios from the perspective of algorithms, hardware, and systems limits. Despite the high potential of their combination, we find existing system constraints and algorithmic behaviour make SSL and FL systems nearly impossible to build today. Yet critically, our results indicate specific performance bottlenecks and research opportunities that would allow this situation to be reversed. While our analysis suggests that, given existing trends in hardware, hybrid SSL and FL speech systems will not be viable until 2027, we believe this study can act as a roadmap to accelerate work towards reaching this milestone much earlier.
    Contingency-constrained economic dispatch with safe reinforcement learning. (arXiv:2205.06212v2 [eess.SY] UPDATED)
    Future power systems will rely heavily on microgrids with a high share of decentralised renewable energy sources and energy storage systems. The high complexity and uncertainty in this context might make conventional power dispatch strategies infeasible. Reinforcement learning (RL) based controllers can address this challenge but cannot themselves provide safety guarantees, which prevents their deployment in practice. To overcome this limitation, we propose a formally validated RL controller for economic dispatch. We extend conventional constraints by a time-dependent constraint encoding the islanding contingency. The contingency constraint is computed using set-based backwards reachability analysis and actions of the RL agent are verified through a safety layer. Unsafe actions are projected into the safe action space while leveraging constrained zonotope set representations for computational efficiency. The developed approach is demonstrated on a residential use case using real-world measurements.
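    The idea of a safety layer that verifies actions and projects unsafe ones can be illustrated with a much simpler safe set than the one in the abstract. The sketch below swaps the constrained zonotopes for an axis-aligned box, for which the Euclidean projection is just element-wise clipping. The bounds and function names are hypothetical, and no reachability analysis is performed.

```python
def project_to_box(action, lower, upper):
    """Euclidean projection of an action onto the box [lower, upper]."""
    return [min(max(a, lo), hi) for a, lo, hi in zip(action, lower, upper)]


def safety_layer(action, lower, upper):
    """Pass safe actions through unchanged; project unsafe ones into the box."""
    is_safe = all(lo <= a <= hi for a, lo, hi in zip(action, lower, upper))
    return action if is_safe else project_to_box(action, lower, upper)


# Hypothetical two-dimensional action with per-component bounds.
print(safety_layer([0.5, -2.0], lower=[0.0, -1.0], upper=[1.0, 1.0]))
```

    With a general constrained zonotope the projection would instead require solving a small optimization problem, which this sketch does not attempt.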
    Generalized Normalizing Flows via Markov Chains. (arXiv:2111.12506v3 [cs.LG] UPDATED)
    Normalizing flows, diffusion normalizing flows and variational autoencoders are powerful generative models. This chapter provides a unified framework to handle these approaches via Markov chains. We consider stochastic normalizing flows as a pair of Markov chains fulfilling some properties and show how many state-of-the-art models for data generation fit into this framework. Indeed, numerical simulations show that including stochastic layers improves the expressivity of the network and allows for generating multimodal distributions from unimodal ones. The Markov chains point of view enables us to couple both deterministic layers as invertible neural networks and stochastic layers as Metropolis-Hastings layers, Langevin layers, variational autoencoders and diffusion normalizing flows in a mathematically sound way. Our framework establishes a useful mathematical tool to combine the various approaches.
    Increasing the Cost of Model Extraction with Calibrated Proof of Work. (arXiv:2201.09243v2 [cs.CR] UPDATED)
    In model extraction attacks, adversaries can steal a machine learning model exposed via a public API by repeatedly querying it and adjusting their own model based on obtained predictions. To prevent model stealing, existing defenses focus on detecting malicious queries, truncating, or distorting outputs, thus necessarily introducing a tradeoff between robustness and model utility for legitimate users. Instead, we propose to impede model extraction by requiring users to complete a proof-of-work before they can read the model's predictions. This deters attackers by greatly increasing (even up to 100x) the computational effort needed to leverage query access for model extraction. Since we calibrate the effort required to complete the proof-of-work to each query, this only introduces a slight overhead for regular users (up to 2x). To achieve this, our calibration applies tools from differential privacy to measure the information revealed by a query. Our method requires no modification of the victim model and can be applied by machine learning practitioners to guard their publicly exposed models against being easily stolen.
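    A calibrated proof-of-work gate can be sketched with a hashcash-style puzzle. The abstract does not specify the puzzle or the calibration function, so both are assumptions here: the puzzle asks for a SHA-256 hash with a given number of leading zero bits, and `difficulty_bits` is a hypothetical linear mapping from a per-query privacy cost to that number.

```python
import hashlib
from itertools import count


def difficulty_bits(privacy_cost, base_bits=4, bits_per_unit=2, max_bits=20):
    """Hypothetical calibration: more revealing queries need more zero bits."""
    return min(max_bits, base_bits + int(bits_per_unit * privacy_cost))


def _hash_value(challenge, nonce):
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def verify(challenge, nonce, bits):
    """Check that the hash has at least `bits` leading zero bits."""
    return _hash_value(challenge, nonce) >> (256 - bits) == 0


def solve(challenge, bits):
    """Brute-force the smallest nonce that satisfies the difficulty."""
    return next(n for n in count() if verify(challenge, n, bits))


# The server would release a prediction only after verify() succeeds.
bits = difficulty_bits(privacy_cost=1.5)
nonce = solve("query-42", bits)
print(bits, verify("query-42", nonce, bits))
```

    Each extra bit doubles the expected solving work while verification stays a single hash, which is what lets the cost scale with how much a query reveals.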
    Connect, Not Collapse: Explaining Contrastive Learning for Unsupervised Domain Adaptation. (arXiv:2204.00570v3 [cs.LG] UPDATED)
    We consider unsupervised domain adaptation (UDA), where labeled data from a source domain (e.g., photographs) and unlabeled data from a target domain (e.g., sketches) are used to learn a classifier for the target domain. Conventional UDA methods (e.g., domain adversarial training) learn domain-invariant features to improve generalization to the target domain. In this paper, we show that contrastive pre-training, which learns features on unlabeled source and target data and then fine-tunes on labeled source data, is competitive with strong UDA methods. However, we find that contrastive pre-training does not learn domain-invariant features, diverging from conventional UDA intuitions. We show theoretically that contrastive pre-training can learn features that vary substantially across domains but still generalize to the target domain, by disentangling domain and class information. Our results suggest that domain invariance is not necessary for UDA. We empirically validate our theory on benchmark vision datasets.
    Differentiable Time-Frequency Scattering on GPU. (arXiv:2204.08269v4 [cs.SD] UPDATED)
    Joint time-frequency scattering (JTFS) is a convolutional operator in the time-frequency domain which extracts spectrotemporal modulations at various rates and scales. It offers an idealized model of spectrotemporal receptive fields (STRF) in the primary auditory cortex, and thus may serve as a biologically plausible surrogate for human perceptual judgments at the scale of isolated audio events. Yet, prior implementations of JTFS and STRF have remained outside of the standard toolkit of perceptual similarity measures and evaluation methods for audio generation. We trace this issue down to three limitations: differentiability, speed, and flexibility. In this paper, we present an implementation of time-frequency scattering in Python. Unlike prior implementations, ours accommodates NumPy, PyTorch, and TensorFlow as backends and is thus portable on both CPU and GPU. We demonstrate the usefulness of JTFS via three applications: unsupervised manifold learning of spectrotemporal modulations, supervised classification of musical instruments, and texture resynthesis of bioacoustic sounds.
    Generalized Kernel Thinning. (arXiv:2110.01593v5 [stat.ML] UPDATED)
    The kernel thinning (KT) algorithm of Dwivedi and Mackey (2021) compresses a probability distribution more effectively than independent sampling by targeting a reproducing kernel Hilbert space (RKHS) and leveraging a less smooth square-root kernel. Here we provide four improvements. First, we show that KT applied directly to the target RKHS yields tighter, dimension-free guarantees for any kernel, any distribution, and any fixed function in the RKHS. Second, we show that, for analytic kernels like Gaussian, inverse multiquadric, and sinc, target KT admits maximum mean discrepancy (MMD) guarantees comparable to or better than those of square-root KT without making explicit use of a square-root kernel. Third, we prove that KT with a fractional power kernel yields better-than-Monte-Carlo MMD guarantees for non-smooth kernels, like Laplace and Mat\'ern, that do not have square-roots. Fourth, we establish that KT applied to a sum of the target and power kernels (a procedure we call KT+) simultaneously inherits the improved MMD guarantees of power KT and the tighter individual function guarantees of target KT. In our experiments with target KT and KT+, we witness significant improvements in integration error even in $100$ dimensions and when compressing challenging differential equation posteriors.
    Temporal Difference Learning for Model Predictive Control. (arXiv:2203.04955v2 [cs.LG] UPDATED)
    Data-driven model predictive control has two key advantages over model-free methods: a potential for improved sample efficiency through model learning, and better performance as computational budget for planning increases. However, it is both costly to plan over long horizons and challenging to obtain an accurate model of the environment. In this work, we combine the strengths of model-free and model-based methods. We use a learned task-oriented latent dynamics model for local trajectory optimization over a short horizon, and use a learned terminal value function to estimate long-term return, both of which are learned jointly by temporal difference learning. Our method, TD-MPC, achieves superior sample efficiency and asymptotic performance over prior work on both state and image-based continuous control tasks from DMControl and Meta-World. Code and video results are available at https://nicklashansen.github.io/td-mpc.
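    The way a short planning horizon is combined with a terminal value estimate can be sketched as follows. In the abstract the dynamics, reward, and value function are learned jointly by temporal difference learning. Here they are passed in as hypothetical callables, and the trajectory optimizer is replaced by an exhaustive search over a small candidate set.

```python
from itertools import product


def plan_value(z0, actions, dynamics, reward, value, gamma=0.99):
    """Score an action sequence: discounted model rewards over the short
    horizon plus a discounted terminal value estimate."""
    z, total, discount = z0, 0.0, 1.0
    for a in actions:
        total += discount * reward(z, a)
        z = dynamics(z, a)
        discount *= gamma
    return total + discount * value(z)


def best_plan(z0, candidates, dynamics, reward, value, gamma=0.99):
    """Pick the best candidate sequence (exhaustive stand-in for the
    trajectory optimizer)."""
    return max(candidates,
               key=lambda seq: plan_value(z0, seq, dynamics, reward, value, gamma))


# Hypothetical 1-D task: drive the state to zero.
def toy_dynamics(z, a):
    return z + a


def toy_reward(z, a):
    return -abs(z)


def toy_value(z):
    return -10.0 * abs(z)


candidates = list(product((-1, 0, 1), repeat=2))
print(best_plan(2, candidates, toy_dynamics, toy_reward, toy_value))
```

    The terminal value term stands in for everything beyond the planning horizon, which is why the horizon can stay short.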
    Robust Simulation-Based Inference in Cosmology with Bayesian Neural Networks. (arXiv:2207.08435v2 [astro-ph.CO] UPDATED)
    Simulation-based inference (SBI) is rapidly establishing itself as a standard machine learning technique for analyzing data in cosmological surveys. Despite continual improvements to the quality of density estimation by learned models, applications of such techniques to real data are entirely reliant on the generalization power of neural networks far outside the training distribution, which is mostly unconstrained. Due to the imperfections in scientist-created simulations, and the large computational expense of generating all possible parameter combinations, SBI methods in cosmology are vulnerable to such generalization issues. Here, we discuss the effects of both issues, and show how using a Bayesian neural network framework for training SBI can mitigate biases, and result in more reliable inference outside the training set. We introduce cosmoSWAG, the first application of Stochastic Weight Averaging to cosmology, and apply it to SBI trained for inference on the cosmic microwave background.
    Application of QUBO solver using black-box optimization to structural design for resonance avoidance. (arXiv:2204.04906v2 [cond-mat.mes-hall] UPDATED)
    Quadratic unconstrained binary optimization (QUBO) solvers can be applied to design an optimal structure to avoid resonance. QUBO algorithms that work on a classical or quantum device have succeeded in some industrial applications. However, their applications are still limited due to the difficulty of transforming from the original optimization problem to QUBO. Recently, black-box optimization (BBO) methods have been proposed to tackle this issue using a machine learning technique and a Bayesian treatment for combinatorial optimization. We employed the BBO methods to design a printed circuit board for resonance avoidance. This design problem is formulated to maximize natural frequency and simultaneously minimize the number of mounting points. The natural frequency, which is the bottleneck for the QUBO formulation, is approximated by a quadratic model in the BBO method. We demonstrated that BBO using a factorization machine shows good performance in both the calculation time and the success probability of finding the optimal solution. Our results can open up QUBO solvers' potential for other applications in structural designs.
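    For readers unfamiliar with the QUBO form, the sketch below evaluates the objective $x^\top Q x$ over binary vectors and minimises it by brute force. The matrix `Q` is a hypothetical toy example, not the quadratic model fitted in the paper, and exhaustive search only works for a handful of variables.

```python
from itertools import product


def qubo_energy(x, Q):
    """Evaluate x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_qubo(Q):
    """Return the binary vector minimising x^T Q x (feasible only for small n)."""
    return min(product((0, 1), repeat=len(Q)), key=lambda x: qubo_energy(x, Q))


# Hypothetical toy matrix: selecting a variable is rewarded (diagonal -1),
# selecting two neighbouring variables together is penalised (off-diagonal +2).
Q = [
    [-1, 2, 0],
    [0, -1, 2],
    [0, 0, -1],
]
best = brute_force_qubo(Q)
print(best, qubo_energy(best, Q))
```

    A dedicated QUBO solver replaces the brute-force search, but the objective it minimises has exactly this form.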
    Deep Learning to Estimate Permeability using Geophysical Data. (arXiv:2110.10077v2 [physics.geo-ph] UPDATED)
    Time-lapse electrical resistivity tomography (ERT) is a popular geophysical method to estimate three-dimensional (3D) permeability fields from electrical potential difference measurements. Traditional inversion and data assimilation methods are used to ingest this ERT data into hydrogeophysical models to estimate permeability. Due to ill-posedness and the curse of dimensionality, existing inversion strategies provide poor estimates and low resolution of the 3D permeability field. Recent advances in deep learning provide us with powerful algorithms to overcome this challenge. This paper presents a deep learning (DL) framework to estimate the 3D subsurface permeability from time-lapse ERT data. To test the feasibility of the proposed framework, we train DL-enabled inverse models on simulation data. Subsurface process models based on hydrogeophysics are used to generate this synthetic data for deep learning analyses. Results show that the proposed weakly supervised learning can capture salient spatial features in the 3D permeability field. Quantitatively, the average mean squared error (in terms of the natural log) on the strongly labeled training, validation, and test datasets is less than 0.5. The $R^2$-score (global metric) is greater than 0.75, and the percent error in each cell (local metric) is less than 10%. Finally, an added benefit in terms of computational cost is that the proposed DL-based inverse model is at least $O(10^4)$ times faster than running a forward model. Note that traditional inversion may require multiple forward model simulations (e.g., on the order of 10 to 1000), which are very expensive. These computational savings ($O(10^5)$ to $O(10^7)$) make the proposed DL-based inverse model attractive for subsurface imaging and real-time ERT monitoring applications due to fast and yet reasonably accurate estimations of the permeability field.
    Reinforcement Learning For Survival, A Clinically Motivated Method For Critically Ill Patients. (arXiv:2207.08040v2 [cs.LG] UPDATED)
    There has been considerable interest in leveraging RL and stochastic control methods to learn optimal treatment strategies for critically ill patients, directly from observational data. However, there is significant ambiguity on the control objective and on the best reward choice for the standard RL objective. In this work, we propose a clinically motivated control objective for critically ill patients, for which the value functions have a simple medical interpretation. Further, we present theoretical results and adapt our method to a practical Deep RL algorithm, which can be used alongside any value based Deep RL method. We experiment on a large sepsis cohort and show that our method produces results consistent with clinical knowledge.
    Investigation of a Data Split Strategy Involving the Time Axis in Adverse Event Prediction Using Machine Learning. (arXiv:2204.08682v2 [cs.LG] UPDATED)
    Adverse events are a serious issue in drug development and many prediction methods using machine learning have been developed. The random split cross-validation is the de facto standard for model building and evaluation in machine learning, but care should be taken in adverse event prediction because this approach does not match the real-world situation. The time split, which uses the time axis, is considered suitable for real-world prediction. However, the differences in model performance obtained using the time and random splits are not clear due to the lack of comparable studies. To understand the differences, we compared the model performance between the time and random splits using nine types of compound information as input, eight adverse events as targets, and six machine learning algorithms. The random split showed higher area under the curve values than did the time split for six of eight targets. The chemical spaces of the training and test datasets of the time split were similar, suggesting that the concept of applicability domain is insufficient to explain the differences derived from the splitting. The area under the curve differences were smaller for the protein interaction than for the other datasets. Subsequent detailed analyses suggested the danger of confounding in the use of knowledge-based information in the time split. These findings indicate the importance of understanding the differences between the time and random splits in adverse event prediction and strongly suggest that appropriate use of the splitting strategies and interpretation of results are necessary for the real-world prediction of adverse events. We provide analysis code and datasets used in the present study (https://github.com/mizuno-group/AE_prediction).
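    The difference between the two splitting strategies is easy to show in code. The sketch below is not taken from the linked repository. The record layout (a dict with a `date` field) and the function names are hypothetical.

```python
import random


def random_split(records, test_fraction=0.2, seed=0):
    """Shuffle records and hold out a random fraction for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) - int(len(shuffled) * test_fraction)
    return shuffled[:cut], shuffled[cut:]


def time_split(records, test_fraction=0.2, key=lambda r: r["date"]):
    """Train on the oldest records and test on the most recent ones."""
    ordered = sorted(records, key=key)
    cut = len(ordered) - int(len(ordered) * test_fraction)
    return ordered[:cut], ordered[cut:]


# Hypothetical compound records, one per year.
records = [{"id": i, "date": 2000 + i} for i in range(10)]
train, test = time_split(records)
print([r["date"] for r in test])
```

    With the time split, every test record is newer than every training record, which mirrors predicting adverse events for compounds that did not exist when the model was built. The random split gives no such guarantee.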
    On the Robustness of Quality Measures for GANs. (arXiv:2201.13019v2 [cs.LG] UPDATED)
    This work evaluates the robustness of quality measures of generative models such as Inception Score (IS) and Fréchet Inception Distance (FID). Analogous to the vulnerability of deep models against a variety of adversarial attacks, we show that such metrics can also be manipulated by additive pixel perturbations. Our experiments indicate that one can generate a distribution of images with very high scores but low perceptual quality. Conversely, one can optimize for small imperceptible perturbations that, when added to real world images, deteriorate their scores. We further extend our evaluation to generative models themselves, including the state of the art network StyleGANv2. We show the vulnerability of both the generative model and the FID against additive perturbations in the latent space. Finally, we show that the FID can be robustified by simply replacing the standard Inception with a robust Inception. We validate the effectiveness of the robustified metric through extensive experiments, showing it is more robust against manipulation.
    Speech Enhancement Guided by Contextual Articulatory Information. (arXiv:2011.07442v3 [cs.SD] UPDATED)
    Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the articulatory properties of the input speech when performing enhancement to attain performance improvements. Thus, the contextual information of articulatory attributes has additional information that can further benefit SE. This study proposed an SE system that improved performance by optimizing contextual articulatory information in enhanced speech through joint training of the SE model with an end-to-end automatic speech recognition (E2E-ASR) model and predicting the sequence of broad phone classes (BPCs) instead of the phoneme/word sequences. We developed two strategies to train the SE system based on BPC-based ASR: multi-task learning and deep-feature training strategies. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that the contextual articulatory information facilitates the SE system to improve enhancement results. Moreover, in contrast to another SE system trained with monophonic ASR, the BPC-based ASR (providing contextual articulatory information) can achieve superior SE performance at different signal-to-noise ratio (SNR) levels.
    Contextformer: A Transformer with Spatio-Channel Attention for Context Modeling in Learned Image Compression. (arXiv:2203.02452v2 [eess.IV] UPDATED)
    Entropy modeling is a key component for high-performance image compression algorithms. Recent developments in autoregressive context modeling helped learning-based methods to surpass their classical counterparts. However, the performance of those models can be further improved due to the underexploited spatio-channel dependencies in latent space, and the suboptimal implementation of context adaptivity. Inspired by the adaptive characteristics of the transformers, we propose a transformer-based context model, named Contextformer, which generalizes the de facto standard attention mechanism to spatio-channel attention. We replace the context model of a modern compression framework with the Contextformer and test it on the widely used Kodak, CLIC2020, and Tecnick image datasets. Our experimental results show that the proposed model provides up to 11% rate savings compared to the standard Versatile Video Coding (VVC) Test Model (VTM) 16.2, and outperforms various learning-based models in terms of PSNR and MS-SSIM.
    StolenEncoder: Stealing Pre-trained Encoders in Self-supervised Learning. (arXiv:2201.05889v2 [cs.CR] UPDATED)
    Pre-trained encoders are general-purpose feature extractors that can be used for many downstream tasks. Recent progress in self-supervised learning can pre-train highly effective encoders using a large volume of unlabeled data, leading to the emerging encoder as a service (EaaS). A pre-trained encoder may be deemed confidential because its training requires large amounts of data and computation resources and because its public release may facilitate misuse of AI, e.g., for deepfakes generation. In this paper, we propose the first attack called StolenEncoder to steal pre-trained image encoders. We evaluate StolenEncoder on multiple target encoders pre-trained by ourselves and three real-world target encoders including the ImageNet encoder pre-trained by Google, CLIP encoder pre-trained by OpenAI, and Clarifai's General Embedding encoder deployed as a paid EaaS. Our results show that our stolen encoders have similar functionality to the target encoders. In particular, the downstream classifiers built upon a target encoder and a stolen one have similar accuracy. Moreover, stealing a target encoder using StolenEncoder requires much less data and computation resources than pre-training it from scratch. We also explore three defenses that perturb feature vectors produced by a target encoder. Our results show these defenses are not enough to mitigate StolenEncoder.
    Reconfigurable Intelligent Surface Empowered Over-the-Air Federated Edge Learning. (arXiv:2109.02353v2 [cs.IT] UPDATED)
    Federated edge learning (FEEL) has emerged as a revolutionary paradigm to develop AI services at the edge of 6G wireless networks as it supports collaborative model training at a massive number of mobile devices. However, model communication over wireless channels, especially in uplink model uploading of FEEL, has been widely recognized as a bottleneck that critically limits the efficiency of FEEL. Although over-the-air computation can alleviate the excessive cost of radio resources in FEEL model uploading, practical implementations of over-the-air FEEL still suffer from several challenges, including strong straggler issues, large communication overheads, and potential privacy leakage. In this article, we study these challenges in over-the-air FEEL and leverage reconfigurable intelligent surface (RIS), a key enabler of future wireless systems, to address these challenges. We study the state-of-the-art solutions on RIS-empowered FEEL and explore the promising research opportunities for adopting RIS to enhance FEEL performance.
    Effects of Epileptiform Activity on Discharge Outcome in Critically Ill Patients. (arXiv:2203.04920v2 [stat.ME] UPDATED)
    Many fundamental problems affecting the care of critically ill patients lead to similar analytical challenges: physicians cannot easily estimate the effects of at-risk medical conditions or treatments because the causal effects of medical conditions and drugs are entangled. They also cannot easily perform studies: there are not enough high-quality data for high-dimensional observational causal inference, and RCTs often cannot ethically be conducted. However, mechanistic knowledge is available, including how drugs are absorbed into the body, and the combination of this knowledge with the limited data could potentially suffice -- if we knew how to combine them. In this work, we present a framework for interpretable estimation of causal effects for critically ill patients under exactly these complex conditions: interactions between drugs and observations over time, patient data sets that are not large, and mechanistic knowledge that can substitute for lack of data. We apply this framework to an extremely important problem affecting critically ill patients, namely the effect of seizures and other potentially harmful electrical events in the brain (called epileptiform activity -- EA) on outcomes. Given the high stakes involved and the high noise in the data, interpretability is critical for troubleshooting such complex problems. Interpretability of our matched groups allowed neurologists to perform chart reviews to verify the quality of our causal analysis. For instance, our work indicates that a patient who experiences a high level of seizure-like activity (75% high EA burden) and is untreated for a six-hour window, has, on average, a 16.7% increased chance of adverse outcomes such as severe brain damage, lifetime disability, or death. We find that patients with mild but long-lasting EA (average EA burden >= 50%) have their risk of an adverse outcome increased by 11.2%.
    Universal Regular Conditional Distributions. (arXiv:2105.07743v3 [cs.LG] UPDATED)
    We introduce a deep learning model which can generically approximate regular conditional distributions (RCDs). The proposed model operates in three phases: first, inputs from a given metric space $\mathcal{X}$ are linearized to $\mathbb{R}^d$ via a feature map; then, these linearized features are processed by a deep feedforward neural network; finally, the network's outputs are translated to the $1$-Wasserstein space $\mathcal{P}_1(\mathbb{R}^D)$ via a probabilistic extension of the attention mechanism introduced by Bahdanau et al. (2014). We find that the models built using our framework can approximate any continuous function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^D)$ uniformly on compact sets, quantitatively. We identify two ways of avoiding the curse of dimensionality when approximating $\mathcal{P}_1(\mathbb{R}^D)$-valued functions. The first strategy describes functions in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$ which can be efficiently approximated on any compact subset of $\mathbb{R}^d$. The second approach describes compact subsets of $\mathbb{R}^d$, on which any most in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$ can be efficiently approximated. The results are verified experimentally.
    A Discontinuity Capturing Shallow Neural Network for Elliptic Interface Problems. (arXiv:2106.05587v2 [math.NA] UPDATED)
    In this paper, a new Discontinuity Capturing Shallow Neural Network (DCSNN) for approximating $d$-dimensional piecewise continuous functions and for solving elliptic interface problems is developed. There are three novel features in the present network; namely, (i) jump discontinuities are accurately captured, (ii) it is completely shallow, comprising only one hidden layer, (iii) it is completely mesh-free for solving partial differential equations. The crucial idea here is that a $d$-dimensional piecewise continuous function can be extended to a continuous function defined in $(d+1)$-dimensional space, where the augmented coordinate variable labels the pieces of each sub-domain. We then construct a shallow neural network to express this new function. Since only one hidden layer is employed, the number of training parameters (weights and biases) scales linearly with the dimension and the neurons used in the hidden layer. For solving elliptic interface problems, the network is trained by minimizing the mean square error loss that consists of the residual of the governing equation, boundary condition, and the interface jump conditions. We perform a series of numerical tests to demonstrate the accuracy of the present network. Our DCSNN model is efficient due to only a moderate number of parameters needed to be trained (a few hundred parameters used throughout all numerical examples), and the results indicate good accuracy. Compared with the results obtained by the traditional grid-based immersed interface method (IIM), which is designed particularly for elliptic interface problems, our network model shows a better accuracy than IIM. We conclude by solving a six-dimensional problem to demonstrate the capability of the present network for high-dimensional applications.
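    The augmentation idea can be illustrated with a one-dimensional toy example. This is only a sketch of the extension trick, not the DCSNN network itself: the piecewise function, the labels -1 and +1, and the linear blend used in `augmented` are all choices made here for illustration.

```python
import math


def piecewise(x):
    """A 1-D function with a jump discontinuity at x = 0."""
    return math.sin(x) if x < 0.0 else math.cos(x) + 1.0


def label(x):
    """Augmented coordinate: -1 on the left piece, +1 on the right piece."""
    return -1.0 if x < 0.0 else 1.0


def augmented(x, z):
    """A continuous function of (x, z) that matches each piece on its own label."""
    return 0.5 * (1.0 - z) * math.sin(x) + 0.5 * (1.0 + z) * (math.cos(x) + 1.0)


points = [-1.0, -1e-6, 0.0, 1e-6, 1.0]
original = [piecewise(x) for x in points]
recovered = [augmented(x, label(x)) for x in points]
```

    Evaluating `augmented(x, label(x))` reproduces the discontinuous function exactly, while `augmented` itself is smooth in both arguments, which is the kind of target a shallow network can fit without special treatment of the jump.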
    Align-Deform-Subtract: An Interventional Framework for Explaining Object Differences. (arXiv:2203.04694v2 [cs.CV] UPDATED)
    Given two object images, how can we explain their differences in terms of the underlying object properties? To address this question, we propose Align-Deform-Subtract (ADS) -- an interventional framework for explaining object differences. By leveraging semantic alignments in image-space as counterfactual interventions on the underlying object properties, ADS iteratively quantifies and removes differences in object properties. The result is a set of "disentangled" error measures which explain object differences in terms of the underlying properties. Experiments on real and synthetic data illustrate the efficacy of the framework.
    A density peaks clustering algorithm with sparse search and K-d tree. (arXiv:2203.00973v2 [stat.ML] UPDATED)
    Density peaks clustering (DPC) has become a rising star among clustering algorithms because of its simplicity and practicality. However, there is one main drawback: it is time-consuming due to its high computational complexity. Herein, a density peaks clustering algorithm with sparse search and K-d tree is developed to solve this problem. Firstly, a sparse distance matrix is calculated by using K-d tree to replace the original full rank distance matrix, so as to accelerate the calculation of local density. Secondly, a sparse search strategy is proposed to accelerate the computation of relative-separation with the intersection between the set of $k$ nearest neighbors and the set consisting of the data points with larger local density for any data point. Furthermore, a second-order difference method for decision values is adopted to determine the cluster centers adaptively. Finally, experiments are carried out on datasets with different distribution characteristics, in comparison with six other state-of-the-art clustering algorithms. It is proved that the algorithm can effectively reduce the computational complexity of the original DPC from $O(n^2K)$ to $O(n(n^{1-1/K}+k))$. Especially for larger datasets, the efficiency is elevated more remarkably. Moreover, the clustering accuracy is also improved to a certain extent. Therefore, it can be concluded that the overall performance of the newly proposed algorithm is excellent.
    Error-in-variables modelling for operator learning. (arXiv:2204.10909v2 [cs.LG] UPDATED)
    Deep operator learning has emerged as a promising tool for reduced-order modelling and PDE model discovery. Leveraging the expressive power of deep neural networks, especially in high dimensions, such methods learn the mapping between functional state variables. While proposed methods have assumed noise only in the dependent variables, experimental and numerical data for operator learning typically exhibit noise in the independent variables as well, since both variables represent signals that are subject to measurement error. In regression on scalar data, failure to account for noisy independent variables can lead to biased parameter estimates. With noisy independent variables, linear models fitted via ordinary least squares (OLS) will show attenuation bias, wherein the slope will be underestimated. In this work, we derive an analogue of attenuation bias for linear operator regression with white noise in both the independent and dependent variables. In the nonlinear setting, we computationally demonstrate underprediction of the action of the Burgers operator in the presence of noise in the independent variable. We propose error-in-variables (EiV) models for two operator regression methods, MOR-Physics and DeepONet, and demonstrate that these new models reduce bias in the presence of noisy independent variables for a variety of operator learning problems. Considering the Burgers operator in 1D and 2D, we demonstrate that EiV operator learning robustly recovers operators in high-noise regimes that defeat OLS operator learning. We also introduce an EiV model for time-evolving PDE discovery and show that OLS and EiV perform similarly in learning the Kuramoto-Sivashinsky evolution operator from corrupted data, suggesting that the effect of bias in OLS operator learning depends on the regularity of the target operator.
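    The scalar attenuation bias mentioned above can be reproduced with a small simulation. This is a minimal sketch using only the Python standard library; the sample size, true slope, and noise levels are arbitrary values chosen here, and it illustrates the classical scalar case rather than the operator-regression setting of the paper.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least squares slope of y regressed on x (with intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


rng = random.Random(0)
n = 20000
true_slope = 2.0

# Clean independent variable with unit variance, and a lightly noised response.
x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
y = [true_slope * x + rng.gauss(0.0, 0.1) for x in x_true]

# Observed independent variable corrupted by unit-variance measurement noise.
x_noisy = [x + rng.gauss(0.0, 1.0) for x in x_true]

slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_noisy, y)

# Classical result: the slope shrinks by var(x) / (var(x) + var(noise)).
expected_noisy = true_slope * 1.0 / (1.0 + 1.0)
```

    With equal signal and noise variance in the independent variable, the fitted slope is roughly halved, which is the underestimation that error-in-variables models are designed to correct.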
    TREND: Truncated Generalized Normal Density Estimation of Inception Embeddings for GAN Evaluation. (arXiv:2104.14767v2 [cs.CV] UPDATED)
    Evaluating image generation models such as generative adversarial networks (GANs) is a challenging problem. A common approach is to compare the distributions of the set of ground truth images and the set of generated test images. The Fréchet Inception distance is one of the most widely used metrics for evaluation of GANs, which assumes that the features from a trained Inception model for a set of images follow a normal distribution. In this paper, we argue that this is an over-simplified assumption, which may lead to unreliable evaluation results, and more accurate density estimation can be achieved using a truncated generalized normal distribution. Based on this, we propose a novel metric for accurate evaluation of GANs, named TREND (TRuncated gEneralized Normal Density estimation of inception embeddings). We demonstrate that our approach significantly reduces errors of density estimation, which consequently eliminates the risk of faulty evaluation results. Furthermore, we show that the proposed metric significantly improves robustness of evaluation results against variation of the number of image samples.
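    For reference, under the normality assumption the Fréchet Inception distance has a standard closed form in terms of the feature means and covariances of the real and generated image sets. The subscripts $r$ and $g$ are notation chosen here, not taken from the abstract.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

    Because only the first two moments enter this expression, any mismatch between the true feature distribution and a Gaussian is invisible to the metric, which is the limitation the abstract argues against.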
    StARformer: Transformer with State-Action-Reward Representations for Visual Reinforcement Learning. (arXiv:2110.06206v2 [cs.LG] UPDATED)
    Reinforcement Learning (RL) can be considered as a sequence modeling task: given a sequence of past state-action-reward experiences, an agent predicts a sequence of next actions. In this work, we propose State-Action-Reward Transformer (StARformer) for visual RL, which explicitly models short-term state-action-reward representations (StAR-representations), essentially introducing a Markovian-like inductive bias to improve long-term modeling. Our approach first extracts StAR-representations by self-attending image state patches, action, and reward tokens within a short temporal window. These are then combined with pure image state representations -- extracted as convolutional features, to perform self-attention over the whole sequence. Our experiments show that StARformer outperforms the state-of-the-art Transformer-based method on image-based Atari and DeepMind Control Suite benchmarks, in both offline-RL and imitation learning settings. StARformer is also more compliant with longer sequences of inputs. Our code is available at https://github.com/elicassion/StARformer.
    DDPG based on multi-scale strokes for financial time series trading strategy. (arXiv:2207.10071v1 [q-fin.TR])
    With the development of artificial intelligence, more and more financial practitioners apply deep reinforcement learning to financial trading strategies. However, it is difficult to extract accurate features because single-scale time series are noisy, highly non-stationary, and non-linear, which makes it hard to obtain high returns. In this paper, we extract a multi-scale feature matrix on multiple time scales of financial time series, according to the classic financial theory Chan Theory, and put forward a multi-scale stroke deep deterministic policy gradient reinforcement learning model (MSSDDPG) to search for the optimal trading strategy. We carried out experiments on the Dow Jones and S&P 500 datasets of U.S. stocks and on China's CSI 300 and SSE Composite, and evaluated the performance of our approach against the turtle trading strategy, the Deep Q-learning (DQN) reinforcement learning strategy, and the deep deterministic policy gradient (DDPG) reinforcement learning strategy. The results show that our approach achieves the best performance on China's CSI 300 and SSE Composite, and an outstanding result on the Dow Jones and S&P 500 of U.S.
    Learning Algebraic Representation for Systematic Generalization in Abstract Reasoning. (arXiv:2111.12990v2 [cs.AI] UPDATED)
    Is intelligence realized by connectionist or classicist? While connectionist approaches have achieved superhuman performance, there has been growing evidence that such task-specific superiority is particularly fragile in systematic generalization. This observation lies in the central debate between connectionist and classicist, wherein the latter continually advocates an algebraic treatment in cognitive architectures. In this work, we follow the classicist's call and propose a hybrid approach to improve systematic generalization in reasoning. Specifically, we showcase a prototype with algebraic representation for the abstract spatial-temporal reasoning task of Raven's Progressive Matrices (RPM) and present the ALgebra-Aware Neuro-Semi-Symbolic (ALANS) learner. The ALANS learner is motivated by abstract algebra and the representation theory. It consists of a neural visual perception frontend and an algebraic abstract reasoning backend: the frontend summarizes the visual information from object-based representation, while the backend transforms it into an algebraic structure and induces the hidden operator on the fly. The induced operator is later executed to predict the answer's representation, and the choice most similar to the prediction is selected as the solution. Extensive experiments show that by incorporating an algebraic treatment, the ALANS learner outperforms various pure connectionist models in domains requiring systematic generalization. We further show the generative nature of the learned algebraic representation; it can be decoded by isomorphism to generate an answer.
    Learning Convolutional Neural Networks in the Frequency Domain. (arXiv:2204.06718v10 [cs.CV] UPDATED)
    Convolutional neural networks (CNNs) have achieved impressive success in computer vision during the past few decades. The image convolution operation helps CNNs to get good performance on image-related tasks. However, the image convolution has high computational complexity and is hard to implement. This paper proposes the CEMNet, which can be trained in the frequency domain. The most important motivation of this research is that we can use the straightforward element-wise multiplication operation to replace the image convolution in the frequency domain based on the Cross-Correlation Theorem, which obviously reduces the computational complexity. We further introduce a Weight Fixation mechanism to alleviate the problem of over-fitting, and analyze the working behavior of Batch Normalization, Leaky ReLU, and Dropout in the frequency domain to design their counterparts for CEMNet. Also, to deal with complex inputs brought by Discrete Fourier Transform, we design a two-branch network structure for CEMNet. Experimental results imply that CEMNet achieves good performance on MNIST and CIFAR-10 databases.
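    The Cross-Correlation Theorem that motivates this approach can be checked numerically on a short one-dimensional signal. The sketch below uses a naive DFT from the Python standard library and circular boundary conditions; the signal values and helper names are assumptions made here, and it does not reproduce CEMNet, which operates on two-dimensional images.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform, O(n^2)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform, O(n^2)."""
    n = len(spectrum)
    return [
        sum(spectrum[f] * cmath.exp(2j * cmath.pi * f * t / n) for f in range(n)) / n
        for t in range(n)
    ]


def circular_cross_correlation(x, w):
    """Direct definition: c[k] = sum_t x[(t + k) mod n] * w[t]."""
    n = len(x)
    return [sum(x[(t + k) % n] * w[t] for t in range(n)) for k in range(n)]


x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5]   # input signal
w = [0.5, -1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0]    # kernel, zero-padded to length 8

direct = circular_cross_correlation(x, w)

# Frequency-domain route: element-wise product of conj(DFT(w)) and DFT(x).
X, W = dft(x), dft(w)
product = [W[f].conjugate() * X[f] for f in range(len(x))]
via_frequency = [value.real for value in idft(product)]

max_error = max(abs(a - b) for a, b in zip(direct, via_frequency))
```

    The two routes agree to floating-point precision. With a fast Fourier transform in place of the naive DFT, the element-wise product is the cheaper path for large kernels, which is the complexity argument made in the abstract.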
    An Adaptive Human Driver Model for Realistic Race Car Simulations. (arXiv:2203.01909v2 [cs.LG] UPDATED)
    Engineering a high-performance race car requires a direct consideration of the human driver using real-world tests or Human-Driver-in-the-Loop simulations. Apart from that, offline simulations with human-like race driver models could make this vehicle development process more effective and efficient but are hard to obtain due to various challenges. With this work, we intend to provide a better understanding of race driver behavior and introduce an adaptive human race driver model based on imitation learning. Using existing findings and an interview with a professional race engineer, we identify fundamental adaptation mechanisms and how drivers learn to optimize lap time on a new track. Subsequently, we use these insights to develop generalization and adaptation techniques for a recently presented probabilistic driver modeling approach and evaluate it using data from professional race drivers and a state-of-the-art race car simulator. We show that our framework can create realistic driving line distributions on unseen race tracks with almost human-like performance. Moreover, our driver model optimizes its driving lap by lap, correcting driving errors from previous laps while achieving faster lap times. This work contributes to a better understanding and modeling of the human driver, aiming to expedite simulation methods in the modern vehicle development process and potentially supporting automated driving and racing technologies.
    Discriminator-Weighted Offline Imitation Learning from Suboptimal Demonstrations. (arXiv:2207.10050v1 [cs.LG])
    We study the problem of offline Imitation Learning (IL) where an agent aims to learn an optimal expert behavior policy without additional online environment interactions. Instead, the agent is provided with a supplementary offline dataset from suboptimal behaviors. Prior works that address this problem either require that expert data occupies the majority proportion of the offline dataset, or need to learn a reward function and perform offline reinforcement learning (RL) afterwards. In this paper, we aim to address the problem without additional steps of reward learning and offline RL training for the case when demonstrations contain a large proportion of suboptimal data. Built upon behavioral cloning (BC), we introduce an additional discriminator to distinguish expert and non-expert data. We propose a cooperation framework to boost the learning of both tasks. Based on this framework, we design a new IL algorithm, where the outputs of the discriminator serve as the weights of the BC loss. Experimental results show that our proposed algorithm achieves higher returns and faster training speed compared to baseline algorithms.
    Learning Pedestrian Group Representations for Multi-modal Trajectory Prediction. (arXiv:2207.09953v1 [cs.CV])
    Modeling the dynamics of people walking is a problem of long-standing interest in computer vision. Many previous works involving pedestrian trajectory prediction define a particular set of individual actions to implicitly model group actions. In this paper, we present a novel architecture named GP-Graph which has collective group representations for effective pedestrian trajectory prediction in crowded environments, and is compatible with all types of existing approaches. A key idea of GP-Graph is to model both individual-wise and group-wise relations as graph representations. To do this, GP-Graph first learns to assign each pedestrian into the most likely behavior group. Using this assignment information, GP-Graph then forms both intra- and inter-group interactions as graphs, accounting for human-human relations within a group and group-group relations, respectively. To be specific, for the intra-group interaction, we mask pedestrian graph edges out of an associated group. We also propose group pooling and unpooling operations to represent a group with multiple pedestrians as one graph node. Lastly, GP-Graph infers a probability map for socially-acceptable future trajectories from the integrated features of both group interactions. Moreover, we introduce a group-level latent vector sampling to ensure collective inferences over a set of possible future trajectories. Extensive experiments are conducted to validate the effectiveness of our architecture, which demonstrates consistent performance improvements with publicly available benchmarks. Code is publicly available at https://github.com/inhwanbae/GPGraph.
    Extending Environments To Measure Self-Reflection In Reinforcement Learning. (arXiv:2110.06890v3 [cs.AI] UPDATED)
    We consider an extended notion of reinforcement learning in which the environment can simulate the agent and base its outputs on the agent's hypothetical behavior. Since good performance usually requires paying attention to whatever things the environment's outputs are based on, we argue that for an agent to achieve on-average good performance across many such extended environments, it is necessary for the agent to self-reflect. Thus weighted-average performance over the space of all suitably well-behaved extended environments could be considered a way of measuring how self-reflective an agent is. We give examples of extended environments and introduce a simple transformation which experimentally seems to increase some standard RL agents' performance in a certain type of extended environment.
    Backdoor Attacks on the DNN Interpretation System. (arXiv:2011.10698v3 [cs.CR] UPDATED)
    Interpretability is crucial to understand the inner workings of deep neural networks (DNNs) and many interpretation methods generate saliency maps that highlight parts of the input image that contribute the most to the prediction made by the DNN. In this paper we design a backdoor attack that alters the saliency map produced by the network for an input image only when the image carries an injected trigger that is invisible to the naked eye, while maintaining the prediction accuracy. The attack relies on injecting poisoned data with a trigger into the training data set. The saliency maps are incorporated in the penalty term of the objective function that is used to train a deep model and its influence on model training is conditioned upon the presence of a trigger. We design two types of attacks: a targeted attack that enforces a specific modification of the saliency map and an untargeted attack in which the importance scores of the top pixels from the original saliency map are significantly reduced. We perform empirical evaluation of the proposed backdoor attacks on gradient-based and gradient-free interpretation methods for a variety of deep learning architectures. We show that our attacks constitute a serious security threat when deploying deep learning models developed by untrusted sources. Finally, in the Supplement we demonstrate that the proposed methodology can be used in an inverted setting, where the correct saliency map can be obtained only in the presence of a trigger (key), effectively making the interpretation system available only to selected users.
    Online Evasion Attacks on Recurrent Models: The Power of Hallucinating the Future. (arXiv:2207.09912v1 [cs.CR])
    Recurrent models are frequently being used in online tasks such as autonomous driving, and a comprehensive study of their vulnerability is called for. Existing research is limited in generality, either addressing only application-specific vulnerability or making implausible assumptions such as the knowledge of future input. In this paper, we present a general attack framework for online tasks incorporating the unique constraints of the online setting different from offline tasks. Our framework is versatile in that it covers time-varying adversarial objectives and various optimization constraints, allowing for a comprehensive study of robustness. Using the framework, we also present a novel white-box attack called Predictive Attack that "hallucinates" the future. The attack achieves 98 percent of the performance of the ideal but infeasible clairvoyant attack on average. We validate the effectiveness of the proposed framework and attacks through various experiments.
    Pretraining a Neural Network before Knowing Its Architecture. (arXiv:2207.10049v1 [cs.CV])
    Training large neural networks is possible by training a smaller hypernetwork that predicts parameters for the large ones. A recently released Graph HyperNetwork (GHN) trained this way on one million smaller ImageNet architectures is able to predict parameters for large unseen networks such as ResNet-50. While networks with predicted parameters lose performance on the source task, the predicted parameters have been found useful for fine-tuning on other tasks. We study if fine-tuning based on the same GHN is still useful on novel strong architectures that were published after the GHN had been trained. We found that for recent architectures such as ConvNeXt, GHN initialization becomes less useful than for ResNet-50. One potential reason is the increased distribution shift of novel architectures from those used to train the GHN. We also found that the predicted parameters lack the diversity necessary to successfully fine-tune parameters with gradient descent. We alleviate this limitation by applying simple post-processing techniques to predicted parameters before fine-tuning them on a target task and improve fine-tuning of ResNet-50 and ConvNeXt.
    MANI-Rank: Multiple Attribute and Intersectional Group Fairness for Consensus Ranking. (arXiv:2207.10020v1 [cs.CY])
    Combining the preferences of many rankers into one single consensus ranking is critical for consequential applications from hiring and admissions to lending. While group fairness has been extensively studied for classification, group fairness in rankings and in particular rank aggregation remains in its infancy. Recent work introduced the concept of fair rank aggregation for combining rankings but restricted to the case when candidates have a single binary protected attribute, i.e., they fall into two groups only. Yet it remains an open problem how to create a consensus ranking that represents the preferences of all rankers while ensuring fair treatment for candidates with multiple protected attributes such as gender, race, and nationality. In this work, we are the first to define and solve this open Multi-attribute Fair Consensus Ranking (MFCR) problem. As a foundation, we design novel group fairness criteria for rankings, called MANI-RANK, ensuring fair treatment of groups defined by individual protected attributes and their intersection. Leveraging the MANI-RANK criteria, we develop a series of algorithms that for the first time tackle the MFCR problem. Our experimental study with a rich variety of consensus scenarios demonstrates our MFCR methodology is the only approach to achieve both intersectional and protected attribute fairness while also representing the preferences expressed through many base rankings. Our real-world case study on merit scholarships illustrates the effectiveness of our MFCR methods to mitigate bias across multiple protected attributes and their intersections. This is an extended version of "MANI-Rank: Multiple Attribute and Intersectional Group Fairness for Consensus Ranking", to appear in ICDE 2022.
    Generative and discriminative training of Boltzmann machine through Quantum annealing. (arXiv:2002.00792v3 [quant-ph] UPDATED)
    A hybrid quantum-classical method for learning Boltzmann machines (BM) for generative and discriminative tasks is presented. Boltzmann machines are undirected graphs with a network of visible and hidden nodes, where the former are used as the reading sites while the latter are used to manipulate the probability of visible states. In a generative BM, the samples of visible data imitate the probability distribution of a given data set. In contrast, the visible sites of a discriminative BM are treated as Input/Output (I/O) reading sites where the conditional probability of the output state is optimized for a given set of input states. The cost function for learning a BM is defined as a weighted sum of the Kullback-Leibler (KL) divergence and the Negative conditional Log-Likelihood (NCLL), adjusted using a hyperparameter. Here, the KL divergence is the cost for generative learning, and the NCLL is the cost for discriminative learning. A Stochastic Newton-Raphson optimization scheme is presented. The gradients and the Hessians are approximated using direct samples of the BM obtained through Quantum annealing (QA). Quantum annealers are hardware representing the physics of the Ising model that operate at a low but finite temperature. This temperature affects the probability distribution of the BM; however, its value is unknown. Previous efforts have focused on estimating this unknown temperature through regression of theoretical Boltzmann energies of sampled states with the probability of states sampled by the actual hardware. This assumes that a change in the control parameters does not affect the system temperature; however, this is not usually the case. Instead, an approach that works on the probability distribution of samples, rather than the energies, is proposed to estimate the optimal parameter set. This ensures that the optimal set can be obtained from a single run.
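    The weighted cost described in this abstract can be illustrated with a minimal sketch. The abstract only says the cost is a weighted sum of KL divergence and NCLL controlled by a hyperparameter; the convex-combination form and the names `alpha`, `p_data`, `p_model` and `cond_probs` below are assumptions made for illustration, not the authors' formulation.

```python
import math

def kl_divergence(p_data, p_model):
    """KL(p_data || p_model) over a shared, ordered list of visible states."""
    return sum(p * math.log(p / q) for p, q in zip(p_data, p_model) if p > 0)

def ncll(cond_probs):
    """Negative conditional log-likelihood: -sum_i log p(output_i | input_i)."""
    return -sum(math.log(p) for p in cond_probs)

def combined_cost(p_data, p_model, cond_probs, alpha):
    # alpha is a hypothetical mixing weight in [0, 1]:
    # alpha = 1 gives purely generative training, alpha = 0 purely discriminative.
    return alpha * kl_divergence(p_data, p_model) + (1 - alpha) * ncll(cond_probs)
```

    In this sketch the model probabilities would come from annealer samples; here they are simply passed in as lists.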
    Exploration of Parameter Spaces Assisted by Machine Learning. (arXiv:2207.09959v1 [hep-ph])
    We showcase a variety of functions and classes that implement sampling procedures with improved exploration of the parameter space assisted by machine learning. Special attention is paid to setting sane defaults with the objective that adjustments required by different problems remain minimal. This collection of routines can be employed for different types of analysis, from finding bounds on the parameter space to accumulating samples in areas of interest. In particular, we discuss two methods assisted by incorporating different machine learning models: regression and classification. We show that a machine learning classifier can provide higher efficiency for exploring the parameter space. Also, we introduce a boosting technique to improve the slow convergence at the start of the process. The use of these routines is better explained with the help of a few examples that illustrate the type of results one can obtain. We also include examples of the code used to obtain the examples as well as descriptions of the adjustments that can be made to adapt the calculation to other problems. We finalize by showing the impact of these techniques when exploring the parameter space of the two Higgs doublet model that matches the measured Higgs Boson signal strength. The code used for this paper and instructions on how to use it are available on the web.
    BYEL : Bootstrap on Your Emotion Latent. (arXiv:2207.10003v1 [cs.LG])
    To address the cost of constructing datasets for deep learning, and given the progress of generative models, a growing body of research trains on synthetic data and performs inference on real data. We propose emotion-aware self-supervised learning using ABAW's Learning Synthetic Data (LSD) dataset. We pre-train our method on the LSD dataset in a self-supervised manner and then use the same LSD dataset for supervised downstream training on the emotion classification task. As a result, we obtain a higher score (0.63) than the baseline (0.5).
    Flood Inflow Forecast Using L2-norm Ensemble Weighting Sea Surface Feature. (arXiv:2112.03108v2 [stat.ML] UPDATED)
    It is important to forecast dam inflow for flood damage mitigation. The hydrograph provides critical information such as the start time, peak level, and volume. In particular, dam management requires a 6-h lead time for the dam inflow forecast based on a future hydrograph. The authors propose novel target inflow weights to create an ocean feature vector extracted from analyzed images of the sea surface. We extracted a 4,096-element feature vector from the fc6 layer of the pre-trained VGG16 network. Subsequently, we reduced it to three dimensions using t-SNE. Furthermore, we created the principal component of the sea temperature weights using PCA. Through numerical experiments, we found that these weights contribute to the stability of predictor importance. As base regression models, we calibrate least squares with kernel expansion, a quantile random forest minimizing the out-of-bag error, and support vector regression with a polynomial kernel. When we compute the predictor importance, we visualize the stability of each variable importance introduced by our proposed weights, compared with results without weights. We apply our method to a dam in the Kanto region of Japan and focus on the training period from 2007 to 2018, restricted to the flood season from June to October. We test the accuracy over the 2019 flood season. Finally, we present the applied results and further statistical learning for unknown flood forecasts.
    AI Fairness: from Principles to Practice. (arXiv:2207.09833v1 [cs.CY])
    This paper summarizes and evaluates various approaches, methods, and techniques for pursuing fairness in artificial intelligence (AI) systems. It examines the merits and shortcomings of these measures and proposes practical guidelines for defining, measuring, and preventing bias in AI. In particular, it cautions against some of the simplistic, yet common, methods for evaluating bias in AI systems, and offers more sophisticated and effective alternatives. The paper also addresses widespread controversies and confusions in the field by providing a common language among different stakeholders of high-impact AI systems. It describes various trade-offs involving AI fairness, and provides practical recommendations for balancing them. It offers techniques for evaluating the costs and benefits of fairness targets, and defines the role of human judgment in setting these targets. This paper provides discussions and guidelines for AI practitioners, organization leaders, and policymakers, as well as various links to additional materials for a more technical audience. Numerous real-world examples are provided to clarify the concepts, challenges, and recommendations from a practical perspective.
    NeuralNEB -- Neural Networks can find Reaction Paths Fast. (arXiv:2207.09971v1 [physics.comp-ph])
    Machine Learning (ML) models have, in contrast to their usefulness in molecular dynamics studies, had limited success as surrogate potentials for reaction barrier search. This is due to the scarcity of training data in relevant transition state regions of chemical space. Currently available datasets for training ML models on small molecular systems almost exclusively contain configurations at or near equilibrium. In this work, we present the dataset Transition1x containing 9.6 million Density Functional Theory (DFT) calculations of forces and energies of molecular configurations on and around reaction pathways at the wB97x/6-31G(d) level of theory. The data was generated by running Nudged Elastic Band (NEB) calculations with DFT on 10k reactions while saving intermediate calculations. We train state-of-the-art equivariant graph message-passing neural network models on Transition1x and cross-validate on the popular ANI1x and QM9 datasets. We show that ML models cannot learn features in transition-state regions solely by training on hitherto popular benchmark datasets. Transition1x is a new challenging benchmark that will provide an important step towards developing next-generation ML force fields that also work far away from equilibrium configurations and reactive systems.
    Discover and Mitigate Unknown Biases with Debiasing Alternate Networks. (arXiv:2207.10077v1 [cs.CV])
    Deep image classifiers have been found to learn biases from datasets. To mitigate the biases, most previous methods require labels of protected attributes (e.g., age, skin tone) as full supervision, which has two limitations: 1) it is infeasible when the labels are unavailable; 2) such methods are incapable of mitigating unknown biases -- biases that humans do not preconceive. To resolve those problems, we propose Debiasing Alternate Networks (DebiAN), which comprises two networks -- a Discoverer and a Classifier. By training in an alternate manner, the discoverer tries to find multiple unknown biases of the classifier without any annotations of biases, and the classifier aims at unlearning the biases identified by the discoverer. While previous works evaluate debiasing results in terms of a single bias, we create the Multi-Color MNIST dataset to better benchmark mitigation of multiple biases in a multi-bias setting, which not only reveals the problems in previous methods but also demonstrates the advantage of DebiAN in identifying and mitigating multiple biases simultaneously. We further conduct extensive experiments on real-world datasets, showing that the discoverer in DebiAN can identify unknown biases that may be hard for humans to find. Regarding debiasing, DebiAN achieves strong bias mitigation performance.
    Self-supervised learning methods and applications in medical imaging analysis: A survey. (arXiv:2109.08685v3 [eess.IV] UPDATED)
    The scarcity of high-quality annotated medical imaging datasets is a major problem that hinders machine learning applications in the field of medical imaging analysis and impedes its advancement. Self-supervised learning is a recent training paradigm that enables learning robust representations without the need for human annotation, which can be considered an effective solution for the scarcity of annotated medical data. This article reviews the state-of-the-art research directions in self-supervised learning approaches for image data with a concentration on their applications in the field of medical imaging analysis. The article covers a set of the most recent self-supervised learning methods from the computer vision field as they are applicable to medical imaging analysis and categorizes them as predictive, generative, and contrastive approaches. Moreover, the article covers 40 of the most recent research papers in the field of self-supervised learning in medical imaging analysis, aiming to shed light on recent innovation in the field. Finally, the article concludes with possible future research directions in the field.
    Quantifying the Effect of Feedback Frequency in Interactive Reinforcement Learning for Robotic Tasks. (arXiv:2207.09845v1 [cs.RO])
    Reinforcement learning (RL) has become widely adopted in robot control. Despite many successes, one major persisting problem can be very low data efficiency. One solution is interactive feedback, which has been shown to speed up RL considerably. As a result, there is an abundance of different strategies, which are, however, primarily tested on discrete grid-world and small scale optimal control scenarios. In the literature, there is no consensus about which feedback frequency is optimal or at which time the feedback is most beneficial. To resolve these discrepancies we isolate and quantify the effect of feedback frequency in robotic tasks with continuous state and action spaces. The experiments encompass inverse kinematics learning for robotic manipulator arms of different complexity. We show that seemingly contradictory reported phenomena occur at different complexity levels. Furthermore, our results suggest that no single ideal feedback frequency exists. Rather, feedback frequency should be changed as the agent's proficiency in the task increases.
    Differentiable Agent-based Epidemiology. (arXiv:2207.09714v1 [cs.LG])
    Mechanistic simulators are an indispensable tool for epidemiology to explore the behavior of complex, dynamic infections under varying conditions and navigate uncertain environments. ODE-based models are the dominant paradigm: they enable fast simulations and are tractable to gradient-based optimization, but make simplifying assumptions about population homogeneity. Agent-based models (ABMs) are an increasingly popular alternative paradigm that can represent the heterogeneity of contact interactions with granular detail and the agency of individual behavior. However, conventional ABM frameworks are not differentiable and present challenges in scalability, which makes it non-trivial to connect them to auxiliary data sources. In this paper we introduce GradABM, a new scalable, fast and differentiable design for ABMs. GradABM runs simulations in a few seconds on commodity hardware and enables fast forward and differentiable inverse simulations. This makes it amenable to being merged with deep neural networks and to seamlessly integrating heterogeneous data sources to help with calibration, forecasting and policy evaluation. We demonstrate the efficacy of GradABM via extensive experiments with real COVID-19 and influenza datasets. We are optimistic this work will bring ABM and AI communities closer together.
    Introducing Auxiliary Text Query-modifier to Content-based Audio Retrieval. (arXiv:2207.09732v1 [eess.AS])
    The amount of audio data available on public websites is growing rapidly, and an efficient mechanism for accessing the desired data is necessary. We propose a content-based audio retrieval method that can retrieve a target audio that is similar to but slightly different from the query audio by introducing auxiliary textual information which describes the difference between the query and target audio. While the range of conventional content-based audio retrieval is limited to audio that is similar to the query audio, the proposed method can adjust the retrieval range by adding an embedding of the auxiliary text query-modifier to the embedding of the query sample audio in a shared latent space. To evaluate our method, we built a dataset comprising two different audio clips and the text that describes the difference. The experimental results show that the proposed method retrieves the paired audio more accurately than the baseline. We also confirmed based on visualization that the proposed method obtains the shared latent space in which the audio difference and the corresponding text are represented as similar embedding vectors.
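    The retrieval step described in this abstract, adding the text query-modifier embedding to the query audio embedding in a shared latent space, might look roughly like the sketch below. The embeddings are plain lists here, and the use of cosine similarity and the names `retrieve` and `candidates` are assumptions for illustration; the abstract does not specify the similarity measure or any encoder details.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def retrieve(query_audio_emb, modifier_text_emb, candidates):
    """Return the id of the candidate closest to (audio + text-modifier) embedding.

    candidates: dict mapping a clip id to its embedding in the shared space.
    """
    shifted = [a + t for a, t in zip(query_audio_emb, modifier_text_emb)]
    return max(candidates, key=lambda cid: cosine(shifted, candidates[cid]))
```

    With a zero modifier the query falls back to ordinary content-based retrieval, which matches the abstract's framing of the modifier as an adjustment to the retrieval range.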
    Distributionally Robust Batch Contextual Bandits. (arXiv:2006.05630v5 [cs.LG] UPDATED)
    Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm in synthetic datasets and demonstrate that it provides the robustness that is missing using standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.
    Digital Twin-based Intrusion Detection for Industrial Control Systems. (arXiv:2207.09999v1 [cs.CR])
    Digital twins have recently gained significant interest in simulation, optimization, and predictive maintenance of Industrial Control Systems (ICS). Recent studies discuss the possibility of using digital twins for intrusion detection in industrial systems. Accordingly, this study contributes to a digital twin-based security framework for industrial control systems, extending its capabilities for simulation of attacks and defense mechanisms. Four types of process-aware attack scenarios are implemented on a standalone open-source digital twin of an industrial filling plant: command injection, network Denial of Service (DoS), calculated measurement modification, and naive measurement modification. A stacked ensemble classifier is proposed as the real-time intrusion detection, based on the offline evaluation of eight supervised machine learning algorithms. The designed stacked model outperforms previous methods in terms of F1-Score and accuracy, by combining the predictions of various algorithms, while it can detect and classify intrusions in near real-time (0.1 seconds). This study also discusses the practicality and benefits of the proposed digital twin-based security framework.
    A Novel Neural Network Training Method for Autonomous Driving Using Semi-Pseudo-Labels and 3D Data Augmentations. (arXiv:2207.09869v1 [cs.CV])
    Training neural networks to perform 3D object detection for autonomous driving requires a large amount of diverse annotated data. However, obtaining training data with sufficient quality and quantity is expensive and sometimes impossible due to human and sensor constraints. Therefore, a novel solution is needed for extending current training methods to overcome this limitation and enable accurate 3D object detection. Our solution for the above-mentioned problem combines semi-pseudo-labeling and novel 3D augmentations. For demonstrating the applicability of the proposed method, we have designed a convolutional neural network for 3D object detection which can significantly increase the detection range in comparison with the training data distribution.
    Deep Reinforcement Learning for Market Making Under a Hawkes Process-Based Limit Order Book Model. (arXiv:2207.09951v1 [q-fin.GN])
    The stochastic control problem of optimal market making is among the central problems in quantitative finance. In this paper, a deep reinforcement learning-based controller is trained on a weakly consistent, multivariate Hawkes process-based limit order book simulator to obtain market making controls. The proposed approach leverages the advantages of Monte Carlo backtesting and contributes to the line of research on market making under weakly consistent limit order book models. The ensuing deep reinforcement learning controller is compared to multiple market making benchmarks, with the results indicating its superior performance with respect to various risk-reward metrics, even under significant transaction costs.
    Predictive Object-Centric Process Monitoring. (arXiv:2207.10017v1 [cs.AI])
    The automation and digitalization of business processes has resulted in large amounts of data captured in information systems, which can aid businesses in understanding their processes better, improve workflows, or provide operational support. By making predictions about ongoing processes, bottlenecks can be identified and resources reallocated, as well as insights gained into the state of a process instance (case). Traditionally, data is extracted from systems in the form of an event log with a single identifying case notion, such as an order id for an Order to Cash (O2C) process. However, real processes often have multiple object types, for example, order, item, and package, so a format that forces the use of a single case notion does not reflect the underlying relations in the data. The Object-Centric Event Log (OCEL) format was introduced to correctly capture this information. The state-of-the-art predictive methods have been tailored to only traditional event logs. This thesis shows that a prediction method utilizing Generative Adversarial Networks (GAN), Long Short-Term Memory (LSTM) architectures, and Sequence to Sequence models (Seq2seq), can be augmented with the rich data contained in OCEL. Objects in OCEL can have attributes that are useful in predicting the next event and timestamp, such as a priority class attribute for an object type package indicating slower or faster processing. In the metrics of sequence similarity of predicted remaining events and mean absolute error (MAE) of the timestamp, the approach in this thesis matches or exceeds previous research, depending on whether selected object attributes are useful features for the model. Additionally, this thesis provides a web interface to predict the next sequence of activities from user input.
    Automated machine learning for borehole resistivity measurements. (arXiv:2207.09849v1 [cs.LG])
    Deep neural networks (DNNs) offer a real-time solution for the inversion of borehole resistivity measurements to approximate forward and inverse operators. It is possible to use extremely large DNNs to approximate the operators, but it demands a considerable training time. Moreover, evaluating the network after training also requires a significant amount of memory and processing power. In addition, we may overfit the model. In this work, we propose a scoring function that accounts for the accuracy and size of the DNNs compared to a reference DNN that provides a good approximation for the operators. Using this scoring function, we use DNN architecture search algorithms to obtain a quasi-optimal DNN smaller than the reference network; hence, it requires less computational effort during training and evaluation. The quasi-optimal DNN delivers comparable accuracy to the original large DNN.
    Learning Counterfactually Invariant Predictors. (arXiv:2207.09768v1 [cs.LG])
    We propose a method to learn predictors that are invariant under counterfactual changes of certain covariates. This method is useful when the prediction target is causally influenced by covariates that should not affect the predictor output. For instance, an object recognition model may be influenced by position, orientation, or scale of the object itself. We address the problem of training predictors that are explicitly counterfactually invariant to changes of such covariates. We propose a model-agnostic regularization term based on conditional kernel mean embeddings, to enforce counterfactual invariance during training. We prove the soundness of our method, which can handle mixed categorical and continuous multi-variate attributes. Empirical results on synthetic and real-world data demonstrate the efficacy of our method in a variety of settings.
    Stream-based active learning with linear models. (arXiv:2207.09874v1 [stat.ML])
    The proliferation of automated data collection schemes and the advances in sensorics are increasing the amount of data we are able to monitor in real-time. However, given the high annotation costs and the time required by quality inspections, data is often available in an unlabeled form. This is fostering the use of active learning for the development of soft sensors and predictive models. In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data. Several query strategy frameworks for regression have been proposed in the literature but most of the focus has been dedicated to the static pool-based scenario. In this work, we propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner, which must instantaneously decide whether to perform the quality check to obtain the label or discard the instance. The approach is inspired by the optimal experimental design theory and the iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points. The proposed approach is evaluated using numerical simulations and the Tennessee Eastman Process simulator. The results confirm that selecting the examples suggested by the proposed algorithm allows for a faster reduction in the prediction error.
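    The thresholding idea in this abstract can be sketched for a linear model as follows. This is not the authors' criterion: it assumes, for illustration only, that informativeness is the leverage-style score x^T A^{-1} x with A = X^T X + ridge * I, a quantity familiar from optimal experimental design, and the names `stream_select`, `threshold` and `ridge` are hypothetical.

```python
def quad_form(A_inv, x):
    """Return (x^T A_inv x, A_inv x) for a symmetric matrix A_inv."""
    Ax = [sum(a * b for a, b in zip(row, x)) for row in A_inv]
    return sum(a * b for a, b in zip(x, Ax)), Ax

def stream_select(stream, dim, threshold, ridge=1.0):
    """Decide, one instance at a time, whether to request its label.

    Returns the indices of the instances that were queried.
    """
    # Inverse of the regularised information matrix, starting from ridge * I.
    A_inv = [[(1.0 / ridge if i == j else 0.0) for j in range(dim)] for i in range(dim)]
    queried = []
    for idx, x in enumerate(stream):
        score, Ax = quad_form(A_inv, x)
        if score > threshold:
            queried.append(idx)
            # Sherman-Morrison rank-one update of A_inv after adding x x^T to A.
            denom = 1.0 + score
            A_inv = [[A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(dim)]
                     for i in range(dim)]
    return queried
```

    Instances that repeat an already explored direction score low and are discarded, while instances in new directions exceed the threshold and are sent for labelling.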
    Intrinsic dimension estimation for discrete metrics. (arXiv:2207.09688v1 [stat.ML])
    Real-world datasets characterized by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to DNA sequences. Nevertheless, the most common unsupervised dimensional reduction methods are designed for continuous spaces, and their use for discrete spaces can lead to errors and biases. In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces. We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting, finding a surprisingly small ID, of order 2. This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of the sequence space.
    The Poisson binomial mechanism for secure and private federated learning. (arXiv:2207.09916v1 [cs.CR])
    We introduce the Poisson Binomial mechanism (PBM), a discrete differential privacy mechanism for distributed mean estimation (DME) with applications to federated learning and analytics. We provide a tight analysis of its privacy guarantees, showing that it achieves the same privacy-accuracy trade-offs as the continuous Gaussian mechanism. Our analysis is based on a novel bound on the Rényi divergence of two Poisson binomial distributions that may be of independent interest. Unlike previous discrete DP schemes based on additive noise, our mechanism encodes local information into a parameter of the binomial distribution, and hence the output distribution is discrete with bounded support. Moreover, the support does not increase as the privacy budget $\varepsilon \rightarrow 0$ as in the case of additive schemes which require the addition of more noise to achieve higher privacy; on the contrary, the support becomes smaller as $\varepsilon \rightarrow 0$. The bounded support enables us to combine our mechanism with secure aggregation (SecAgg), a multi-party cryptographic protocol, without the need of performing modular clipping which results in an unbiased estimator of the sum of the local vectors. This in turn allows us to apply it in the private FL setting and provide an upper bound on the convergence rate of the SGD algorithm. Moreover, since the support of the output distribution becomes smaller as $\varepsilon \rightarrow 0$, the communication cost of our scheme decreases with the privacy constraint $\varepsilon$, outperforming all previous distributed DP schemes based on additive noise in the high privacy or low communication regimes.
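    The idea of encoding local information into a binomial parameter can be sketched for scalar inputs as below. The specific parametrization p = 1/2 + theta * x / c, and the names `theta`, `c` and `m`, are assumptions chosen for illustration and are not taken from the abstract; privacy accounting and secure aggregation are left out entirely.

```python
import random

def pbm_encode(x, c, theta, m, rng):
    """Client side: map x in [-c, c] to one sample from Binomial(m, p)."""
    p = 0.5 + theta * x / c  # theta <= 1/2 keeps p inside [0, 1]
    return sum(1 for _ in range(m) if rng.random() < p)

def pbm_decode(total, n_clients, c, theta, m):
    """Server side: unbiased estimate of the sum of the clients' values.

    E[total] = n_clients * m / 2 + (m * theta / c) * sum(x_i), so the
    offset is subtracted and the result rescaled.
    """
    return c / (m * theta) * (total - n_clients * m / 2)
```

    Each client's message is an integer in {0, ..., m}, so the aggregate lies in a bounded range, which is the property the abstract relies on when combining the mechanism with SecAgg.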
    Operation-Level Performance Benchmarking of Graph Neural Networks for Scientific Applications. (arXiv:2207.09955v1 [cs.LG])
    As Graph Neural Networks (GNNs) increase in popularity for scientific machine learning, their training and inference efficiency is becoming increasingly critical. Additionally, the deep learning field as a whole is trending towards wider and deeper networks, and ever increasing data sizes, to the point where hard hardware bottlenecks are often encountered. Emerging specialty hardware platforms provide an exciting solution to this problem. In this paper, we systematically profile and select low-level operations pertinent to GNNs for scientific computing implemented in the PyTorch Geometric software framework. These are then rigorously benchmarked on NVIDIA A100 GPUs for various combinations of input values, including tensor sparsity. We then analyze these results for each operation. At a high level, we conclude that on NVIDIA systems: (1) confounding bottlenecks such as memory inefficiency often dominate runtime costs more so than data sparsity alone, (2) native PyTorch operations are often as competitive as, or more competitive than, their PyTorch Geometric equivalents, especially at low to moderate levels of input data sparsity, and (3) many operations central to state-of-the-art GNN architectures have little to no optimization for sparsity. We hope that these results serve as a baseline for those developing these operations on specialized hardware and that our subsequent analysis helps to facilitate future software and hardware based optimizations of these operations and thus scalable GNN performance as a whole.
    Probable Domain Generalization via Quantile Risk Minimization. (arXiv:2207.09944v1 [stat.ML])
    Domain generalization (DG) seeks predictors which perform well on unseen test distributions by leveraging labeled training data from multiple related distributions or domains. To achieve this, the standard formulation optimizes for worst-case performance over the set of all possible domains. However, with worst-case shifts very unlikely in practice, this generally leads to overly-conservative solutions. In fact, a recent study found that no DG algorithm outperformed empirical risk minimization in terms of average performance. In this work, we argue that DG is neither a worst-case problem nor an average-case problem, but rather a probabilistic one. To this end, we propose a probabilistic framework for DG, which we call Probable Domain Generalization, wherein our key idea is that distribution shifts seen during training should inform us of probable shifts at test time. To realize this, we explicitly relate training and test domains as draws from the same underlying meta-distribution, and propose a new optimization problem -- Quantile Risk Minimization (QRM) -- which requires that predictors generalize with high probability. We then prove that QRM: (i) produces predictors that generalize to new domains with a desired probability, given sufficiently many domains and samples; and (ii) recovers the causal predictor as the desired probability of generalization approaches one. In our experiments, we introduce a more holistic quantile-focused evaluation protocol for DG, and show that our algorithms outperform state-of-the-art baselines on real and synthetic data.
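    The objective named in this abstract, a quantile of the risk over domains rather than its mean or maximum, can be illustrated with a plain empirical quantile. This is a simplified sketch under stated assumptions: the function name `quantile_risk` and the choice of an unsmoothed empirical quantile are illustrative, and the authors' estimator may differ.

```python
import math

def quantile_risk(domain_risks, alpha):
    """Empirical alpha-quantile of per-domain risks.

    Returns the smallest observed risk r such that at least a fraction
    alpha of the training domains have risk <= r.
    """
    ordered = sorted(domain_risks)
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[k - 1]
```

    Setting alpha = 1 recovers the worst-case risk over the observed domains, while smaller values of alpha ignore the most extreme domains, which reflects the abstract's position between average-case and worst-case formulations.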
    Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores. (arXiv:2207.09957v1 [cs.CV])
    Machine learning models are typically deployed in a test setting that differs from the training setting, potentially leading to decreased model performance because of domain shift. If we could estimate the performance that a pre-trained model would achieve on data from a specific deployment setting, for example a certain clinic, we could judge whether the model could safely be deployed or if its performance degrades unacceptably on the specific data. Existing approaches estimate this based on the confidence of predictions made on unlabeled test data from the deployment's domain. We find existing methods struggle with data that present class imbalance, because the methods used to calibrate confidence do not account for bias induced by class imbalance, consequently failing to estimate class-wise accuracy. Here, we introduce class-wise calibration within the framework of performance estimation for imbalanced datasets. Specifically, we derive class-specific modifications of state-of-the-art confidence-based model evaluation methods including temperature scaling (TS), difference of confidences (DoC), and average thresholded confidence (ATC). We also extend the methods to estimate Dice similarity coefficient (DSC) in image segmentation. We conduct experiments on four tasks and find the proposed modifications consistently improve the estimation accuracy for imbalanced datasets. Our methods improve accuracy estimation by 18% in classification under natural domain shifts, and double the estimation accuracy on segmentation tasks, when compared with prior methods.
    REFACTOR GNNS: Revisiting Factorisation-based Models from a Message-Passing Perspective. (arXiv:2207.09980v1 [cs.LG])
    Factorisation-based Models (FMs), such as DistMult, have enjoyed enduring success for Knowledge Graph Completion (KGC) tasks, often outperforming Graph Neural Networks (GNNs). However, unlike GNNs, FMs struggle to incorporate node features and to generalise to unseen nodes in inductive settings. Our work bridges the gap between FMs and GNNs by proposing REFACTOR GNNS. This new architecture draws upon both modelling paradigms, which previously were largely thought of as disjoint. Concretely, using a message-passing formalism, we show how FMs can be cast as GNNs by reformulating the gradient descent procedure as message-passing operations, which forms the basis of our REFACTOR GNNS. Across a multitude of well-established KGC benchmarks, our REFACTOR GNNS achieve comparable transductive performance to FMs, and state-of-the-art inductive performance while using an order of magnitude fewer parameters.
    ApHMM: Accelerating Profile Hidden Markov Models for Fast and Energy-Efficient Genome Analysis. (arXiv:2207.09765v1 [cs.AR])
    Profile hidden Markov models (pHMMs) are widely used in many bioinformatics applications to accurately identify similarities between biological sequences (e.g., DNA or protein sequences). pHMMs use a commonly adopted and highly accurate method, called the Baum-Welch algorithm, to calculate these similarities. However, the Baum-Welch algorithm is computationally expensive, and existing works provide either software- or hardware-only solutions for a fixed pHMM design. When we analyze the state-of-the-art works, we find that there is a pressing need for a flexible, high-performance, and energy-efficient hardware-software co-design to efficiently and effectively solve all the major inefficiencies in the Baum-Welch algorithm for pHMMs. We propose ApHMM, the first flexible acceleration framework that can significantly reduce computational and energy overheads of the Baum-Welch algorithm for pHMMs. ApHMM leverages hardware-software co-design to solve the major inefficiencies in the Baum-Welch algorithm by 1) designing flexible hardware to support different pHMM designs, 2) exploiting the predictable data dependency pattern in an on-chip memory with memoization techniques, 3) quickly eliminating negligible computations with a hardware-based filter, and 4) minimizing the redundant computations. We implement our 1) hardware-software optimizations on specialized hardware and 2) software optimizations for GPUs to provide the first flexible Baum-Welch accelerator for pHMMs. ApHMM provides significant speedups of 15.55x-260.03x, 1.83x-5.34x, and 27.97x compared to CPU, GPU, and FPGA implementations of the Baum-Welch algorithm, respectively. ApHMM outperforms the state-of-the-art CPU implementations of three important bioinformatics applications, 1) error correction, 2) protein family search, and 3) multiple sequence alignment, by 1.29x-59.94x, 1.03x-1.75x, and 1.03x-1.95x, respectively.
    DESCN: Deep Entire Space Cross Networks for Individual Treatment Effect Estimation. (arXiv:2207.09920v1 [cs.LG])
    Causal inference has wide applications in various areas such as e-commerce and precision medicine, and its performance heavily relies on the accurate estimation of the Individual Treatment Effect (ITE). Conventionally, ITE is predicted by modeling the treated and control response functions separately in their individual sample spaces. However, such an approach usually encounters two issues in practice, i.e., divergent distributions between treated and control groups due to treatment bias, and significant imbalance in their population sizes. This paper proposes Deep Entire Space Cross Networks (DESCN) to model treatment effects from an end-to-end perspective. DESCN captures the integrated information of the treatment propensity, the response, and the hidden treatment effect through a cross network in a multi-task learning manner. Our method jointly learns the treatment and response functions in the entire sample space to avoid treatment bias and employs an intermediate pseudo treatment effect prediction network to relieve sample imbalance. Extensive experiments are conducted on a synthetic dataset and a large-scale production dataset from the e-commerce voucher distribution business. The results indicate that DESCN can successfully enhance the accuracy of ITE estimation and improve the uplift ranking performance. A sample of the production dataset and the source code are released to facilitate future research in the community; to the best of our knowledge, this is the first large-scale public biased treatment dataset for causal inference.
    VisageSynTalk: Unseen Speaker Video-to-Speech Synthesis via Speech-Visage Feature Selection. (arXiv:2206.07458v2 [cs.CV] UPDATED)
    The goal of this work is to reconstruct speech from a silent talking face video. Recent studies have shown impressive performance on synthesizing speech from silent talking face videos. However, they have not explicitly considered the varying identity characteristics of different speakers, which pose a challenge in video-to-speech synthesis, and this becomes more critical in unseen-speaker settings. Our approach is to separate the speech content and the visage-style from a given silent talking face video. By guiding the model to independently focus on modeling the two representations, we can obtain highly intelligible speech from the model even when the input video of an unseen subject is given. To this end, we introduce speech-visage selection that separates the speech content and the speaker identity from the visual features of the input video. The disentangled representations are jointly incorporated to synthesize speech through a visage-style based synthesizer which generates speech by coating the visage-styles while maintaining the speech content. Thus, the proposed framework brings the advantage of synthesizing speech containing the right content even with the silent talking face video of an unseen subject. We validate the effectiveness of the proposed framework on the GRID, TCD-TIMIT volunteer, and LRW datasets.
    GIPSO: Geometrically Informed Propagation for Online Adaptation in 3D LiDAR Segmentation. (arXiv:2207.09763v1 [cs.CV])
    3D point cloud semantic segmentation is fundamental for autonomous driving. Most approaches in the literature neglect an important aspect, i.e., how to deal with domain shift when handling dynamic scenes. This can significantly hinder the navigation capabilities of self-driving vehicles. This paper advances the state of the art in this research field. Our first contribution consists in analysing a new unexplored scenario in point cloud segmentation, namely Source-Free Online Unsupervised Domain Adaptation (SF-OUDA). We experimentally show that state-of-the-art methods have a rather limited ability to adapt pre-trained deep network models to unseen domains in an online manner. Our second contribution is an approach that relies on adaptive self-training and geometric-feature propagation to adapt a pre-trained source model online without requiring either source data or target labels. Our third contribution is to study SF-OUDA in a challenging setup where source data is synthetic and target data is point clouds captured in the real world. We use the recent SynLiDAR dataset as a synthetic source and introduce two new synthetic (source) datasets, which can stimulate future synthetic-to-real autonomous driving research. Our experiments show the effectiveness of our segmentation approach on thousands of real-world point clouds. Code and synthetic datasets are available at https://github.com/saltoricristiano/gipso-sfouda.
    Learning Object-Centered Autotelic Behaviors with Graph Neural Networks. (arXiv:2204.05141v2 [cs.AI] UPDATED)
    Although humans live in an open-ended world and endlessly face new challenges, they do not have to learn from scratch each time they face the next one. Rather, they have access to a handful of previously learned skills, which they rapidly adapt to new situations. In artificial intelligence, autotelic agents, which are intrinsically motivated to represent and set their own goals, exhibit promising skill adaptation capabilities. However, these capabilities are highly constrained by their policy and goal space representations. In this paper, we propose to investigate the impact of these representations on the learning and transfer capabilities of autotelic agents. We study different implementations of autotelic agents using four types of Graph Neural Networks policy representations and two types of goal spaces, either geometric or predicate-based. By testing agents on unseen goals, we show that combining object-centered architectures that are expressive enough with semantic relational goals helps learning to reach more difficult goals. We also release our graph-based implementations to encourage further research in this direction.
    ExoSGAN and ExoACGAN: Exoplanet Detection using Adversarial Training Algorithms. (arXiv:2207.09665v1 [astro-ph.EP])
    Exoplanet detection opens the door to the discovery of new habitable worlds and helps us understand how planets were formed. With the objective of finding Earth-like habitable planets, NASA launched the Kepler space telescope and its follow-up mission K2. The advancement of observation capabilities has increased the range of fresh data available for research, and manually handling them is both time-consuming and difficult. Machine learning and deep learning techniques can greatly assist in lowering the human effort needed to process the vast array of data produced by the modern instruments of these exoplanet programs in an economical and unbiased manner. However, care should be taken to detect all the exoplanets precisely while simultaneously minimizing the misclassification of non-exoplanet stars. In this paper, we utilize two variations of generative adversarial networks, namely semi-supervised generative adversarial networks and auxiliary classifier generative adversarial networks, to detect transiting exoplanets in K2 data. We find that the usage of these models can be helpful for the classification of stars with exoplanets. Both of our techniques are able to categorize the light curves with a recall and precision of 1.00 on the test data. Our semi-supervised technique helps with the cumbersome task of creating a labeled dataset.
    Quantized Training of Gradient Boosting Decision Trees. (arXiv:2207.09682v1 [cs.LG])
    Recent years have witnessed significant success in Gradient Boosting Decision Trees (GBDT) for a wide range of machine learning applications. Generally, the consensus about GBDT's training algorithms is that gradients and statistics are computed based on high-precision floating points. In this paper, we investigate an essentially important question which has been largely ignored by the previous literature: how many bits are needed for representing gradients in training GBDT? To solve this mystery, we propose to quantize all the high-precision gradients in a very simple yet effective way in the GBDT's training algorithm. Surprisingly, both our theoretical analysis and empirical studies show that the necessary precision of gradients without hurting any performance can be quite low, e.g., 2 or 3 bits. With low-precision gradients, most arithmetic operations in GBDT training can be replaced by integer operations of 8, 16, or 32 bits. Promisingly, these findings may pave the way for much more efficient training of GBDT from several aspects: (1) speeding up the computation of gradient statistics in histograms; (2) compressing the communication cost of high-precision statistical information during distributed training; (3) inspiring the utilization and development of hardware architectures which well support low-precision computation for GBDT training. Benchmarked on CPU, GPU, and distributed clusters, we observe up to 2x speedup of our simple quantization strategy compared with SOTA GBDT systems on extensive datasets, demonstrating the effectiveness and potential of the low-precision training of GBDT. The code will be released to the official repository of LightGBM.
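    To make the idea of low-bit gradients concrete, the following sketch maps floating-point gradients to small signed integers so that histogram sums become integer additions. The abstract does not specify the quantization scheme; stochastic rounding with a single shared scale is an assumption here, and `quantize_gradients` is a hypothetical helper, not part of LightGBM.

```python
import math
import random


def quantize_gradients(grads, num_bits, rng):
    """Quantize float gradients to signed integers in [-levels, levels],
    where levels = 2**(num_bits - 1) - 1, using stochastic rounding
    (an assumed scheme, chosen because it is unbiased in expectation)."""
    if num_bits < 2:
        raise ValueError("num_bits must be at least 2 to hold a sign and a magnitude")
    levels = 2 ** (num_bits - 1) - 1
    max_abs = max((abs(g) for g in grads), default=0.0)
    if max_abs == 0.0:
        return [0] * len(grads), 1.0
    scale = max_abs / levels  # one shared float scale for the whole batch
    quantized = []
    for g in grads:
        x = g / scale
        low = math.floor(x)
        # Round up with probability equal to the fractional part.
        q = low + (1 if rng.random() < x - low else 0)
        # Clamp to guard against floating-point overshoot at the extremes.
        quantized.append(max(-levels, min(levels, q)))
    return quantized, scale


rng = random.Random(0)
grads = [0.8, -0.3, 0.05, -1.2]  # hypothetical per-sample gradients
q, scale = quantize_gradients(grads, num_bits=3, rng=rng)

# A histogram bin can now accumulate integers and rescale once at the end.
histogram_sum = sum(q)
approx_gradient_sum = histogram_sum * scale
```

    Each quantized value differs from the original gradient by less than one quantization step (`scale`), and only the final bin totals need to be converted back to floating point.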
    Unsupervised energy disaggregation via convolutional sparse coding. (arXiv:2207.09785v1 [math.OC])
    In this work, a method for unsupervised energy disaggregation in private households equipped with smart meters is proposed. This method aims to classify power consumption as active or passive, granting the ability to report on the residents' activity and presence without direct interaction. This lays the foundation for applications like non-intrusive health monitoring of private homes. The proposed method is based on minimizing a suitable energy functional, for which the iPALM (inertial proximal alternating linearized minimization) algorithm is employed, demonstrating that various conditions guaranteeing convergence are satisfied. In order to confirm feasibility of the proposed method, experiments on semi-synthetic test data sets and a comparison to existing, supervised methods are provided.
    CoSMix: Compositional Semantic Mix for Domain Adaptation in 3D LiDAR Segmentation. (arXiv:2207.09778v1 [cs.CV])
    3D LiDAR semantic segmentation is fundamental for autonomous driving. Several Unsupervised Domain Adaptation (UDA) methods for point cloud data have been recently proposed to improve model generalization for different sensors and environments. Researchers working on UDA problems in the image domain have shown that sample mixing can mitigate domain shift. We propose a new approach of sample mixing for point cloud UDA, namely Compositional Semantic Mix (CoSMix), the first UDA approach for point cloud segmentation based on sample mixing. CoSMix consists of a two-branch symmetric network that can process labelled synthetic data (source) and real-world unlabelled point clouds (target) concurrently. Each branch operates on one domain by mixing selected pieces of data from the other one, and by using the semantic information derived from source labels and target pseudo-labels. We evaluate CoSMix on two large-scale datasets, showing that it outperforms state-of-the-art methods by a large margin. Our code is available at https://github.com/saltoricristiano/cosmix-uda.
    Diversified Adversarial Attacks based on Conjugate Gradient Method. (arXiv:2206.09628v2 [cs.LG] UPDATED)
    Deep learning models are vulnerable to adversarial examples, and adversarial attacks used to generate such examples have attracted considerable research interest. Although existing methods based on the steepest descent have achieved high attack success rates, ill-conditioned problems occasionally reduce their performance. To address this limitation, we utilize the conjugate gradient (CG) method, which is effective for this type of problem, and propose a novel attack algorithm inspired by the CG method, named the Auto Conjugate Gradient (ACG) attack. The results of large-scale evaluation experiments conducted on the latest robust models show that, for most models, ACG was able to find more adversarial examples with fewer iterations than the existing SOTA algorithm Auto-PGD (APGD). We investigated the difference in search performance between ACG and APGD in terms of diversification and intensification, and define a measure called Diversity Index (DI) to quantify the degree of diversity. From the analysis of the diversity using this index, we show that the more diverse search of the proposed method remarkably improves its attack success rate.
    Learning to Solve Soft-Constrained Vehicle Routing Problems with Lagrangian Relaxation. (arXiv:2207.09860v1 [cs.AI])
    Vehicle Routing Problems (VRPs) in real-world applications often come with various constraints and therefore bring additional computational challenges to exact solution methods or heuristic search approaches. The recent idea of learning heuristic move patterns from sample data has become increasingly promising for reducing solution development costs. However, using learning-based approaches to address more types of constrained VRP remains a challenge. The difficulty lies in controlling for constraint violations while searching for optimal solutions. To overcome this challenge, we propose a Reinforcement Learning based method to solve soft-constrained VRPs by incorporating the Lagrangian relaxation technique and using constrained policy optimization. We apply the method to three common types of VRPs, the Travelling Salesman Problem with Time Windows (TSPTW), the Capacitated VRP (CVRP) and the Capacitated VRP with Time Windows (CVRPTW), to show the generalizability of the proposed method. After comparing to existing RL-based methods and open-source heuristic solvers, we demonstrate its competitive performance in finding solutions with a good balance in travel distance, constraint violations and inference speed.
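    The Lagrangian relaxation mentioned above can be sketched in a few lines: constraint violations are folded into the objective with non-negative multipliers, and the multipliers grow while violations persist. This is a generic illustration with a standard projected subgradient update, not the paper's reinforcement learning procedure; the function names, the step size, and the numbers are hypothetical.

```python
def lagrangian_cost(distance, violations, multipliers):
    """Relaxed objective: travel distance plus multiplier-weighted violations."""
    return distance + sum(lam * v for lam, v in zip(multipliers, violations))


def update_multipliers(multipliers, violations, step_size):
    """Projected subgradient step: raise a multiplier when its constraint
    is violated, and never let it drop below zero."""
    return [max(0.0, lam + step_size * v) for lam, v in zip(multipliers, violations)]


# Hypothetical route: 100 distance units, 2 units of time-window lateness,
# no capacity overload.
distance = 100.0
violations = [2.0, 0.0]   # [time-window violation, capacity violation]
multipliers = [0.5, 0.5]

cost = lagrangian_cost(distance, violations, multipliers)
new_multipliers = update_multipliers(multipliers, violations, step_size=0.5)
```

    After the update, the time-window multiplier increases from 0.5 to 1.5 while the capacity multiplier stays unchanged, so the next search round penalizes lateness more heavily.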
    Journal Impact Factor and Peer Review Thoroughness and Helpfulness: A Supervised Machine Learning Study. (arXiv:2207.09821v1 [cs.DL])
    The journal impact factor (JIF) is often equated with journal quality and the quality of the peer review of the papers submitted to the journal. We examined the association between the content of peer review and JIF by analysing 10,000 peer review reports submitted to 1,644 medical and life sciences journals. Two researchers hand-coded a random sample of 2,000 sentences. We then trained machine learning models to classify all 187,240 sentences as contributing or not contributing to content categories. We examined the association between ten groups of journals defined by JIF deciles and the content of peer reviews using linear mixed-effects models, adjusting for the length of the review. The JIF ranged from 0.21 to 74.70. The length of peer reviews increased from the lowest (median number of words 185) to the highest JIF group (387 words). The proportion of sentences allocated to different content categories varied widely, even within JIF groups. For thoroughness, sentences on 'Materials and Methods' were more common in the highest JIF journals than in the lowest JIF group (difference of 7.8 percentage points; 95% CI 4.9 to 10.7%). The trend for 'Presentation and Reporting' went in the opposite direction, with the highest JIF journals giving less emphasis to such content (difference -8.9%; 95% CI -11.3 to -6.5%). For helpfulness, reviews for higher JIF journals devoted less attention to 'Suggestion and Solution' and provided fewer Examples than lower impact factor journals. No or only small differences were evident for other content categories. In conclusion, peer review in journals with higher JIF tends to be more thorough in discussing the methods used but less helpful in terms of suggesting solutions and providing examples. Differences were modest and variability high, indicating that the JIF is a bad predictor for the quality of peer review of an individual manuscript.
    Deep Random Vortex Method for Simulation and Inference of Navier-Stokes Equations. (arXiv:2206.09571v2 [physics.flu-dyn] UPDATED)
    Navier-Stokes equations are significant partial differential equations that describe the motion of fluids such as liquids and air. Due to the importance of Navier-Stokes equations, the development of efficient numerical schemes is important for both science and engineering. Recently, with the development of AI techniques, several approaches have been designed to integrate deep neural networks in simulating and inferring the fluid dynamics governed by incompressible Navier-Stokes equations, which can accelerate the simulation or inference process in a mesh-free and differentiable way. In this paper, we point out that existing deep Navier-Stokes informed methods are limited in handling non-smooth or fractional equations, which are two critical situations in reality. To this end, we propose the Deep Random Vortex Method (DRVM), which combines the neural network with a random vortex dynamics system equivalent to the Navier-Stokes equation. Specifically, the random vortex dynamics motivates a Monte Carlo based loss function for training the neural network, which avoids the calculation of derivatives through auto-differentiation. Therefore, DRVM not only can efficiently solve Navier-Stokes equations involving rough paths, non-differentiable initial conditions and fractional operators, but also inherits the mesh-free and differentiable benefits of the deep-learning-based solver. We conduct experiments on the Cauchy problem, parametric solver learning, and the inverse problem of both 2-d and 3-d incompressible Navier-Stokes equations. The proposed method achieves accurate results for simulation and inference of Navier-Stokes equations. Especially for the cases that include singular initial conditions, DRVM significantly outperforms the existing PINN method.
    Interpreting Latent Spaces of Generative Models for Medical Images using Unsupervised Methods. (arXiv:2207.09740v1 [eess.IV])
    Generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) play an increasingly important role in medical image analysis. The latent spaces of these models often show semantically meaningful directions corresponding to human-interpretable image transformations. However, until now, their exploration for medical images has been limited due to the requirement of supervised data. Several methods for unsupervised discovery of interpretable directions in GAN latent spaces have shown interesting results on natural images. This work explores the potential of applying these techniques on medical images by training a GAN and a VAE on thoracic CT scans and using an unsupervised method to discover interpretable directions in the resulting latent space. We find several directions corresponding to non-trivial image transformations, such as rotation or breast size. Furthermore, the directions show that the generative models capture 3D structure despite being presented only with 2D data. The results show that unsupervised methods to discover interpretable directions in GANs generalize to VAEs and can be applied to medical images. This opens a wide array of future work using these methods in medical image analysis.
    AdaBest: Minimizing Client Drift in Federated Learning via Adaptive Bias Estimation. (arXiv:2204.13170v3 [cs.LG] UPDATED)
    In Federated Learning (FL), a number of clients or devices collaborate to train a model without sharing their data. Models are optimized locally at each client and further communicated to a central hub for aggregation. While FL is an appealing decentralized training paradigm, heterogeneity among data from different clients can cause the local optimization to drift away from the global objective. In order to estimate and therefore remove this drift, variance reduction techniques have been incorporated into FL optimization recently. However, these approaches inaccurately estimate the clients' drift and ultimately fail to remove it properly. In this work, we propose an adaptive algorithm that accurately estimates drift across clients. In comparison to previous works, our approach necessitates less storage and communication bandwidth, as well as lower compute costs. Additionally, our proposed methodology induces stability by constraining the norm of estimates for client drift, making it more practical for large scale FL. Experimental findings demonstrate that the proposed algorithm converges significantly faster and achieves higher accuracy than the baselines across various FL benchmarks.
    FedDM: Iterative Distribution Matching for Communication-Efficient Federated Learning. (arXiv:2207.09653v1 [cs.LG])
    Federated learning (FL) has recently attracted increasing attention from academia and industry, with the ultimate goal of achieving collaborative training under privacy and communication constraints. Existing iterative model averaging based FL algorithms require a large number of communication rounds to obtain a well-performing model due to extremely unbalanced and non-i.i.d. data partitioning among different clients. Thus, we propose FedDM to build the global training objective from multiple local surrogate functions, which enables the server to gain a more global view of the loss landscape. In detail, we construct synthetic sets of data on each client to locally match the loss landscape from original data through distribution matching. FedDM reduces communication rounds and improves model quality by transmitting more informative and smaller synthesized data compared with unwieldy model weights. We conduct extensive experiments on three image classification datasets, and results show that our method can outperform other FL counterparts in terms of efficiency and model performance. Moreover, we demonstrate that FedDM can be adapted to preserve differential privacy with the Gaussian mechanism and train a better model under the same privacy budget.
    Improving Data Driven Inverse Text Normalization using Data Augmentation. (arXiv:2207.09674v1 [cs.CL])
    Inverse text normalization (ITN) is used to convert the spoken form output of an automatic speech recognition (ASR) system to a written form. Traditional handcrafted ITN rules can be complex to transcribe and maintain. Meanwhile, neural modeling approaches require high-quality, large-scale spoken-written pair examples in the same or similar domain as the ASR system (in-domain data) to train. Both these approaches require costly and complex annotations. In this paper, we present a data augmentation technique that effectively generates rich spoken-written numeric pairs from out-of-domain textual data with minimal human annotation. We empirically demonstrate that ITN models trained using our data augmentation technique consistently outperform ITN models trained using only in-domain data across all numeric surfaces like cardinal, currency, and fraction, by an overall accuracy of 14.44%.
    Large Scale Radio Frequency Signal Classification. (arXiv:2207.09918v1 [cs.LG])
    Existing datasets used to train deep learning models for narrowband radio frequency (RF) signal classification lack enough diversity in signal types and channel impairments to sufficiently assess model performance in the real world. We introduce the Sig53 dataset consisting of 5 million synthetically-generated samples from 53 different signal classes and expertly chosen impairments. We also introduce TorchSig, a signals processing machine learning toolkit that can be used to generate this dataset. TorchSig incorporates data handling principles that are common to the vision domain, and it is meant to serve as an open-source foundation for future signals machine learning research. Initial experiments using the Sig53 dataset are conducted using state of the art (SoTA) convolutional neural networks (ConvNets) and Transformers. These experiments reveal Transformers outperform ConvNets without the need for additional regularization or a ConvNet teacher, which is contrary to results from the vision domain. Additional experiments demonstrate that TorchSig's domain-specific data augmentations facilitate model training, which ultimately benefits model performance. Finally, TorchSig supports on-the-fly synthetic data creation at training time, thus enabling massive scale training sessions with virtually unlimited datasets.
    FOSTER: Feature Boosting and Compression for Class-Incremental Learning. (arXiv:2204.04662v2 [cs.CV] UPDATED)
    The ability to learn new concepts continually is necessary in this ever-changing world. However, deep neural networks suffer from catastrophic forgetting when learning new categories. Many works have been proposed to alleviate this phenomenon, whereas most of them either fall into the stability-plasticity dilemma or take too much computation or storage overhead. Inspired by the gradient boosting algorithm to gradually fit the residuals between the target model and the previous ensemble model, we propose a novel two-stage learning paradigm FOSTER, empowering the model to learn new categories adaptively. Specifically, we first dynamically expand new modules to fit the residuals between the target and the output of the original model. Next, we remove redundant parameters and feature dimensions through an effective distillation strategy to maintain the single backbone model. We validate our method FOSTER on CIFAR-100 and ImageNet-100/1000 under different settings. Experimental results show that our method achieves state-of-the-art performance. Code is available at: https://github.com/G-U-N/ECCV22-FOSTER.
    ManiFest: Manifold Deformation for Few-shot Image Translation. (arXiv:2111.13681v3 [cs.CV] UPDATED)
    Most image-to-image translation methods require a large number of training images, which restricts their applicability. We instead propose ManiFest: a framework for few-shot image translation that learns a context-aware representation of a target domain from a few images only. To enforce feature consistency, our framework learns a style manifold between source and proxy anchor domains (assumed to be composed of large numbers of images). The learned manifold is interpolated and deformed towards the few-shot target domain via patch-based adversarial and feature statistics alignment losses. All of these components are trained simultaneously during a single end-to-end loop. In addition to the general few-shot translation task, our approach can alternatively be conditioned on a single exemplar image to reproduce its specific style. Extensive experiments demonstrate the efficacy of ManiFest on multiple tasks, outperforming the state-of-the-art on all metrics and in both the general- and exemplar-based scenarios. Our code is available at https://github.com/cv-rits/Manifest .
    Generalizable and Robust Deep Learning Algorithm for Atrial Fibrillation Diagnosis Across Ethnicities, Ages and Sexes. (arXiv:2207.09667v1 [cs.LG])
    To drive health innovation that meets the needs of all and democratize healthcare, there is a need to assess the generalization performance of deep learning (DL) algorithms across various distribution shifts to ensure that these algorithms are robust. This retrospective study is, to the best of our knowledge, the first to develop and assess the generalization performance of a DL model for atrial fibrillation (AF) event detection from long-term beat-to-beat intervals across ethnicities, ages and sexes. The new recurrent DL model, denoted ArNet2, was developed on a large retrospective dataset of 2,147 patients totaling 51,386 hours of continuous electrocardiogram (ECG). The model's generalization was evaluated on manually annotated test sets from four centers (USA, Israel, Japan and China) totaling 402 patients. The model was further validated on a retrospective dataset of 1,730 consecutive Holter recordings from the Rambam Hospital Holter clinic, Haifa, Israel. The model outperformed benchmark state-of-the-art models and generalized well across ethnicities, ages and sexes. Performance was higher for females than for males and higher for young adults (less than 60 years old), and showed some differences across ethnicities. The main finding explaining these variations was an impairment in performance in groups with a higher prevalence of atrial flutter (AFL). Our findings on the relative performance of ArNet2 across groups may have clinical implications for the choice of the preferred AF examination method to use relative to the group of interest.
    FedNet2Net: Saving Communication and Computations in Federated Learning with Model Growing. (arXiv:2207.09568v1 [cs.LG])
    Federated learning (FL) is a recently developed area of machine learning, in which the private data of a large number of distributed clients is used to develop a global model under the coordination of a central server without explicitly exposing the data. The standard FL strategy has a number of significant bottlenecks, including large communication requirements and a high impact on the clients' resources. Several strategies that try to address these issues have been described in the literature. In this paper, a novel scheme based on the notion of "model growing" is proposed. Initially, the server deploys a small model of low complexity, which is trained to capture the data complexity during the initial set of rounds. When the performance of such a model saturates, the server switches to a larger model with the help of function-preserving transformations. The model complexity increases as more data is processed by the clients, and the overall process continues until the desired performance is achieved. Therefore, in our approach the most complex model is broadcast only at the final stage, resulting in a substantial reduction in communication cost and client computational requirements. The proposed approach is tested extensively on three standard benchmarks and is shown to achieve a substantial reduction in communication and client computation while achieving accuracy comparable to the current most effective strategies.
    Controllable Data Generation by Deep Learning: A Review. (arXiv:2207.09542v1 [cs.LG])
    Designing and generating new data with targeted properties has attracted attention in various critical applications such as molecule design, image editing and speech synthesis. Traditional hand-crafted approaches rely heavily on expert experience and intensive human effort, yet still suffer from insufficient scientific knowledge and low throughput, which limits effective and efficient data generation. Recently, advances in deep learning have produced expressive methods that can learn the underlying representation and properties of data. This capability provides new opportunities for working out the mutual relationship between the structural patterns and functional properties of the data and for leveraging this relationship to generate structural data given the desired properties. This article provides a systematic review of this promising research area, commonly known as controllable deep data generation. First, the potential challenges are raised and preliminaries are provided. Then controllable deep data generation is formally defined, a taxonomy of various techniques is proposed, and the evaluation metrics in this specific domain are summarized. After that, exciting applications of controllable deep data generation are introduced, and existing works are experimentally analyzed and compared. Finally, promising future directions of controllable deep data generation are highlighted and five potential challenges are identified.
    ICRICS: Iterative Compensation Recovery for Image Compressive Sensing. (arXiv:2207.09594v1 [cs.LG])
    Closed-loop architecture is widely utilized in automatic control systems and attains distinguished performance. However, classical compressive sensing systems employ an open-loop architecture with separate sampling and reconstruction units. Therefore, a method of iterative compensation recovery for image compressive sensing (ICRICS) is proposed by introducing a closed-loop framework into traditional compressive sensing systems. The proposed method builds on any existing approach and upgrades its reconstruction performance by adding a negative feedback structure. A theoretical analysis of negative feedback in compressive sensing systems is performed. An approximate mathematical proof of the effectiveness of the proposed method is also provided. Simulation experiments on more than 3 image datasets show that the proposed method is superior to 10 competing approaches in reconstruction performance. The maximum increment of average peak signal-to-noise ratio is 4.36 dB and the maximum increment of average structural similarity is 0.034 on one dataset. The proposed method, based on a negative feedback mechanism, can efficiently correct the recovery error in existing image compressive sensing systems.
    Learning from few examples: Classifying sex from retinal images via deep learning. (arXiv:2207.09624v1 [cs.CV])
    Deep learning has seen tremendous interest in medical imaging, particularly in the use of convolutional neural networks (CNNs) for developing automated diagnostic tools. The ease of its non-invasive acquisition makes retinal fundus imaging amenable to such automated approaches. Recent work in analyzing fundus images using CNNs relies on access to massive data for training and validation - hundreds of thousands of images. However, data residency and data privacy restrictions stymie the applicability of this approach in medical settings where patient confidentiality is a mandate. Here, we showcase results for the performance of deep learning (DL) on small datasets to classify patient sex from fundus images - a trait thought not to be present or quantifiable in fundus images until recently. We fine-tune a ResNet-152 model whose last layer has been modified for binary classification. In several experiments, we assess performance in the small dataset context using one private (DOVS) and one public (ODIR) data source. Our models, developed using approximately 2500 fundus images, achieved test AUC scores of up to 0.72 (95% CI: [0.67, 0.77]). This corresponds to a mere 25% decrease in performance despite a nearly 1000-fold decrease in the dataset size compared to prior work in the literature. Even with a hard task like sex categorization from retinal images, we find that classification is possible with very small datasets. Additionally, we perform domain adaptation experiments between DOVS and ODIR; explore the effect of data curation on training and generalizability; and investigate model ensembling to maximize CNN classifier performance in the context of small development datasets.
    Rayleigh-Gauss-Newton optimization with enhanced sampling for variational Monte Carlo. (arXiv:2106.10558v4 [stat.ML] UPDATED)
    Variational Monte Carlo (VMC) is an approach for computing ground-state wavefunctions that has recently become more powerful due to the introduction of neural network-based wavefunction parametrizations. However, efficiently training neural wavefunctions to converge to an energy minimum remains a difficult problem. In this work, we analyze optimization and sampling methods used in VMC and introduce alterations to improve their performance. First, based on theoretical convergence analysis in a noiseless setting, we motivate a new optimizer that we call the Rayleigh-Gauss-Newton method, which can improve upon gradient descent and natural gradient descent to achieve superlinear convergence at no more than twice the computational cost. Second, in order to realize this favorable comparison in the presence of stochastic noise, we analyze the effect of sampling error on VMC parameter updates and experimentally demonstrate that it can be reduced by the parallel tempering method. In particular, we demonstrate that RGN can be made robust to energy spikes that occur when the sampler moves between metastable regions of configuration space. Finally, putting theory into practice, we apply our enhanced optimization and sampling methods to the transverse-field Ising and XXZ models on large lattices, yielding ground-state energy estimates with remarkably high accuracy after just 200 parameter updates.
    Towards Robust Multivariate Time-Series Forecasting: Adversarial Attacks and Defense Mechanisms. (arXiv:2207.09572v1 [cs.LG])
    As deep learning models have gradually become the main workhorse of time series forecasting, the vulnerability of forecasting and decision systems to adversarial attacks has emerged as a major issue in recent years. Although such attacks and the corresponding defense mechanisms have started to be investigated for univariate time series forecasting, there are still few studies on multivariate forecasting, which is often preferred due to its capacity to encode correlations between different time series. In this work, we study and design adversarial attacks on multivariate probabilistic forecasting models, taking into consideration attack budget constraints and the correlation architecture between multiple time series. Specifically, we investigate a sparse indirect attack that hurts the prediction of an item (time series) by only attacking the history of a small number of other items to save attacking cost. To combat these attacks, we also develop two defense strategies. First, we adapt randomized smoothing to the multivariate time series scenario and verify its effectiveness via empirical experiments. Second, we leverage a sparse attacker to enable end-to-end adversarial training that delivers robust probabilistic forecasters. Extensive experiments on real data confirm that our attack schemes are powerful and that our defense algorithms are more effective than other baseline defense mechanisms.
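    As a rough illustration of the randomized smoothing defense mentioned above, the sketch below averages a forecaster's output over Gaussian-perturbed copies of the input history. The toy forecaster, the noise level `sigma`, and the sample count are all assumptions chosen for illustration; the abstract does not specify the authors' multivariate probabilistic setup.

```python
import random
import statistics


def naive_forecast(history):
    """Toy forecaster (an assumption): mean of the last three observations."""
    return sum(history[-3:]) / 3.0


def smoothed_forecast(forecaster, history, sigma=0.1, n_samples=200, seed=0):
    """Average the forecaster's output over Gaussian-perturbed input histories."""
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        # Perturb every past observation with independent Gaussian noise.
        noisy = [x + rng.gauss(0.0, sigma) for x in history]
        outputs.append(forecaster(noisy))
    return statistics.fmean(outputs)


history = [1.0, 2.0, 3.0, 4.0, 5.0]
clean = naive_forecast(history)
smoothed = smoothed_forecast(naive_forecast, history)
print(clean, smoothed)
```

    Because the prediction is an average over many noisy inputs, a small targeted change to the history shifts the smoothed output less abruptly than it could shift a single forward pass.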
    A Hybrid Spatial-temporal Deep Learning Architecture for Lane Detection. (arXiv:2110.04079v5 [cs.CV] UPDATED)
    Accurate and reliable lane detection is vital for the safe performance of lane-keeping assistance and lane departure warning systems. However, under certain challenging circumstances, it is difficult to detect the lanes accurately from a single image, as is mostly done in the current literature. Since lane markings are continuous lines, lanes that are difficult to detect accurately in the current image can potentially be better deduced if information from previous frames is incorporated. This study proposes a novel hybrid spatial-temporal (ST) sequence-to-one deep learning architecture. This architecture makes full use of the ST information in multiple continuous image frames to detect the lane markings in the very last frame. Specifically, the hybrid model integrates the following components: (a) a single-image feature extraction module equipped with a spatial convolutional neural network; (b) an ST feature integration module constructed from an ST recurrent neural network; (c) an encoder-decoder structure, which casts this image segmentation problem in an end-to-end supervised learning format. Extensive experiments reveal that the proposed model architecture can effectively handle challenging driving scenes and outperforms available state-of-the-art methods.
    Time-series image denoising of pressure-sensitive paint data by projected multivariate singular spectrum analysis. (arXiv:2203.07574v3 [eess.IV] UPDATED)
    Time-series data, such as unsteady pressure-sensitive paint (PSP) measurement data, may contain a significant amount of random noise. Thus, in this study, we investigated a noise-reduction method that combines multivariate singular spectrum analysis (MSSA) with low-dimensional data representation. MSSA is a state-space reconstruction technique that utilizes time-delay embedding, and the low-dimensional representation is achieved by projecting data onto the singular value decomposition (SVD) basis. The noise-reduction performance of the proposed method, i.e., the projected MSSA, on unsteady PSP data is compared with that of the truncated SVD method, one of the most widely used noise-reduction methods. The results show that the projected MSSA performs better at reducing random noise than the truncated SVD method. Additionally, in contrast to the truncated SVD method, the performance of the projected MSSA is less sensitive to the truncation rank. Furthermore, the projected MSSA achieves denoising effectively by extracting smooth trajectories in a state space from noisy input data. The projected MSSA is expected to be effective for reducing random noise not only in PSP measurement data but also in various high-dimensional time-series data.
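    The time-delay embedding that singular spectrum analysis builds on can be sketched for a single channel as follows. This is only the embedding and its inverse (diagonal averaging); the SVD projection and rank truncation that do the actual denoising are omitted, and the function names and window length are illustrative assumptions rather than the authors' implementation.

```python
def trajectory_matrix(series, window):
    """Time-delay embedding: each row is a lagged window of the series."""
    n_rows = len(series) - window + 1
    return [list(series[i:i + window]) for i in range(n_rows)]


def diagonal_average(matrix):
    """Map a trajectory matrix back to a series by averaging anti-diagonals."""
    n_rows, window = len(matrix), len(matrix[0])
    length = n_rows + window - 1
    sums = [0.0] * length
    counts = [0] * length
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            # Entry (i, j) holds the sample at time index i + j.
            sums[i + j] += value
            counts[i + j] += 1
    return [s / c for s, c in zip(sums, counts)]


series = [1.0, 2.0, 3.0, 4.0, 5.0]
embedded = trajectory_matrix(series, window=3)
restored = diagonal_average(embedded)
print(embedded, restored)
```

    In a full pipeline a low-rank approximation of the embedded matrix would be computed before the diagonal averaging step; without it, the round trip simply returns the input series.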
    Combined Federated and Split Learning in Edge Computing for Ubiquitous Intelligence in Internet of Things: State of the Art and Future Directions. (arXiv:2207.09611v1 [cs.LG])
    Federated learning (FL) and split learning (SL) are two emerging collaborative learning methods that may greatly facilitate ubiquitous intelligence in the Internet of Things (IoT). Federated learning enables machine learning (ML) models trained locally on private data to be aggregated into a global model. Split learning allows different portions of an ML model to be collaboratively trained on different workers in a learning framework. Federated learning and split learning each have unique advantages and limitations, and they may complement each other in achieving ubiquitous intelligence in IoT. Therefore, the combination of federated learning and split learning has recently become an active research area attracting extensive interest. In this article, we review the latest developments in federated learning and split learning and present a survey of the state-of-the-art technologies for combining these two learning methods in an edge computing-based IoT environment. We also identify some open problems and discuss possible directions for future research in this area, in the hope of further stimulating the research community's interest in this emerging field.
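    The aggregation step of federated learning described above can be sketched as a sample-size-weighted average of client parameters. The abstract does not say which aggregation rule the surveyed systems use; the weighted average below (in the style of federated averaging) and the flat parameter lists are assumptions for illustration.

```python
def federated_average(client_params, client_sizes):
    """Combine client parameter vectors, weighting each by its local sample count."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(params[i] * n for params, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


# Two hypothetical clients holding 1 and 3 local samples respectively.
global_params = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)
```

    Only parameter vectors and sample counts reach the server in this sketch; the raw client data never leaves the clients.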
    EVHA: Explainable Vision System for Hardware Testing and Assurance -- An Overview. (arXiv:2207.09627v1 [cs.CR])
    Due to the ever-growing demand for electronic chips in different sectors, semiconductor companies have been compelled to offshore their manufacturing processes. This has raised concerns about the security and trustworthiness of the fabricated chips and has given rise to hardware attacks. Under these conditions, different entities in the semiconductor supply chain can act maliciously and execute an attack on the design computing layers, from devices to systems. The attack we consider is a hardware Trojan that is inserted during mask generation/fabrication in an untrusted foundry. The Trojan leaves a footprint in the fabricated chip through addition, deletion, or change of design cells. To tackle this problem, we propose the Explainable Vision System for Hardware Testing and Assurance (EVHA), which can detect the smallest possible change to a design in a low-cost, accurate, and fast manner. The inputs to this system are Scanning Electron Microscopy (SEM) images acquired from the Integrated Circuits (ICs) under examination. The system output is a determination of the IC status in terms of having any defect and/or hardware Trojan through addition, deletion, or change in the design cells at the cell level. This article provides an overview of the design, development, implementation, and analysis of our defense system.
    Robust Landmark-based Stent Tracking in X-ray Fluoroscopy. (arXiv:2207.09933v1 [cs.CV])
    In clinical angioplasty procedures (i.e., opening clogged coronary arteries), devices such as balloons and stents need to be placed and expanded in arteries under the guidance of X-ray fluoroscopy. Due to the limitation of X-ray dose, the resulting images are often noisy. To check the correct placement of these devices, typically multiple motion-compensated frames are averaged to enhance the view. Therefore, device tracking is a necessary procedure for this purpose. Even though angioplasty devices are designed to have radiopaque markers for ease of tracking, current methods struggle to deliver satisfactory results due to the small marker size and complex scenes in angioplasty. In this paper, we propose an end-to-end deep learning framework for single stent tracking, which consists of three hierarchical modules: U-Net based landmark detection, ResNet based stent proposal and feature extraction, and graph convolutional neural network (GCN) based stent tracking that temporally aggregates both spatial information and appearance features. The experiments show that our method performs significantly better in detection than state-of-the-art point-based tracking models. In addition, its fast inference speed satisfies clinical requirements.
    Towards Accurate and Robust Classification in Continuously Transitioning Industrial Sprays with Mixup. (arXiv:2207.09609v1 [cs.CV])
    Image classification with deep neural networks has seen a surge of technological breakthroughs with promising applications in areas such as face recognition, medical imaging, and autonomous driving. In engineering problems, however, such as high-speed imaging of engine fuel injector sprays or body paint sprays, deep neural networks face a fundamental challenge related to the availability of adequate and diverse data. Typically, only thousands or sometimes even hundreds of samples are available for training. In addition, the transition between different spray classes is a continuum and requires a high level of domain expertise to label the images accurately. In this work, we used Mixup as an approach to systematically deal with the data scarcity and ambiguous class boundaries found in industrial spray applications. We show that data augmentation can mitigate the over-fitting problem of large neural networks on small data sets, to a certain level, but cannot fundamentally resolve the issue. We discuss how a convex linear interpolation of different classes naturally aligns with the continuous transition between different classes in our application. Our experiments demonstrate Mixup as a simple yet effective method to train an accurate and robust deep neural network classifier with only a few hundred samples.
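    The convex interpolation at the heart of Mixup can be sketched as below: two samples and their one-hot labels are blended with a coefficient drawn from a Beta distribution. The `alpha` value, the list-based feature vectors, and the two-class labels are illustrative assumptions; the abstract does not give the authors' settings.

```python
import random


def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two (features, one-hot label) pairs with a Beta(alpha, alpha) weight."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


rng = random.Random(0)
# Two hypothetical spray images (flattened to two pixels) from different classes.
x, y, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], rng=rng)
print(x, y, lam)
```

    The blended label is a soft target whose entries still sum to one, which is what lets the interpolation express a sample lying between two spray classes.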
    FORML: Learning to Reweight Data for Fairness. (arXiv:2202.01719v2 [cs.LG] UPDATED)
    Machine learning models are trained to minimize the mean loss for a single metric, and thus typically do not consider fairness and robustness. Neglecting such metrics in training can make these models prone to fairness violations when training data are imbalanced or test distributions differ. This work introduces Fairness Optimized Reweighting via Meta-Learning (FORML), a training algorithm that balances fairness and robustness with accuracy by jointly learning training sample weights and neural network parameters. The approach increases model fairness by learning to balance the contributions from both over- and under-represented sub-groups through dynamic reweighting of the data learned from a user-specified held-out set representative of the distribution under which fairness is desired. FORML improves equality of opportunity fairness criteria on image classification tasks, reduces bias of corrupted labels, and facilitates building more fair datasets via data condensation. These improvements are achieved without pre-processing data or post-processing model outputs, without learning an additional weighting function, without changing model architecture, and while maintaining accuracy on the original predictive metric.
    Doge Tickets: Uncovering Domain-general Language Models by Playing Lottery Tickets. (arXiv:2207.09638v1 [cs.CL])
    Over-parameterized models, typically pre-trained language models (LMs), have shown an appealing expressive power due to their small learning bias. However, the huge learning capacity of LMs can also lead to large learning variance. In a pilot study, we find that, when faced with multiple domains, a critical portion of parameters behave unexpectedly in a domain-specific manner while others behave in a domain-general one. Motivated by this phenomenon, we posit for the first time that domain-general parameters can underpin a domain-general LM that can be derived from the original LM. To uncover the domain-general LM, we propose to identify domain-general parameters by playing lottery tickets (dubbed doge tickets). To intervene in the lottery, we propose a domain-general score, which depicts how domain-invariant a parameter is by associating it with the variance. Comprehensive experiments are conducted on the Amazon, MNLI and OntoNotes datasets. The results show that doge tickets obtain improved out-of-domain generalization in comparison with a range of competitive baselines. Further analysis hints at the existence of domain-general parameters and the performance consistency of doge tickets.
    Operating Envelopes under Probabilistic Electricity Demand and Solar Generation Forecasts. (arXiv:2207.09818v1 [eess.SY])
    The increasing penetration of distributed energy resources in low-voltage networks is turning end-users from consumers into prosumers. However, the incomplete smart meter rollout and the paucity of smart meter data due to the regulatory separation between retail and network service provision make active distribution network management difficult. Furthermore, distribution network operators often do not have access to real-time smart meter data, which creates an additional challenge. For lack of better solutions, they use blanket rooftop solar export limits, leading to suboptimal outcomes. To address this, we designed a conditional generative adversarial network (CGAN)-based model to forecast household solar generation and electricity demand, which serves as an input to chance-constrained optimal power flow used to compute fair operating envelopes under uncertainty.
    MALTS: Matching After Learning to Stretch. (arXiv:1811.07415v7 [stat.ME] UPDATED)
    We introduce a flexible framework that produces high-quality almost-exact matches for causal inference. Most prior work in matching uses ad-hoc distance metrics, often leading to poor quality matches, particularly when there are irrelevant covariates. In this work, we learn an interpretable distance metric for matching, which leads to substantially higher quality matches. The learned distance metric stretches the covariate space according to each covariate's contribution to outcome prediction: this stretching means that mismatches on important covariates carry a larger penalty than mismatches on irrelevant covariates. Our ability to learn flexible distance metrics leads to matches that are interpretable and useful for the estimation of conditional average treatment effects.
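    The effect of stretching the covariate space can be sketched with a diagonally weighted Euclidean distance. In MALTS the stretch weights are learned from outcome data; here they are fixed, hypothetical values, and the treated/control units are made up for illustration.

```python
import math


def stretched_distance(a, b, weights):
    """Euclidean distance after scaling each covariate by its stretch weight."""
    return math.sqrt(sum((w * (ai - bi)) ** 2 for ai, bi, w in zip(a, b, weights)))


def nearest_match(unit, candidates, weights):
    """Index of the candidate closest to `unit` under the stretched distance."""
    return min(
        range(len(candidates)),
        key=lambda i: stretched_distance(unit, candidates[i], weights),
    )


treated = [1.0, 0.0]
controls = [[1.1, 5.0], [3.0, 0.0]]
# Hypothetical weights: the first covariate matters, the second is irrelevant.
print(nearest_match(treated, controls, [10.0, 0.0]))
# Unweighted baseline, where the irrelevant covariate dominates the choice.
print(nearest_match(treated, controls, [1.0, 1.0]))
```

    With the stretch weights the first control is chosen because it agrees on the important covariate; with equal weights the mismatch on the irrelevant covariate pushes the match to the second control.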
    Pre-training strategies and datasets for facial representation learning. (arXiv:2103.16554v2 [cs.CV] UPDATED)
    What is the best way to learn a universal face representation? Recent work on deep learning in the area of face analysis has focused on supervised learning for specific tasks of interest (e.g. face recognition, facial landmark localization, etc.) but has overlooked the overarching question of how to find a facial representation that can be readily adapted to several facial analysis tasks and datasets. To this end, we make the following 4 contributions: (a) We introduce, for the first time, a comprehensive evaluation benchmark for facial representation learning consisting of 5 important face analysis tasks. (b) We systematically investigate two ways of large-scale representation learning applied to faces: supervised and unsupervised pre-training. Importantly, we focus our evaluations on the case of few-shot facial learning. (c) We investigate important properties of the training datasets, including their size and quality (labelled, unlabelled or even uncurated). (d) To draw our conclusions, we conducted a very large number of experiments. Our two main findings are: (1) Unsupervised pre-training on completely in-the-wild, uncurated data provides consistent and, in some cases, significant accuracy improvements for all facial tasks considered. (2) Many existing facial video datasets seem to have a large amount of redundancy. We will release code and pre-trained models to facilitate future research.
    Error-free approximation of explicit linear MPC through lattice piecewise affine expression. (arXiv:2110.00201v3 [eess.SY] UPDATED)
    In this paper, the disjunctive and conjunctive lattice piecewise affine (PWA) approximations of explicit linear model predictive control (MPC) are proposed. The training data, consisting of state samples and the corresponding affine control laws, are generated uniformly in the domain of interest, and the lattice PWA approximations are constructed from them. Re-sampling of data is also proposed to guarantee that the lattice PWA approximations are identical to the explicit MPC control law in the unique order (UO) regions containing the sample points as interior points. Additionally, under mild assumptions, the equivalence of the two lattice PWA approximations guarantees that the approximations are error-free in the domain of interest. Algorithms for deriving a statistically error-free approximation of the explicit linear MPC are proposed, and the complexity of the entire procedure is analyzed and shown to be polynomial with respect to the number of samples. The performance of the proposed approximation strategy is tested through two simulation examples, and the results show that with a moderate number of sample points, we can construct lattice PWA approximations that are equivalent to the optimal control law of the explicit linear MPC.
    AutoDES: AutoML Pipeline Generation of Classification with Dynamic Ensemble Strategy Selection. (arXiv:2201.00207v2 [cs.LG] UPDATED)
    Automated machine learning has seen remarkable technological developments in recent years, and building an automated machine learning pipeline is now an essential task. Model ensembling is the technique of combining multiple models to obtain a better and more robust model. However, existing automated machine learning tends to handle the model ensemble simplistically, with a fixed ensemble strategy such as stacked generalization. Many different ensemble methods have been developed, especially for ensemble selection, and a fixed ensemble strategy caps the model's achievable performance. In this article, we present a novel framework for automated machine learning. Our framework incorporates advances in dynamic ensemble selection, and to the best of our knowledge, our approach is the first in the field of AutoML to search and optimize ensemble strategies. In the comparison experiments, our method outperforms the state-of-the-art automated machine learning frameworks with the same CPU time on 42 classification datasets from the OpenML platform. Ablation experiments on our framework validate the effectiveness of our proposed method.
    Adaptive Step-Size Methods for Compressed SGD. (arXiv:2207.10046v1 [stat.ML])
    Compressed Stochastic Gradient Descent (SGD) algorithms have been recently proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning. Existing compressed SGD algorithms assume the use of non-adaptive step-sizes (constant or diminishing) to provide theoretical convergence guarantees. Typically, the step-sizes are fine-tuned in practice to the dataset and the learning algorithm to provide good empirical performance. Such fine-tuning might be impractical in many learning scenarios, and it is therefore of interest to study compressed SGD using adaptive step-sizes. Motivated by prior work on adaptive step-size methods for SGD to train neural networks efficiently in the uncompressed setting, we develop an adaptive step-size method for compressed SGD. In particular, we introduce a scaling technique for the descent step in compressed SGD, which we use to establish order-optimal convergence rates for convex-smooth and strongly convex-smooth objectives under an interpolation condition and for non-convex objectives under a strong growth condition. We also show through simulation examples that without this scaling, the algorithm can fail to converge. We present experimental results on deep neural networks for real-world datasets, compare the performance of our proposed algorithm with previously proposed compressed SGD methods in the literature, and demonstrate improved performance on ResNet-18, ResNet-34 and DenseNet architectures for the CIFAR-100 and CIFAR-10 datasets at various levels of compression.
    Mitigating Algorithmic Bias with Limited Annotations. (arXiv:2207.10018v1 [cs.LG])
    Existing work on fairness modeling commonly assumes that sensitive attributes for all instances are fully available, which may not be true in many real-world applications due to the high cost of acquiring sensitive information. When sensitive attributes are not disclosed or available, a small part of the training data must be manually annotated to mitigate bias. However, the skewed distribution across different sensitive groups preserves the skewness of the original dataset in the annotated subset, which leads to non-optimal bias mitigation. To tackle this challenge, we propose Active Penalization Of Discrimination (APOD), an interactive framework to guide the limited annotations towards maximally eliminating the effect of algorithmic bias. The proposed APOD integrates discrimination penalization with active instance selection to efficiently utilize the limited annotation budget, and it is theoretically proven to be capable of bounding the algorithmic bias. According to the evaluation on five benchmark datasets, APOD outperforms the state-of-the-art baseline methods under the limited annotation budget and shows performance comparable to fully annotated bias mitigation, which demonstrates that APOD could benefit real-world applications when sensitive information is limited.
    UniHPF : Universal Healthcare Predictive Framework with Zero Domain Knowledge. (arXiv:2207.09858v1 [cs.LG])
    Despite the abundance of Electronic Healthcare Records (EHR), their heterogeneity restricts the utilization of medical data in building predictive models. To address this challenge, we propose the Universal Healthcare Predictive Framework (UniHPF), which requires no medical domain knowledge and minimal pre-processing for multiple prediction tasks. Experimental results demonstrate that UniHPF is capable of building large-scale EHR models that can process any form of medical data from distinct EHR systems. Our framework significantly outperforms baseline models in multi-source learning tasks, including transfer and pooled learning, while also showing comparable results when trained on a single medical dataset. To empirically demonstrate the efficacy of our work, we conducted extensive experiments using various datasets, model structures, and tasks. We believe that our findings can provide helpful insights for further research on the multi-source learning of EHRs.
    Bayesian Hyperparameter Optimization for Deep Neural Network-Based Network Intrusion Detection. (arXiv:2207.09902v1 [cs.CR])
    Traditional network intrusion detection approaches encounter feasibility and sustainability issues in combating modern, sophisticated, and unpredictable security attacks. Deep neural networks (DNNs) have been successfully applied to intrusion detection problems. The optimal use of DNN-based classifiers requires careful tuning of the hyperparameters. Manually tuning the hyperparameters is tedious, time-consuming, and computationally expensive. Hence, there is a need for an automatic technique to find optimal hyperparameters for the best use of DNNs in intrusion detection. This paper proposes a novel Bayesian optimization-based framework for the automatic optimization of hyperparameters, ensuring the best DNN architecture. We evaluated the performance of the proposed framework on NSL-KDD, a benchmark dataset for network intrusion detection. The experimental results show the framework's effectiveness, as the resultant DNN architecture demonstrates significantly higher intrusion detection performance than the random search optimization-based approach in terms of accuracy, precision, recall, and F1-score.
    Semantic uncertainty intervals for disentangled latent spaces. (arXiv:2207.10074v1 [cs.CV])
    Meaningful uncertainty quantification in computer vision requires reasoning about semantic information -- say, the hair color of the person in a photo or the location of a car on the street. To this end, recent breakthroughs in generative modeling allow us to represent semantic information in disentangled latent spaces, but providing uncertainties on the semantic latent variables has remained challenging. In this work, we provide principled uncertainty intervals that are guaranteed to contain the true semantic factors for any underlying generative model. The method does the following: (1) it uses quantile regression to output a heuristic uncertainty interval for each element in the latent space, and (2) it calibrates these uncertainties such that they contain the true value of the latent for a new, unseen input. The endpoints of these calibrated intervals can then be propagated through the generator to produce interpretable uncertainty visualizations for each semantic factor. This technique reliably communicates semantically meaningful, principled, and instance-adaptive uncertainty in inverse problems like image super-resolution and image completion.
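    The two-step recipe in this abstract (a heuristic interval, then calibration on held-out data) can be illustrated with a minimal sketch. The code below uses a split-conformal, conformalized-quantile-regression style adjustment, which is an assumption made for illustration and not necessarily the calibration procedure used in the paper. The function names `conformal_margin` and `calibrate` are hypothetical, and the heuristic interval endpoints are taken as given rather than produced by a trained quantile regressor.

```python
import math


def conformal_margin(lower, upper, truth, alpha):
    """Margin by which heuristic intervals are widened (assumed CQR-style rule).

    lower, upper: heuristic interval endpoints on a held-out calibration set.
    truth: the true latent values for the same calibration points.
    alpha: target miscoverage level, e.g. 0.1 for 90% coverage.
    """
    # Score = how far the true value falls outside the interval
    # (negative when the true value lies inside it).
    scores = sorted(max(lo - y, y - hi) for lo, hi, y in zip(lower, upper, truth))
    n = len(scores)
    # Finite-sample-corrected rank of the (1 - alpha) empirical quantile.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[rank - 1]


def calibrate(lower, upper, margin):
    """Widen (or shrink, if margin < 0) every interval by the same margin."""
    return [(lo - margin, hi + margin) for lo, hi in zip(lower, upper)]


# Toy calibration set: nine heuristic intervals [0, 1]; one true value lies outside.
cal_lower = [0.0] * 9
cal_upper = [1.0] * 9
cal_truth = [0.5] * 8 + [1.3]
margin = conformal_margin(cal_lower, cal_upper, cal_truth, alpha=0.1)
new_intervals = calibrate([0.2], [0.6], margin)
```

    In this toy setup the margin is driven by the single calibration point that fell outside its heuristic interval, so every new interval is widened by roughly that overshoot.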
    Multigraph Topology Design for Cross-Silo Federated Learning. (arXiv:2207.09657v1 [cs.LG])
    Cross-silo federated learning utilizes a few hundred reliable data silos with high-speed access links to jointly train a model. While this approach has become a popular setting in federated learning, designing a robust topology to reduce the training time is still an open problem. In this paper, we present a new multigraph topology for cross-silo federated learning. We first construct the multigraph using the overlay graph. We then parse this multigraph into different simple graphs with isolated nodes. The existence of isolated nodes allows us to perform model aggregation without waiting for other nodes, hence reducing the training time. We further propose a new distributed learning algorithm to use with our multigraph topology. Extensive experiments on public datasets show that our proposed method significantly reduces the training time compared with recent state-of-the-art topologies while ensuring convergence and maintaining the model's accuracy.
    Revisiting data augmentation for subspace clustering. (arXiv:2207.09728v1 [cs.LG])
    Subspace clustering is the classical problem of clustering a collection of data samples that approximately lie around several low-dimensional subspaces. The current state-of-the-art approaches for this problem are based on the self-expressive model, which represents the samples as a linear combination of other samples. However, these approaches require sufficiently well-spread samples for accurate representation, which might not be available in many applications. In this paper, we shed light on this commonly neglected issue and argue that data distribution within each subspace plays a critical role in the success of self-expressive models. Our proposed solution to tackle this issue is motivated by the central role of data augmentation in the generalization power of deep neural networks. We propose two subspace clustering frameworks for both unsupervised and semi-supervised settings that use augmented samples as an enlarged dictionary to improve the quality of the self-expressive representation. We present an automatic augmentation strategy using a few labeled samples for the semi-supervised problem, relying on the fact that the data samples lie in the union of multiple linear subspaces. Experimental results confirm the effectiveness of data augmentation, as it significantly improves the performance of general self-expressive models.
    Cancer Subtyping by Improved Transcriptomic Features Using Vector Quantized Variational Autoencoder. (arXiv:2207.09783v1 [cs.LG])
    Defining and separating cancer subtypes is essential for facilitating personalized therapy modality and prognosis of patients. The definition of subtypes has been constantly recalibrated as a result of our deepened understanding. During this recalibration, researchers often rely on clustering of cancer data to provide an intuitive visual reference that could reveal the intrinsic characteristics of subtypes. The data being clustered are often omics data such as transcriptomics that have strong correlations to the underlying biological mechanism. However, while existing studies have shown promising results, they suffer from issues associated with omics data: sample scarcity and high dimensionality. As such, existing methods often impose unrealistic assumptions to extract useful features from the data while avoiding overfitting to spurious correlations. In this paper, we propose to leverage a recent strong generative model, Vector Quantized Variational AutoEncoder (VQ-VAE), to tackle the data issues and extract informative latent features that are crucial to the quality of subsequent clustering by retaining only information relevant to reconstructing the input. VQ-VAE does not impose strict assumptions and hence its latent features are better representations of the input, capable of yielding superior clustering performance with any mainstream clustering method. Extensive experiments and medical analysis on multiple datasets comprising 10 distinct cancers demonstrate the VQ-VAE clustering results can significantly and robustly improve prognosis over prevalent subtyping systems.
    Test-Time Adaptation via Conjugate Pseudo-labels. (arXiv:2207.09640v1 [cs.LG])
    Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts, with access to only the unlabeled test samples from the new domain at test-time. Prior TTA methods optimize over unsupervised objectives such as the entropy of model predictions in TENT [Wang et al., 2021], but it is unclear what exactly makes a good TTA loss. In this paper, we start by presenting a surprising phenomenon: if we attempt to meta-learn the best possible TTA loss over a wide class of functions, then we recover a function that is remarkably similar to (a temperature-scaled version of) the softmax-entropy employed by TENT. This only holds, however, if the classifier we are adapting is trained via cross-entropy; if trained via squared loss, a different best TTA loss emerges. To explain this phenomenon, we analyze TTA through the lens of the training loss's convex conjugate. We show that under natural conditions, this (unsupervised) conjugate function can be viewed as a good local approximation to the original supervised loss and indeed, it recovers the best losses found by meta-learning. This leads to a generic recipe that can be used to find a good TTA loss for any given supervised training loss function of a general class. Empirically, our approach consistently dominates other baselines over a wide range of benchmarks. Our approach is particularly of interest when applied to classifiers trained with novel loss functions, e.g., the recently-proposed PolyLoss, where it differs substantially from (and outperforms) an entropy-based loss. Further, we show that our approach can also be interpreted as a kind of self-training using a very specific soft label, which we refer to as the conjugate pseudolabel. Overall, our method provides a broad framework for better understanding and improving test-time adaptation. Code is available at https://github.com/locuslab/tta_conjugate.
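    For readers unfamiliar with the softmax-entropy objective mentioned in this abstract, the following minimal sketch computes the entropy of temperature-scaled softmax predictions for one unlabeled sample. It only illustrates the quantity itself; the function names and the plain-Python formulation are assumptions for illustration, and it does not reproduce the conjugate pseudo-label method or the authors' released code.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, computed stably by subtracting the max."""
    scaled = [z / temperature for z in logits]
    shift = max(scaled)
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_entropy(logits, temperature=1.0):
    """Entropy of the predicted class distribution; needs no label."""
    probs = softmax(logits, temperature)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


uncertain = softmax_entropy([0.0, 0.0, 0.0])     # uniform prediction
confident = softmax_entropy([8.0, 0.0, 0.0])     # one class dominates
softened = softmax_entropy([8.0, 0.0, 0.0], temperature=4.0)
```

    Minimizing this quantity over test batches pushes predictions toward confident classes; a higher temperature flattens the distribution and therefore raises the entropy for the same logits.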
    Correntropy-Based Logistic Regression with Automatic Relevance Determination for Robust Sparse Brain Activity Decoding. (arXiv:2207.09693v1 [cs.LG])
    Recent studies have utilized sparse classification to predict categorical variables from high-dimensional brain activity signals in order to reveal human intentions and mental states, selecting the relevant features automatically during model training. However, existing sparse classification models are prone to performance degradation caused by the noise inherent in brain recordings. To address this issue, we propose a new robust and sparse classification algorithm in this study. To this end, we introduce the correntropy learning framework into the automatic relevance determination based sparse classification model, proposing a new correntropy-based robust sparse logistic regression algorithm. To demonstrate the superior brain activity decoding performance of the proposed algorithm, we evaluate it on a synthetic dataset, an electroencephalogram (EEG) dataset, and a functional magnetic resonance imaging (fMRI) dataset. The extensive experimental results confirm that the proposed method not only achieves higher classification accuracy in a noisy and high-dimensional classification task, but also selects more informative features for the decoding scenarios. Integrating the correntropy learning approach with the automatic relevance determination technique significantly improves robustness with respect to noise, leading to a more adequate robust sparse brain decoding algorithm. It provides a more powerful approach for real-world brain activity decoding and brain-computer interfaces.
    A Temporally and Spatially Local Spike-based Backpropagation Algorithm to Enable Training in Hardware. (arXiv:2207.09755v1 [cs.NE])
    Spiking Neural Networks (SNNs) have emerged as a hardware-efficient architecture for classification tasks. The penalty of spike-based encoding has been the lack of a universal training mechanism performed entirely using spikes. There have been several attempts to adopt the powerful backpropagation (BP) technique used in non-spiking artificial neural networks (ANN): (1) SNNs can be trained by externally computed numerical gradients. (2) A major advancement toward native spike-based learning has been the use of approximate backpropagation using spike-time-dependent plasticity (STDP) with phased forward/backward passes. However, the transfer of information between such phases necessitates external memory and computational access. This is a challenge for neuromorphic hardware implementations. In this paper, we propose a stochastic SNN-based Back-Prop (SSNN-BP) algorithm that utilizes a composite neuron to simultaneously compute the forward pass activations and backward pass gradients explicitly with spikes. Although signed gradient values are a challenge for spike-based representation, we tackle this by splitting the gradient signal into positive and negative streams. The composite neuron encodes information in the form of stochastic spike-trains and converts backpropagation weight updates into temporally and spatially local discrete STDP-like spike coincidence updates compatible with hardware-friendly Resistive Processing Units (RPUs). Furthermore, our method approaches the BP ANN baseline with sufficiently long spike-trains. Finally, we show that the softmax cross-entropy loss function can be implemented through inhibitory lateral connections enforcing a Winner Take All (WTA) rule. Our SNN shows excellent generalization through comparable performance to ANNs on the MNIST, Fashion-MNIST and Extended MNIST datasets. Thus, SSNN-BP enables BP compatible with purely spike-based neuromorphic hardware.
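    The idea of carrying a signed value on two non-negative stochastic spike streams can be sketched in a few lines. The code below is only a toy illustration under stated assumptions: values are assumed to lie in [-1, 1], each stream is an independent Bernoulli spike train, and the names `encode_signed` and `decode_signed` are hypothetical. It does not model the composite neuron or the STDP-like update described in the abstract.

```python
import random


def encode_signed(value, length, rng):
    """Encode a value in [-1, 1] as a (positive, negative) pair of spike trains.

    Only one of the two streams fires: the positive stream carries
    max(value, 0) as its firing rate, the negative stream max(-value, 0).
    """
    pos_rate = max(value, 0.0)
    neg_rate = max(-value, 0.0)
    pos = [1 if rng.random() < pos_rate else 0 for _ in range(length)]
    neg = [1 if rng.random() < neg_rate else 0 for _ in range(length)]
    return pos, neg


def decode_signed(pos, neg):
    """Recover the signed value as the difference of the two firing rates."""
    return (sum(pos) - sum(neg)) / len(pos)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
pos_train, neg_train = encode_signed(-0.3, 20000, rng)
estimate = decode_signed(pos_train, neg_train)
```

    Because the estimate is an average over Bernoulli spikes, its error shrinks as the spike train gets longer, which matches the abstract's remark that accuracy approaches the baseline with sufficiently long spike-trains.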
    Facial Affect Analysis: Learning from Synthetic Data & Multi-Task Learning Challenges. (arXiv:2207.09748v1 [cs.LG])
    Facial affect analysis remains a challenging task as its setting has transitioned from lab-controlled to in-the-wild situations. In this paper, we present novel frameworks to handle the two challenges in the 4th Affective Behavior Analysis In-The-Wild (ABAW) competition: i) Multi-Task-Learning (MTL) Challenge and ii) Learning from Synthetic Data (LSD) Challenge. For the MTL challenge, we adopt the SMM-EmotionNet with a better ensemble strategy of feature vectors. For the LSD challenge, we propose respective methods to combat the problems of single labels, imbalanced distribution, fine-tuning limitations, and choice of model architectures. Experimental results on the official validation sets from the competition demonstrated that our proposed approaches outperformed baselines by a large margin. The code is available at https://github.com/sylyoung/ABAW4-HUST-ANT.
    ViGAT: Bottom-up event recognition and explanation in video using factorized graph attention network. (arXiv:2207.09927v1 [cs.CV])
    This paper proposes ViGAT, a pure-attention bottom-up approach that utilizes an object detector together with a Vision Transformer (ViT) backbone network to derive object and frame features, and a head network to process these features for the task of event recognition and explanation in video. The ViGAT head consists of graph attention network (GAT) blocks factorized along the spatial and temporal dimensions in order to effectively capture both local and long-term dependencies between objects or frames. Moreover, using the weighted in-degrees (WiDs) derived from the adjacency matrices at the various GAT blocks, we show that the proposed architecture can identify the most salient objects and frames that explain the decision of the network. A comprehensive evaluation study is performed, demonstrating that the proposed approach provides state-of-the-art results on three large, publicly available video datasets (FCVID, Mini-Kinetics, ActivityNet).
    Efficient Privacy Preserving Logistic Regression for Horizontally Distributed Data. (arXiv:2202.02650v2 [cs.CR] UPDATED)
    Internet of Things devices are expanding rapidly and generating huge amounts of data. There is an increasing need to explore data collected from these devices. Collaborative learning provides a strategic solution for Internet of Things settings but also raises public concern over data privacy. In recent years, a large number of privacy-preserving techniques have been developed based on secure multi-party computation and differential privacy. A major challenge of collaborative learning is to balance disclosure risk and data utility while maintaining high computational efficiency. In this paper, we propose a privacy-preserving logistic regression model using a matrix encryption approach. The secure scheme is resilient to chosen plaintext attacks, known plaintext attacks, and collusion attacks that could compromise any agencies in the collaborative learning. The encrypted model estimate is decrypted to provide true model results with no accuracy degradation. A verification phase is implemented to detect dishonest behavior among agencies. Experimental evaluations demonstrate the fast convergence rate and high efficiency of the proposed scheme.
    Forget-me-not! Contrastive Critics for Mitigating Posterior Collapse. (arXiv:2207.09535v1 [cs.LG])
    Variational autoencoders (VAEs) suffer from posterior collapse, where the powerful neural networks used for modeling and inference optimize the objective without meaningfully using the latent representation. We introduce inference critics that detect and incentivize against posterior collapse by requiring correspondence between latent variables and the observations. By connecting the critic's objective to the literature in self-supervised contrastive representation learning, we show both theoretically and empirically that optimizing inference critics increases the mutual information between observations and latents, mitigating posterior collapse. This approach is straightforward to implement and requires significantly less training time than prior methods, yet obtains competitive results on three established datasets. Overall, the approach lays the foundation to bridge the previously disconnected frameworks of contrastive learning and probabilistic modeling with variational autoencoders, underscoring the benefits both communities may find at their intersection.
    Sample Efficient Learning of Predictors that Complement Humans. (arXiv:2207.09584v1 [cs.LG])
    One of the goals of learning algorithms is to complement and reduce the burden on human decision makers. The expert deferral setting, wherein an algorithm can either predict on its own or defer the decision to a downstream expert, helps accomplish this goal. A fundamental aspect of this setting is the need to learn complementary predictors that improve on the human's weaknesses rather than learning predictors optimized for average error. In this work, we provide the first theoretical analysis of the benefit of learning complementary predictors in expert deferral. To enable efficiently learning such predictors, we consider a family of consistent surrogate loss functions for expert deferral and analyze their theoretical properties. Finally, we design active learning schemes that require a minimal amount of human expert prediction data in order to learn accurate deferral systems.
    Learning Sequence Representations by Non-local Recurrent Neural Memory. (arXiv:2207.09710v1 [cs.CV])
    The key challenge of sequence representation learning is to capture the long-range temporal dependencies. Typical methods for supervised sequence representation learning are built upon recurrent neural networks to capture temporal dependencies. One potential limitation of these methods is that they only model one-order information interactions explicitly between adjacent time steps in a sequence, hence the high-order interactions between nonadjacent time steps are not fully exploited. This greatly limits the capability of modeling the long-range temporal dependencies, since the temporal features learned by one-order interactions cannot be maintained for a long term due to temporal information dilution and gradient vanishing. To tackle this limitation, we propose the Non-local Recurrent Neural Memory (NRNM) for supervised sequence representation learning, which performs non-local operations by means of a self-attention mechanism to learn full-order interactions within a sliding temporal memory block and models global interactions between memory blocks in a gated recurrent manner. Consequently, our model is able to capture long-range dependencies. Besides, the latent high-level features contained in high-order interactions can be distilled by our model. We validate the effectiveness and generalization of our NRNM on three types of sequence applications across different modalities, including sequence classification, step-wise sequential prediction and sequence similarity learning. Our model compares favorably against other state-of-the-art methods specifically designed for each of these sequence applications.
    Non-Uniform Diffusion Models. (arXiv:2207.09786v1 [cs.LG])
    Diffusion models have emerged as one of the most promising frameworks for deep generative modeling. In this work, we explore the potential of non-uniform diffusion models. We show that non-uniform diffusion leads to multi-scale diffusion models which have a similar structure to that of multi-scale normalizing flows. We experimentally find that in the same or less training time, the multi-scale diffusion model achieves a better FID score than the standard uniform diffusion model. More importantly, it generates samples $4.4$ times faster in $128\times 128$ resolution. The speed-up is expected to be higher in higher resolutions where more scales are used. Moreover, we show that non-uniform diffusion leads to a novel estimator for the conditional score function which achieves on par performance with the state-of-the-art conditional denoising estimator. Our theoretical and experimental findings are accompanied by an open source library MSDiff which can facilitate further research of non-uniform diffusion models.
    A Frequency-Velocity CNN for Developing Near-Surface 2D Vs Images from Linear-Array, Active-Source Wavefield Measurements. (arXiv:2207.09580v1 [cs.LG])
    This paper presents a frequency-velocity convolutional neural network (CNN) for rapid, non-invasive 2D shear wave velocity (Vs) imaging of near-surface geo-materials. Operating in the frequency-velocity domain allows for significant flexibility in the linear-array, active-source experimental testing configurations used for generating the CNN input, which are normalized dispersion images. Unlike wavefield images, normalized dispersion images are relatively insensitive to the experimental testing configuration, accommodating various source types, source offsets, numbers of receivers, and receiver spacings. We demonstrate the effectiveness of the frequency-velocity CNN by applying it to a classic near-surface geophysics problem, namely, imaging a two-layer, undulating, soil-over-bedrock interface. This problem was recently investigated in our group by developing a time-distance CNN, which showed great promise but lacked flexibility in utilizing different field-testing configurations. Herein, the new frequency-velocity CNN is shown to have comparable accuracy to the time-distance CNN while providing greater flexibility to handle varied field applications. The frequency-velocity CNN was trained, validated, and tested using 100,000 synthetic near-surface models. The ability of the proposed frequency-velocity CNN to generalize across various acquisition configurations is first tested using synthetic near-surface models with different acquisition configurations from that of the training set, and then applied to experimental field data collected at the Hornsby Bend site in Austin, Texas, USA. When fully developed for a wider range of geological conditions, the proposed CNN may ultimately be used as a rapid, end-to-end alternative for current pseudo-2D surface wave imaging techniques or to develop starting models for full waveform inversion.
    DC-BENCH: Dataset Condensation Benchmark. (arXiv:2207.09639v1 [cs.LG])
    Dataset Condensation is a newly emerging technique aiming at learning a tiny dataset that captures the rich information encoded in the original dataset. As the size of the datasets that contemporary machine learning models rely on becomes increasingly large, condensation methods become a prominent direction for accelerating network training and reducing data storage. Although numerous methods have been proposed in this rapidly growing field, evaluating and comparing different condensation methods is non-trivial and still remains an open issue. The quality of a condensed dataset is often overshadowed by many critical contributing factors to the end performance, such as data augmentation and model architectures. The lack of a systematic way to evaluate and compare condensation methods not only hinders our understanding of existing techniques, but also discourages practical usage of the synthesized datasets. This work provides the first large-scale standardized benchmark on Dataset Condensation. It consists of a suite of evaluations to comprehensively reflect the generability and effectiveness of condensation methods through the lens of their generated dataset. Leveraging this benchmark, we conduct a large-scale study of current condensation methods, and report many insightful findings that open up new possibilities for future development. The benchmark library, including evaluators, baseline methods, and generated datasets, is open-sourced to facilitate future research and application.
    Feasible Adversarial Robust Reinforcement Learning for Underspecified Environments. (arXiv:2207.09597v1 [cs.LG])
    Robust reinforcement learning (RL) considers the problem of learning policies that perform well in the worst case among a set of possible environment parameter values. In real-world environments, choosing the set of possible values for robust RL can be a difficult task. When that set is specified too narrowly, the agent will be left vulnerable to reasonable parameter values unaccounted for. When specified too broadly, the agent will be too cautious. In this paper, we propose Feasible Adversarial Robust RL (FARR), a method for automatically determining the set of environment parameter values over which to be robust. FARR implicitly defines the set of feasible parameter values as those on which an agent could achieve a benchmark reward given enough training resources. By formulating this problem as a two-player zero-sum game, FARR jointly learns an adversarial distribution over parameter values with feasible support and a policy robust over this feasible parameter set. Using the PSRO algorithm to find an approximate Nash equilibrium in this FARR game, we show that an agent trained with FARR is more robust to feasible adversarial parameter selection than with existing minimax, domain-randomization, and regret objectives in a parameterized gridworld and three MuJoCo control environments.
    Deep Preconditioners and their application to seismic wavefield processing. (arXiv:2207.09938v1 [physics.geo-ph])
    Seismic data processing heavily relies on the solution of physics-driven inverse problems. In the presence of unfavourable data acquisition conditions (e.g., regular or irregular coarse sampling of sources and/or receivers), the underlying inverse problem becomes very ill-posed and prior information is required to obtain a satisfactory solution. Sparsity-promoting inversion, coupled with fixed-basis sparsifying transforms, represents the go-to approach for many processing tasks due to its simplicity of implementation and proven successful application in a variety of acquisition scenarios. Leveraging the ability of deep neural networks to find compact representations of complex, multi-dimensional vector spaces, we propose to train an AutoEncoder network to learn a direct mapping between the input seismic data and a representative latent manifold. The trained decoder is subsequently used as a nonlinear preconditioner for the physics-driven inverse problem at hand. Synthetic and field data are presented for a variety of seismic processing tasks, and the proposed nonlinear, learned transformations are shown to outperform fixed-basis transforms and converge faster to the sought solution.
    COVID-19 Detection from Respiratory Sounds with Hierarchical Spectrogram Transformers. (arXiv:2207.09529v1 [cs.SD])
    Monitoring of prevalent airborne diseases such as COVID-19 characteristically involves respiratory assessments. While auscultation is a mainstream method for symptomatic monitoring, its diagnostic utility is hampered by the need for dedicated hospital visits. Continual remote monitoring based on recordings of respiratory sounds on portable devices is a promising alternative, which can assist in screening of COVID-19. In this study, we introduce a novel deep learning approach to distinguish patients with COVID-19 from healthy controls given audio recordings of cough or breathing sounds. The proposed approach leverages a novel hierarchical spectrogram transformer (HST) on spectrogram representations of respiratory sounds. HST embodies self-attention mechanisms over local windows in spectrograms, and window size is progressively grown over model stages to capture local to global context. HST is compared against state-of-the-art conventional and deep-learning baselines. Comprehensive demonstrations on a multi-national dataset indicate that HST outperforms competing methods, achieving over 97% area under the receiver operating characteristic curve (AUC) in detecting COVID-19 cases.  ( 3 min )
    e3nn: Euclidean Neural Networks. (arXiv:2207.09453v1 [cs.LG])
    We present e3nn, a generalized framework for creating E(3) equivariant trainable functions, also known as Euclidean neural networks. e3nn naturally operates on geometry and geometric tensors that describe systems in 3D and transform predictably under a change of coordinate system. The core of e3nn are equivariant operations such as the TensorProduct class or the spherical harmonics functions that can be composed to create more complex modules such as convolutions and attention mechanisms. These core operations of e3nn can be used to efficiently articulate Tensor Field Networks, 3D Steerable CNNs, Clebsch-Gordan Networks, SE(3) Transformers and other E(3) equivariant networks.  ( 2 min )
    To update or not to update? Neurons at equilibrium in deep models. (arXiv:2207.09455v1 [cs.LG])
    Recent advances in deep learning optimization showed that, with some a-posteriori information on fully-trained models, it is possible to match the same performance by simply training a subset of their parameters. Such a discovery has a broad impact from theory to applications, driving the research towards methods to identify the minimum subset of parameters to train without look-ahead information exploitation. However, the methods proposed do not match the state-of-the-art performance, and rely on unstructured sparsely connected models. In this work we shift our focus from the single parameters to the behavior of the whole neuron, exploiting the concept of neuronal equilibrium (NEq). When a neuron is in a configuration at equilibrium (meaning that it has learned a specific input-output relationship), we can halt its update; on the contrary, when a neuron is at non-equilibrium, we let its state evolve towards an equilibrium state, updating its parameters. The proposed approach has been tested on different state-of-the-art learning strategies and tasks, validating NEq and observing that the neuronal equilibrium depends on the specific learning setup.  ( 2 min )
    Approximation Power of Deep Neural Networks: an explanatory mathematical survey. (arXiv:2207.09511v1 [cs.LG])
    The goal of this survey is to present an explanatory review of the approximation properties of deep neural networks. Specifically, we aim at understanding how and why deep neural networks outperform other classical linear and nonlinear approximation methods. This survey consists of three chapters. In Chapter 1 we review the key ideas and concepts underlying deep networks and their compositional nonlinear structure. We formalize the neural network problem by formulating it as an optimization problem when solving regression and classification problems. We briefly discuss the stochastic gradient descent algorithm and the back-propagation formulas used in solving the optimization problem and address a few issues related to the performance of neural networks, including the choice of activation functions, cost functions, overfitting issues, and regularization. In Chapter 2 we shift our focus to the approximation theory of neural networks. We start with an introduction to the concept of density in polynomial approximation and in particular study the Stone-Weierstrass theorem for real-valued continuous functions. Then, within the framework of linear approximation, we review a few classical results on the density and convergence rate of feedforward networks, followed by more recent developments on the complexity of deep networks in approximating Sobolev functions. In Chapter 3, utilizing nonlinear approximation theory, we further elaborate on the power of depth and approximation superiority of deep ReLU networks over other classical methods of nonlinear approximation.  ( 3 min )
    The Dice loss in the context of missing or empty labels: Introducing $\Phi$ and $\epsilon$. (arXiv:2207.09521v1 [cs.CV])
    Although the Dice loss is one of the dominant loss functions in medical image segmentation, most research omits a closer look at its derivative, i.e. the real motor of the optimization when using gradient descent. In this paper, we highlight the peculiar action of the Dice loss in the presence of missing or empty labels. First, we formulate a theoretical basis that gives a general description of the Dice loss and its derivative. It turns out that the choice of the reduction dimensions $\Phi$ and the smoothing term $\epsilon$ is non-trivial and greatly influences its behavior. We find and propose heuristic combinations of $\Phi$ and $\epsilon$ that work in a segmentation setting with either missing or empty labels. Second, we empirically validate these findings in a binary and multiclass segmentation setting using two publicly available datasets. We confirm that the choice of $\Phi$ and $\epsilon$ is indeed pivotal. With $\Phi$ chosen such that the reductions happen over a single batch (and class) element and with a negligible $\epsilon$, the Dice loss deals with missing labels naturally and performs similarly compared to recent adaptations specific for missing labels. With $\Phi$ chosen such that the reductions happen over multiple batch elements or with a heuristic value for $\epsilon$, the Dice loss handles empty labels correctly. We believe that this work highlights some essential perspectives and hope that it encourages researchers to better describe their exact implementation of the Dice loss in future work.  ( 3 min )
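    The role of the reduction dimensions $\Phi$ and the smoothing term $\epsilon$ can be made concrete with a small sketch. The code below assumes one common form of the soft Dice loss, with $\epsilon$ added to both numerator and denominator; the paper's exact formulation may differ, and the helper names are hypothetical. It contrasts reducing over each batch element separately with reducing over the whole batch when one element has an empty label.

```python
def dice_loss(pred, target, eps):
    """Soft Dice loss over one reduction group (assumed common form)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def dice_per_element(preds, targets, eps):
    """Reduce over each batch element separately, then average the losses."""
    losses = [dice_loss(p, t, eps) for p, t in zip(preds, targets)]
    return sum(losses) / len(losses)


def dice_over_batch(preds, targets, eps):
    """Reduce over all batch elements jointly before forming the ratio."""
    flat_pred = [v for p in preds for v in p]
    flat_target = [v for t in targets for v in t]
    return dice_loss(flat_pred, flat_target, eps)


# Two-element batch; the second element has an empty label (all zeros).
targets = [[1.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
noisy = [[0.9, 0.8, 0.1], [0.1, 0.1, 0.1]]      # false positives on the empty element
clean = [[0.9, 0.8, 0.1], [0.01, 0.01, 0.01]]   # fewer false positives

tiny_eps = 1e-7
per_element_noisy = dice_per_element(noisy, targets, tiny_eps)
per_element_clean = dice_per_element(clean, targets, tiny_eps)
batch_noisy = dice_over_batch(noisy, targets, tiny_eps)
batch_clean = dice_over_batch(clean, targets, tiny_eps)
```

    In this toy case the per-element loss with a negligible $\epsilon$ barely changes when the false positives on the empty element shrink, because that element's loss stays close to 1. With the batch-wide reduction, the false positives enter a shared denominator, so reducing them lowers the loss.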
    Holistic Robust Data-Driven Decisions. (arXiv:2207.09560v1 [stat.ML])
    The design of data-driven formulations for machine learning and decision-making with good out-of-sample performance is a key challenge. The observation that good in-sample performance does not guarantee good out-of-sample performance is generally known as overfitting. Practical overfitting can typically not be attributed to a single cause but instead is caused by several factors all at once. We consider here three overfitting sources: (i) statistical error as a result of working with finite sample data, (ii) data noise which occurs when the data points are measured only with finite precision, and finally (iii) data misspecification in which a small fraction of all data may be wholly corrupted. We argue that although existing data-driven formulations may be robust against one of these three sources in isolation they do not provide holistic protection against all overfitting sources simultaneously. We design a novel data-driven formulation which does guarantee such holistic protection and is furthermore computationally viable. Our distributionally robust optimization formulation can be interpreted as a novel combination of a Kullback-Leibler and Levy-Prokhorov robust optimization formulation. Finally, we show how in the context of classification and regression problems several popular regularized and robust formulations reduce to a particular case of our proposed more general formulation.  ( 2 min )
    ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. (arXiv:2207.09514v1 [eess.AS])
    This paper presents recent progress on integrating speech separation and enhancement (SSE) into the ESPnet toolkit. Compared with the previous ESPnet-SE work, numerous features have been added, including recent state-of-the-art speech enhancement models with their respective training and evaluation recipes. Importantly, a new interface has been designed to flexibly combine speech enhancement front-ends with other tasks, including automatic speech recognition (ASR), speech translation (ST), and spoken language understanding (SLU). To showcase such integration, we performed experiments on carefully designed synthetic datasets for noisy-reverberant multi-channel ST and SLU tasks, which can be used as benchmark corpora for future research. In addition to these new tasks, we also use CHiME-4 and WSJ0-2Mix to benchmark multi- and single-channel SE approaches. Results show that the integration of SE front-ends with back-end tasks is a promising research direction even for tasks besides ASR, especially in the multi-channel scenario. The code is available online at https://github.com/ESPnet/ESPnet. The multi-channel ST and SLU datasets, which are another contribution of this work, are released on HuggingFace.  ( 2 min )
    A Deep Learning Framework for Wind Turbine Repair Action Prediction Using Alarm Sequences and Long Short Term Memory Algorithms. (arXiv:2207.09457v1 [cs.LG])
    With an increasing emphasis on driving down the costs of Operations and Maintenance (O&M) in the Offshore Wind (OSW) sector comes the requirement to explore new methodologies and applications of Deep Learning (DL) in the domain. Condition-based monitoring (CBM) has been at the forefront of recent research developing alarm-based systems and data-driven decision making. This paper provides a brief insight into the research being conducted in this area, with a specific focus on alarm sequence modelling and the associated challenges faced in its implementation. The paper proposes a novel idea to predict a set of relevant repair actions from an input sequence of alarms, comparing Long Short-Term Memory (LSTM) and Bidirectional LSTM (biLSTM) models. Training accuracy of up to 80.23% and test accuracy of up to 76.01% with biLSTM give a strong indication of the potential benefits of the proposed approach, which can be developed further in future research. The paper introduces a framework that integrates the proposed approach into O&M procedures and discusses the potential benefits, which include the reduction of a confusing plethora of alarms, as well as of unnecessary vessel transfers to the turbines for fault diagnosis and correction.  ( 3 min )
    Contaminant source identification in groundwater by means of artificial neural network. (arXiv:2207.09459v1 [cs.LG])
    Groundwater cannot be excluded from any sound environmental protection system. In addition to over-exploitation, which is at odds with the concept of sustainable development, groundwater contamination is a significant issue, mainly caused by intensive agricultural activity or industrialized areas. Several papers in the literature have dealt with the transport problem, especially inverse problems in which the release history or the source location is identified. The aim of this paper is to develop a data-driven model that can analyze multiple scenarios, even strongly non-linear ones, in order to solve forward and inverse transport problems while preserving the reliability of the results and reducing the uncertainty. Furthermore, this tool provides extremely fast responses, which are essential for identifying remediation strategies immediately. The advantages of the model were compared with studies from the literature. The data-driven model is a feedforward artificial neural network (ANN) trained to handle different cases: first, identifying the concentration of the pollutant at specific observation points in the study area (forward problem); second, solving inverse problems by identifying the release history at a known source location; third, in the case of a single contaminant source, identifying the release history and, at the same time, the location of the source in a specific sub-domain of the investigated area. Finally, the observation error is investigated and estimated. The results are satisfactory, highlighting the capability of the ANN to deal with multiple scenarios by approximating nonlinear functions without relying on the physical description of the phenomenon, providing reliable results with very low computational burden and uncertainty.  ( 3 min )
    Revealing Secrets From Pre-trained Models. (arXiv:2207.09539v1 [cs.CR])
    With the growing burden of training deep learning models with large data sets, transfer-learning has been widely adopted in many emerging deep learning algorithms. Transformer models such as BERT are the main player in natural language processing and use transfer-learning as a de facto standard training method. A few big data companies release pre-trained models that are trained with a few popular datasets with which end users and researchers fine-tune the model with their own datasets. Transfer-learning significantly reduces the time and effort of training models. However, it comes at the cost of security concerns. In this paper, we show a new observation that pre-trained models and fine-tuned models have significantly high similarities in weight values. Also, we demonstrate that there exist vendor-specific computing patterns even for the same models. With these new findings, we propose a new model extraction attack that reveals the model architecture and the pre-trained model used by the black-box victim model with vendor-specific computing patterns and then estimates the entire model weights based on the weight value similarities between the fine-tuned model and pre-trained model. We also show that the weight similarity can be leveraged for increasing the model extraction feasibility through a novel weight extraction pruning.  ( 2 min )
    A density peaks clustering algorithm with sparse search and K-d tree. (arXiv:2203.00973v2 [stat.ML] UPDATED)
    Density peaks clustering (DPC) has become a new star among clustering algorithms because of its simplicity and practicality. However, it has one main drawback: it is time-consuming due to its high computational complexity. Herein, a density peaks clustering algorithm with sparse search and K-d tree is developed to solve this problem. Firstly, a sparse distance matrix is calculated by using a K-d tree to replace the original full rank distance matrix, so as to accelerate the calculation of local density. Secondly, a sparse search strategy is proposed to accelerate the computation of relative separation, using, for each data point, the intersection between the set of its $k$ nearest neighbors and the set of data points with larger local density. Furthermore, a second-order difference method for decision values is adopted to determine the cluster centers adaptively. Finally, experiments are carried out on datasets with different distribution characteristics, comparing the algorithm with six other state-of-the-art clustering algorithms. It is proved that the algorithm can effectively reduce the computational complexity of the original DPC from $O(n^2K)$ to $O(n(n^{1-1/K}+k))$. The efficiency gain is especially pronounced for larger datasets. Moreover, the clustering accuracy is also improved to a certain extent. Therefore, it can be concluded that the overall performance of the newly proposed algorithm is excellent.
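    For orientation, the quantities that the abstract refers to (local density, relative separation, decision value) can be sketched with a brute-force baseline. This is the quadratic-cost form of density peaks clustering that the paper sets out to accelerate, not the sparse K-d tree variant itself. The cutoff-count density, the function name `density_peaks`, and the toy data are assumptions made for illustration.

```python
import math


def density_peaks(points, cutoff):
    """Brute-force density peaks quantities (illustrative baseline).

    rho[i]:   number of other points closer than `cutoff` to point i.
    delta[i]: distance to the nearest point with strictly larger rho,
              or the largest distance from i if no such point exists.
    gamma[i]: decision value rho[i] * delta[i]; large values suggest centers.
    """
    n = len(points)
    # Full pairwise distance matrix: this is the O(n^2) part.
    dist = [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    delta = []
    for i in range(n):
        denser = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(denser) if denser else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]
    return rho, delta, gamma


# Two small, well-separated groups; indices 3 and 7 sit in the middle of each.
points = [(0, 0), (0, 1), (1, 0), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (10.5, 10.5)]
rho, delta, gamma = density_peaks(points, cutoff=1.0)

# Take the two points with the largest decision value as cluster centers.
ranked = sorted(range(len(points)), key=lambda i: gamma[i], reverse=True)
centers = sorted(ranked[:2])
print(rho, centers)
```

    Here the number of centers is fixed at two by hand; the abstract instead describes choosing it adaptively with a second-order difference of the decision values.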
    Generalized Kernel Thinning. (arXiv:2110.01593v5 [stat.ML] UPDATED)
    The kernel thinning (KT) algorithm of Dwivedi and Mackey (2021) compresses a probability distribution more effectively than independent sampling by targeting a reproducing kernel Hilbert space (RKHS) and leveraging a less smooth square-root kernel. Here we provide four improvements. First, we show that KT applied directly to the target RKHS yields tighter, dimension-free guarantees for any kernel, any distribution, and any fixed function in the RKHS. Second, we show that, for analytic kernels like Gaussian, inverse multiquadric, and sinc, target KT admits maximum mean discrepancy (MMD) guarantees comparable to or better than those of square-root KT without making explicit use of a square-root kernel. Third, we prove that KT with a fractional power kernel yields better-than-Monte-Carlo MMD guarantees for non-smooth kernels, like Laplace and Mat\'ern, that do not have square-roots. Fourth, we establish that KT applied to a sum of the target and power kernels (a procedure we call KT+) simultaneously inherits the improved MMD guarantees of power KT and the tighter individual function guarantees of target KT. In our experiments with target KT and KT+, we witness significant improvements in integration error even in $100$ dimensions and when compressing challenging differential equation posteriors.
    Flood Inflow Forecast Using L2-norm Ensemble Weighting Sea Surface Feature. (arXiv:2112.03108v2 [stat.ML] UPDATED)
    Forecasting dam inflow is important for flood damage mitigation. The hydrograph provides critical information such as the start time, peak level, and volume. In particular, dam management requires a dam inflow forecast with a 6-h lead time, based on a future hydrograph. The authors propose novel target inflow weights to create an ocean feature vector extracted from analyzed images of the sea surface. We extracted a 4,096-element vector from the fc6 layer of the pre-trained VGG16 network and subsequently reduced it to three dimensions using t-SNE. Furthermore, we created the principal component of the sea temperature weights using PCA. Numerical experiments show that these weights contribute to the stability of predictor importance. As base regression models, we calibrate least squares with kernel expansion, a quantile random forest that minimizes out-of-bag error, and support vector regression with a polynomial kernel. When computing the predictor importance, we visualize the stability of each variable's importance under our proposed weights, compared with results without weights. We apply our method to a dam in the Kanto region of Japan, training on the period from 2007 to 2018 with the flood term limited to June through October, and test the accuracy over the 2019 flood term. Finally, we present the applied results and further statistical learning for unknown flood forecast.
    Efficient Privacy Preserving Logistic Regression for Horizontally Distributed Data. (arXiv:2202.02650v2 [cs.CR] UPDATED)
    Internet of Things devices are expanding rapidly and generating huge amounts of data. There is an increasing need to explore the data collected from these devices. Collaborative learning provides a strategic solution for Internet of Things settings but also raises public concern over data privacy. In recent years, a large number of privacy-preserving techniques have been developed based on secure multi-party computation and differential privacy. A major challenge of collaborative learning is to balance disclosure risk and data utility while maintaining high computational efficiency. In this paper, we propose a privacy-preserving logistic regression model using a matrix encryption approach. The secure scheme is resilient to chosen-plaintext attacks, known-plaintext attacks, and collusion attacks that could compromise any of the agencies in the collaborative learning. The encrypted model estimate is decrypted to provide true model results with no accuracy degradation. A verification phase is implemented to detect dishonest behavior among agencies. Experimental evaluations demonstrate the fast convergence rate and high efficiency of the proposed scheme.
    Adaptive Step-Size Methods for Compressed SGD. (arXiv:2207.10046v1 [stat.ML])
    Compressed Stochastic Gradient Descent (SGD) algorithms have been recently proposed to address the communication bottleneck in distributed and decentralized optimization problems, such as those that arise in federated machine learning. Existing compressed SGD algorithms assume the use of non-adaptive step-sizes (constant or diminishing) to provide theoretical convergence guarantees. Typically, the step-sizes are fine-tuned in practice to the dataset and the learning algorithm to provide good empirical performance. Such fine-tuning might be impractical in many learning scenarios, and it is therefore of interest to study compressed SGD using adaptive step-sizes. Motivated by prior work on adaptive step-size methods for SGD to train neural networks efficiently in the uncompressed setting, we develop an adaptive step-size method for compressed SGD. In particular, we introduce a scaling technique for the descent step in compressed SGD, which we use to establish order-optimal convergence rates for convex-smooth and strongly convex-smooth objectives under an interpolation condition and for non-convex objectives under a strong growth condition. We also show through simulation examples that without this scaling, the algorithm can fail to converge. We present experimental results on deep neural networks for real-world datasets, compare the performance of our proposed algorithm with previously proposed compressed SGD methods in the literature, and demonstrate improved performance on ResNet-18, ResNet-34 and DenseNet architectures for CIFAR-100 and CIFAR-10 datasets at various levels of compression.
    Multilevel Bayesian Deep Neural Networks. (arXiv:2203.12961v3 [stat.CO] UPDATED)
    In this article we consider Bayesian inference associated to deep neural networks (DNNs) and in particular, trace-class neural network (TNN) priors which were proposed by Sell et al. [39]. Such priors were developed as more robust alternatives to classical architectures in the context of inference problems. For this work we develop multilevel Monte Carlo (MLMC) methods for such models. MLMC is a popular variance reduction technique, with particular applications in Bayesian statistics and uncertainty quantification. We show how a particular advanced MLMC method that was introduced in [4] can be applied to Bayesian inference from DNNs and establish mathematically, that the computational cost to achieve a particular mean square error, associated to posterior expectation computation, can be reduced by several orders, versus more conventional techniques. To verify such results we provide numerous numerical experiments on model problems arising in machine learning. These include Bayesian regression, as well as Bayesian classification and reinforcement learning.
    Probable Domain Generalization via Quantile Risk Minimization. (arXiv:2207.09944v1 [stat.ML])
    Domain generalization (DG) seeks predictors which perform well on unseen test distributions by leveraging labeled training data from multiple related distributions or domains. To achieve this, the standard formulation optimizes for worst-case performance over the set of all possible domains. However, with worst-case shifts very unlikely in practice, this generally leads to overly-conservative solutions. In fact, a recent study found that no DG algorithm outperformed empirical risk minimization in terms of average performance. In this work, we argue that DG is neither a worst-case problem nor an average-case problem, but rather a probabilistic one. To this end, we propose a probabilistic framework for DG, which we call Probable Domain Generalization, wherein our key idea is that distribution shifts seen during training should inform us of probable shifts at test time. To realize this, we explicitly relate training and test domains as draws from the same underlying meta-distribution, and propose a new optimization problem -- Quantile Risk Minimization (QRM) -- which requires that predictors generalize with high probability. We then prove that QRM: (i) produces predictors that generalize to new domains with a desired probability, given sufficiently many domains and samples; and (ii) recovers the causal predictor as the desired probability of generalization approaches one. In our experiments, we introduce a more holistic quantile-focused evaluation protocol for DG, and show that our algorithms outperform state-of-the-art baselines on real and synthetic data.
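    The idea of optimizing a quantile of the risk across domains, rather than its mean or maximum, can be illustrated with a plain empirical quantile. This is only a sketch of the objective's shape: the function name `empirical_quantile_risk` and the example risks are hypothetical, and the paper's actual estimator and training procedure are not reproduced here.

```python
import math


def empirical_quantile_risk(domain_risks, alpha):
    """Smallest risk r such that at least a fraction `alpha` of the
    training domains have risk <= r (illustrative empirical quantile)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(domain_risks)
    k = math.ceil(alpha * len(ordered))  # number of domains to cover
    return ordered[max(k, 1) - 1]


# Hypothetical per-domain risks of one predictor on four training domains.
risks = [0.1, 0.2, 0.3, 0.9]

median_like = empirical_quantile_risk(risks, alpha=0.5)   # ignores the two hardest domains
probable = empirical_quantile_risk(risks, alpha=0.75)     # covers three of four domains
worst_case = empirical_quantile_risk(risks, alpha=1.0)    # reduces to the maximum risk
print(median_like, probable, worst_case)
```

    Raising `alpha` toward one moves the objective from an average-like criterion toward the worst-case criterion, which mirrors the abstract's framing of DG as a probabilistic problem between the two.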
    Distributionally Robust Batch Contextual Bandits. (arXiv:2006.05630v5 [cs.LG] UPDATED)
    Policy learning using historical observational data is an important problem that has found widespread applications. Examples include selecting offers, prices, advertisements to send to customers, as well as selecting which medication to prescribe to a patient. However, existing literature rests on the crucial assumption that the future environment where the learned policy will be deployed is the same as the past environment that has generated the data -- an assumption that is often false or too coarse an approximation. In this paper, we lift this assumption and aim to learn a distributionally robust policy with incomplete observational data. We first present a policy evaluation procedure that allows us to assess how well the policy does under the worst-case environment shift. We then establish a central limit theorem type guarantee for this proposed policy evaluation scheme. Leveraging this evaluation scheme, we further propose a novel learning algorithm that is able to learn a policy that is robust to adversarial perturbations and unknown covariate shifts with a performance guarantee based on the theory of uniform convergence. Finally, we empirically test the effectiveness of our proposed algorithm in synthetic datasets and demonstrate that it provides the robustness that is missing using standard policy learning algorithms. We conclude the paper by providing a comprehensive application of our methods in the context of a real-world voting dataset.
    Rayleigh-Gauss-Newton optimization with enhanced sampling for variational Monte Carlo. (arXiv:2106.10558v4 [stat.ML] UPDATED)
    Variational Monte Carlo (VMC) is an approach for computing ground-state wavefunctions that has recently become more powerful due to the introduction of neural network-based wavefunction parametrizations. However, efficiently training neural wavefunctions to converge to an energy minimum remains a difficult problem. In this work, we analyze optimization and sampling methods used in VMC and introduce alterations to improve their performance. First, based on theoretical convergence analysis in a noiseless setting, we motivate a new optimizer that we call the Rayleigh-Gauss-Newton method, which can improve upon gradient descent and natural gradient descent to achieve superlinear convergence at no more than twice the computational cost. Second, in order to realize this favorable comparison in the presence of stochastic noise, we analyze the effect of sampling error on VMC parameter updates and experimentally demonstrate that it can be reduced by the parallel tempering method. In particular, we demonstrate that RGN can be made robust to energy spikes that occur when the sampler moves between metastable regions of configuration space. Finally, putting theory into practice, we apply our enhanced optimization and sampling methods to the transverse-field Ising and XXZ models on large lattices, yielding ground-state energy estimates with remarkably high accuracy after just 200 parameter updates.
    Learning Counterfactually Invariant Predictors. (arXiv:2207.09768v1 [cs.LG])
    We propose a method to learn predictors that are invariant under counterfactual changes of certain covariates. This method is useful when the prediction target is causally influenced by covariates that should not affect the predictor output. For instance, an object recognition model may be influenced by position, orientation, or scale of the object itself. We address the problem of training predictors that are explicitly counterfactually invariant to changes of such covariates. We propose a model-agnostic regularization term based on conditional kernel mean embeddings, to enforce counterfactual invariance during training. We prove the soundness of our method, which can handle mixed categorical and continuous multi-variate attributes. Empirical results on synthetic and real-world data demonstrate the efficacy of our method in a variety of settings.
    Align-Deform-Subtract: An Interventional Framework for Explaining Object Differences. (arXiv:2203.04694v2 [cs.CV] UPDATED)
    Given two object images, how can we explain their differences in terms of the underlying object properties? To address this question, we propose Align-Deform-Subtract (ADS) -- an interventional framework for explaining object differences. By leveraging semantic alignments in image-space as counterfactual interventions on the underlying object properties, ADS iteratively quantifies and removes differences in object properties. The result is a set of "disentangled" error measures which explain object differences in terms of the underlying properties. Experiments on real and synthetic data illustrate the efficacy of the framework.
    Measuring and signing fairness as performance under multiple stakeholder distributions. (arXiv:2207.09960v1 [stat.ML])
    As learning machines increase their influence on decisions concerning human lives, analyzing their fairness properties becomes a subject of central importance. Yet, our best tools for measuring the fairness of learning systems are rigid fairness metrics encapsulated as mathematical one-liners, offer limited power to the stakeholders involved in the prediction task, and are easy to manipulate when we exert excessive pressure to optimize them. To advance these issues, we propose to shift focus from shaping fairness metrics to curating the distributions of examples under which these are computed. In particular, we posit that every claim about fairness should be immediately followed by the tagline "Fair under what examples, and collected by whom?". By highlighting connections to the literature in domain generalization, we propose to measure fairness as the ability of the system to generalize under multiple stress tests -- distributions of examples with social relevance. We encourage each stakeholder to curate one or multiple stress tests containing examples reflecting their (possibly conflicting) interests. The machine passes or fails each stress test by falling short of or exceeding a pre-defined metric value. The test results involve all stakeholders in a discussion about how to improve the learning system, and provide flexible assessments of fairness dependent on context and based on interpretable data. We provide full implementation guidelines for stress testing, illustrate both the benefits and shortcomings of this framework, and introduce a cryptographic scheme to enable a degree of prediction accountability from system providers.
    Semantic uncertainty intervals for disentangled latent spaces. (arXiv:2207.10074v1 [cs.CV])
    Meaningful uncertainty quantification in computer vision requires reasoning about semantic information -- say, the hair color of the person in a photo or the location of a car on the street. To this end, recent breakthroughs in generative modeling allow us to represent semantic information in disentangled latent spaces, but providing uncertainties on the semantic latent variables has remained challenging. In this work, we provide principled uncertainty intervals that are guaranteed to contain the true semantic factors for any underlying generative model. The method does the following: (1) it uses quantile regression to output a heuristic uncertainty interval for each element in the latent space (2) calibrates these uncertainties such that they contain the true value of the latent for a new, unseen input. The endpoints of these calibrated intervals can then be propagated through the generator to produce interpretable uncertainty visualizations for each semantic factor. This technique reliably communicates semantically meaningful, principled, and instance-adaptive uncertainty in inverse problems like image super-resolution and image completion.
    Kernel Thinning. (arXiv:2105.05842v8 [stat.ML] UPDATED)
    We introduce kernel thinning, a new procedure for compressing a distribution $\mathbb{P}$ more effectively than i.i.d. sampling or standard thinning. Given a suitable reproducing kernel $\mathbf{k}$ and $\mathcal{O}(n^2)$ time, kernel thinning compresses an $n$-point approximation to $\mathbb{P}$ into a $\sqrt{n}$-point approximation with comparable worst-case integration error across the associated reproducing kernel Hilbert space. With high probability, the maximum discrepancy in integration error is $\mathcal{O}_d(n^{-1/2}\sqrt{\log n})$ for compactly supported $\mathbb{P}$ and $\mathcal{O}_d(n^{-\frac{1}{2}} (\log n)^{(d+1)/2}\sqrt{\log\log n})$ for sub-exponential $\mathbb{P}$ on $\mathbb{R}^d$. In contrast, an equal-sized i.i.d. sample from $\mathbb{P}$ suffers $\Omega(n^{-1/4})$ integration error. Our sub-exponential guarantees resemble the classical quasi-Monte Carlo error rates for uniform $\mathbb{P}$ on $[0,1]^d$ but apply to general distributions on $\mathbb{R}^d$ and a wide range of common kernels. We use our results to derive explicit non-asymptotic maximum mean discrepancy bounds for Gaussian, Mat\'ern, and B-spline kernels and present two vignettes illustrating the practical benefits of kernel thinning over i.i.d. sampling and standard Markov chain Monte Carlo thinning, in dimensions $d=2$ through $100$.
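    The error measure behind these guarantees, the maximum mean discrepancy (MMD) between an $n$-point sample and its compressed version, can be computed directly for small inputs. The sketch below evaluates it for the standard thinning baseline mentioned in the abstract (keep every few points); it does not implement kernel thinning itself. The Gaussian kernel bandwidth, the helper names, and the toy one-dimensional data are assumptions.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel between two points given as tuples."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))


def mmd(xs, ys, kernel=gaussian_kernel):
    """Maximum mean discrepancy between the empirical distributions of xs and ys."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(kxx + kyy - 2.0 * kxy, 0.0))


def standard_thinning(xs, m):
    """Keep every (n // m)-th point, returning m points."""
    step = len(xs) // m
    return xs[step - 1::step][:m]


# n = 16 evenly spaced points in [0, 1), compressed to sqrt(n) = 4 points.
xs = [(i / 16,) for i in range(16)]
thinned = standard_thinning(xs, 4)
prefix = xs[:4]  # a deliberately poor summary: only the first four points

print(mmd(xs, thinned), mmd(xs, prefix))
```

    A subset spread across the sample yields a smaller MMD than a subset bunched at one end, which is the sense in which one $\sqrt{n}$-point summary can be better than another.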
    Intrinsic dimension estimation for discrete metrics. (arXiv:2207.09688v1 [stat.ML])
    Real-world datasets characterized by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to DNA sequences. Nevertheless, the most common unsupervised dimensional reduction methods are designed for continuous spaces, and their use for discrete spaces can lead to errors and biases. In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces. We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting, finding a surprisingly small ID, of order 2. This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of the sequence space.
    Alternating minimization for generalized rank one matrix sensing: Sharp predictions from a random initialization. (arXiv:2207.09660v1 [math.OC])
    We consider the problem of estimating the factors of a rank-$1$ matrix with i.i.d. Gaussian, rank-$1$ measurements that are nonlinearly transformed and corrupted by noise. Considering two prototypical choices for the nonlinearity, we study the convergence properties of a natural alternating update rule for this nonconvex optimization problem starting from a random initialization. We show sharp convergence guarantees for a sample-split version of the algorithm by deriving a deterministic recursion that is accurate even in high-dimensional problems. Notably, while the infinite-sample population update is uninformative and suggests exact recovery in a single step, the algorithm -- and our deterministic prediction -- converges geometrically fast from a random initialization. Our sharp, non-asymptotic analysis also exposes several other fine-grained properties of this problem, including how the nonlinearity and noise level affect convergence behavior. On a technical level, our results are enabled by showing that the empirical error recursion can be predicted by our deterministic sequence within fluctuations of the order $n^{-1/2}$ when each iteration is run with $n$ observations. Our technique leverages leave-one-out tools originating in the literature on high-dimensional $M$-estimation and provides an avenue for sharply analyzing higher-order iterative algorithms from a random initialization in other high-dimensional optimization problems with random data.
    Journal Impact Factor and Peer Review Thoroughness and Helpfulness: A Supervised Machine Learning Study. (arXiv:2207.09821v1 [cs.DL])
    The journal impact factor (JIF) is often equated with journal quality and the quality of the peer review of the papers submitted to the journal. We examined the association between the content of peer review and JIF by analysing 10,000 peer review reports submitted to 1,644 medical and life sciences journals. Two researchers hand-coded a random sample of 2,000 sentences. We then trained machine learning models to classify all 187,240 sentences as contributing or not contributing to content categories. We examined the association between ten groups of journals defined by JIF deciles and the content of peer reviews using linear mixed-effects models, adjusting for the length of the review. The JIF ranged from 0.21 to 74.70. The length of peer reviews increased from the lowest (median number of words 185) to the highest JIF group (387 words). The proportion of sentences allocated to different content categories varied widely, even within JIF groups. For thoroughness, sentences on 'Materials and Methods' were more common in the highest JIF journals than in the lowest JIF group (difference of 7.8 percentage points; 95% CI 4.9 to 10.7%). The trend for 'Presentation and Reporting' went in the opposite direction, with the highest JIF journals giving less emphasis to such content (difference -8.9%; 95% CI -11.3 to -6.5%). For helpfulness, reviews for higher JIF journals devoted less attention to 'Suggestion and Solution' and provided fewer Examples than lower impact factor journals. No, or only small, differences were evident for other content categories. In conclusion, peer review in journals with higher JIF tends to be more thorough in discussing the methods used but less helpful in terms of suggesting solutions and providing examples. Differences were modest and variability high, indicating that the JIF is a bad predictor for the quality of peer review of an individual manuscript.
    The Poisson binomial mechanism for secure and private federated learning. (arXiv:2207.09916v1 [cs.CR])
    We introduce the Poisson Binomial mechanism (PBM), a discrete differential privacy mechanism for distributed mean estimation (DME) with applications to federated learning and analytics. We provide a tight analysis of its privacy guarantees, showing that it achieves the same privacy-accuracy trade-offs as the continuous Gaussian mechanism. Our analysis is based on a novel bound on the R\'enyi divergence of two Poisson binomial distributions that may be of independent interest. Unlike previous discrete DP schemes based on additive noise, our mechanism encodes local information into a parameter of the binomial distribution, and hence the output distribution is discrete with bounded support. Moreover, the support does not increase as the privacy budget $\varepsilon \rightarrow 0$ as in the case of additive schemes which require the addition of more noise to achieve higher privacy; on the contrary, the support becomes smaller as $\varepsilon \rightarrow 0$. The bounded support enables us to combine our mechanism with secure aggregation (SecAgg), a multi-party cryptographic protocol, without the need of performing modular clipping which results in an unbiased estimator of the sum of the local vectors. This in turn allows us to apply it in the private FL setting and provide an upper bound on the convergence rate of the SGD algorithm. Moreover, since the support of the output distribution becomes smaller as $\varepsilon \rightarrow 0$, the communication cost of our scheme decreases with the privacy constraint $\varepsilon$, outperforming all previous distributed DP schemes based on additive noise in the high privacy or low communication regimes.
    Approximation Power of Deep Neural Networks: an explanatory mathematical survey. (arXiv:2207.09511v1 [cs.LG])
    The goal of this survey is to present an explanatory review of the approximation properties of deep neural networks. Specifically, we aim at understanding how and why deep neural networks outperform other classical linear and nonlinear approximation methods. This survey consists of three chapters. In Chapter 1 we review the key ideas and concepts underlying deep networks and their compositional nonlinear structure. We formalize the neural network problem by formulating it as an optimization problem when solving regression and classification problems. We briefly discuss the stochastic gradient descent algorithm and the back-propagation formulas used in solving the optimization problem and address a few issues related to the performance of neural networks, including the choice of activation functions, cost functions, overfitting issues, and regularization. In Chapter 2 we shift our focus to the approximation theory of neural networks. We start with an introduction to the concept of density in polynomial approximation and in particular study the Stone-Weierstrass theorem for real-valued continuous functions. Then, within the framework of linear approximation, we review a few classical results on the density and convergence rate of feedforward networks, followed by more recent developments on the complexity of deep networks in approximating Sobolev functions. In Chapter 3, utilizing nonlinear approximation theory, we further elaborate on the power of depth and approximation superiority of deep ReLU networks over other classical methods of nonlinear approximation.
    Stream-based active learning with linear models. (arXiv:2207.09874v1 [stat.ML])
    The proliferation of automated data collection schemes and the advances in sensorics are increasing the amount of data we are able to monitor in real-time. However, given the high annotation costs and the time required by quality inspections, data is often available in an unlabeled form. This is fostering the use of active learning for the development of soft sensors and predictive models. In production, instead of performing random inspections to obtain product information, labels are collected by evaluating the information content of the unlabeled data. Several query strategy frameworks for regression have been proposed in the literature but most of the focus has been dedicated to the static pool-based scenario. In this work, we propose a new strategy for the stream-based scenario, where instances are sequentially offered to the learner, which must instantaneously decide whether to perform the quality check to obtain the label or discard the instance. The approach is inspired by the optimal experimental design theory and the iterative aspect of the decision-making process is tackled by setting a threshold on the informativeness of the unlabeled data points. The proposed approach is evaluated using numerical simulations and the Tennessee Eastman Process simulator. The results confirm that selecting the examples suggested by the proposed algorithm allows for a faster reduction in the prediction error.
    Universal Regular Conditional Distributions. (arXiv:2105.07743v3 [cs.LG] UPDATED)
    We introduce a deep learning model which can generically approximate regular conditional distributions (RCDs). The proposed model operates in three phases: first, it linearizes inputs from a given metric space $\mathcal{X}$ to $\mathbb{R}^d$ via a feature map; then, these linearized features are processed by a deep feedforward neural network; and finally, the network's outputs are translated to the $1$-Wasserstein space $\mathcal{P}_1(\mathbb{R}^D)$ via a probabilistic extension of the attention mechanism introduced by Bahdanau et al. (2014). We find that the models built using our framework can approximate any continuous function from $\mathbb{R}^d$ to $\mathcal{P}_1(\mathbb{R}^D)$ uniformly on compact sets, quantitatively. We identify two ways of avoiding the curse of dimensionality when approximating $\mathcal{P}_1(\mathbb{R}^D)$-valued functions. The first strategy describes functions in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$ which can be efficiently approximated on any compact subset of $\mathbb{R}^d$. The second approach describes compact subsets of $\mathbb{R}^d$ on which most functions in $C(\mathbb{R}^d,\mathcal{P}_1(\mathbb{R}^D))$ can be efficiently approximated. The results are verified experimentally.
    Error-in-variables modelling for operator learning. (arXiv:2204.10909v2 [cs.LG] UPDATED)
    Deep operator learning has emerged as a promising tool for reduced-order modelling and PDE model discovery. Leveraging the expressive power of deep neural networks, especially in high dimensions, such methods learn the mapping between functional state variables. While proposed methods have assumed noise only in the dependent variables, experimental and numerical data for operator learning typically exhibit noise in the independent variables as well, since both variables represent signals that are subject to measurement error. In regression on scalar data, failure to account for noisy independent variables can lead to biased parameter estimates. With noisy independent variables, linear models fitted via ordinary least squares (OLS) will show attenuation bias, wherein the slope will be underestimated. In this work, we derive an analogue of attenuation bias for linear operator regression with white noise in both the independent and dependent variables. In the nonlinear setting, we computationally demonstrate underprediction of the action of the Burgers operator in the presence of noise in the independent variable. We propose error-in-variables (EiV) models for two operator regression methods, MOR-Physics and DeepONet, and demonstrate that these new models reduce bias in the presence of noisy independent variables for a variety of operator learning problems. Considering the Burgers operator in 1D and 2D, we demonstrate that EiV operator learning robustly recovers operators in high-noise regimes that defeat OLS operator learning. We also introduce an EiV model for time-evolving PDE discovery and show that OLS and EiV perform similarly in learning the Kuramoto-Sivashinsky evolution operator from corrupted data, suggesting that the effect of bias in OLS operator learning depends on the regularity of the target operator.
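The attenuation bias mentioned in this abstract can be illustrated in the scalar case. The sketch below is not taken from the paper: the sample size, noise levels, true slope, and the helper names `ols_slope` and `simulate` are all assumptions chosen for illustration. It fits OLS once on clean inputs and once on inputs corrupted by noise of equal variance, in which case the fitted slope is expected to shrink by roughly var(x) / (var(x) + var(noise)) = 1/2.

```python
import random


def ols_slope(xs, ys):
    """Slope of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    return cov_xy / var_x


def simulate(n=20000, true_slope=2.0, x_noise_std=1.0, seed=0):
    """Return (slope fitted on clean x, slope fitted on noisy x)."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Small noise in the dependent variable only.
    y = [true_slope * x + rng.gauss(0.0, 0.1) for x in x_true]
    # Measurement noise in the independent variable (illustrative level).
    x_obs = [x + rng.gauss(0.0, x_noise_std) for x in x_true]
    return ols_slope(x_true, y), ols_slope(x_obs, y)


clean_slope, noisy_slope = simulate()
print(f"clean inputs: {clean_slope:.3f}, noisy inputs: {noisy_slope:.3f}")
```

With unit-variance inputs and unit-variance input noise, the second slope should land near half the true value, which is the underestimation the abstract refers to.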
    Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance. (arXiv:2202.12387v2 [cs.LG] UPDATED)
    In this paper, we study contrastive learning from an optimization perspective, aiming to analyze and address a fundamental issue of existing contrastive learning methods that either rely on a large batch size or a large dictionary of feature vectors. We consider a global objective for contrastive learning, which contrasts each positive pair with all negative pairs for an anchor point. From the optimization perspective, we explain why existing methods such as SimCLR require a large batch size in order to achieve a satisfactory result. In order to remove such requirement, we propose a memory-efficient Stochastic Optimization algorithm for solving the Global objective of Contrastive Learning of Representations, named SogCLR. We show that its optimization error is negligible under a reasonable condition after a sufficient number of iterations or is diminishing for a slightly different global contrastive objective. Empirically, we demonstrate that SogCLR with small batch size (e.g., 256) can achieve similar performance as SimCLR with large batch size (e.g., 8192) on self-supervised learning task on ImageNet-1K. We also attempt to show that the proposed optimization technique is generic and can be applied to solving other contrastive losses, e.g., two-way contrastive losses for bimodal contrastive learning. The proposed method is implemented in our open-sourced library LibAUC (www.libauc.org).
    Forget-me-not! Contrastive Critics for Mitigating Posterior Collapse. (arXiv:2207.09535v1 [cs.LG])
    Variational autoencoders (VAEs) suffer from posterior collapse, where the powerful neural networks used for modeling and inference optimize the objective without meaningfully using the latent representation. We introduce inference critics that detect and incentivize against posterior collapse by requiring correspondence between latent variables and the observations. By connecting the critic's objective to the literature in self-supervised contrastive representation learning, we show both theoretically and empirically that optimizing inference critics increases the mutual information between observations and latents, mitigating posterior collapse. This approach is straightforward to implement and requires significantly less training time than prior methods, yet obtains competitive results on three established datasets. Overall, the approach lays the foundation to bridge the previously disconnected frameworks of contrastive learning and probabilistic modeling with variational autoencoders, underscoring the benefits both communities may find at their intersection.
  • Open

    Helmholtz resonator revisited
    We finished a bottle of wine this evening, and I blew across the top as I often do. (Don’t worry: I only do this at home. If we’re ever in a restaurant together, I won’t embarrass you by blowing across the neck of an empty bottle.) The pitch sounded lower than I expected, so I […] Helmholtz resonator revisited first appeared on John D. Cook.  ( 5 min )
  • Open

    Practical Deep Learning for Coders 2022
    Contents A new edition About the course Students and results About deep learning The lessons 1: Getting started 2: Deployment 3: Neural net foundations 4: Natural Language (NLP) 5: From-scratch model 6: Random forests 7: Collaborative filtering and embeddings 8: Convolutions (CNNs) A vibrant community Get started A new edition Today we’re releasing Practical Deep Learning for Coders 2022—a complete from-scratch rewrite of fast.ai’s most popular course, that’s been two years in the making. Previous fast.ai courses have been studied by hundreds of thousands of students, from all walks of life, from all parts of the world. fast.ai’s videos have been viewed over 6,000,000 times already! The major differences are: A much bigger focus on interactive explorations. Students in the course build…  ( 6 min )

  • Open

    Running Dall-e mini on Windows? (Or: Are there any equivalent text-to-image AI's I can run on a windows PC with a 2080 TI?)
    Hello! I started the journey of running Dall-e Mini locally on my windows PC. I'm no python expert, and I ran into a problem installing Jax, which seems to be a requirement. Apparently there are no publicly available windows builds (and building for windows seems to be somewhat of a pain!) I guess I have a few questions: Can anyone help me get past this roadblock? Are there any pre-packaged builds of Dall-e Mini for Windows? Are there any other text-to-image networks that I can run on my Windows PC with a 2080 TI. Thank you! submitted by /u/CapnFlisto [link] [comments]  ( 86 min )
    Mirror of Another World
    submitted by /u/Weak_Individual_2010 [link] [comments]  ( 86 min )
    New Robot Eye Improves Computer Vision | New Nvidia Quantum Computing Platform | Soft Robot Heart | AI Diagnoses Fetal Ultrasound Birth Defects
    submitted by /u/tohelpyou88 [link] [comments]  ( 86 min )
    this video was supposed to be much longer, but I had technical problems and had to stop the work.
    submitted by /u/nalr00n [link] [comments]  ( 86 min )
    Used some AI generated art for some of the backgrounds in my new animated short
    submitted by /u/zestysnacks [link] [comments]  ( 86 min )
    AI Asked Me Out on a Date - Talking To AI
    Hi, I created a channel in which AI GPT-3 will answer interesting questions. How do you like these videos? What questions would you like to ask AI GPT-3? submitted by /u/alexander0055 [link] [comments]  ( 86 min )
    Minqi Jiang, UCL, on environment and curriculum design for general RL agents
    Here is a podcast episode with Minqi Jiang where we discuss RL environment and curriculum design, open-endedness, emergent communication, and much more! submitted by /u/thejashGI [link] [comments]  ( 86 min )
    OpenAI python edge detection
    Prompt: Code a python script that takes an input image and outputs a version of the image with edge detection.
    Output:
        import cv2
        import numpy as np
        # read in image
        img = cv2.imread('image.jpg')
        # convert to grayscale
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # apply Gaussian blur
        blur = cv2.GaussianBlur(gray, (5, 5), 0)
        # apply Canny edge detection
        canny = cv2.Canny(blur, 50, 150)
        # show image
        cv2.imshow('canny', canny)
        cv2.waitKey(0)
    Input image: https://preview.redd.it/kj0353mz6sc91.jpg?width=799&format=pjpg&auto=webp&s=d96277a1c38ca13a3324e30ea540870e61b93f2b
    Output: https://preview.redd.it/zirhfo9d7sc91.png?width=799&format=png&auto=webp&s=1a1997d59bed934e96f1d09eeea8996efc21ee5e
    submitted by /u/fmurph22 [link] [comments]  ( 86 min )
    OpenAI announces pricing for DALL-E 2: AI images are almost free
    submitted by /u/much_successes [link] [comments]  ( 86 min )
    Is there any more big AI text generation models available for free/trial?
    So, I've been experimenting with a lot of different text generation models lately. And I was wondering if there's any more of them I can access. I know about j1-jumbo (178b), GPT-3 (175b), BLOOM (176b) and YaLM (100b). They're available for public and you can access them. Is there any more big models that you can use for free or a limited trial? Please, don't recommend me any models/services that uses GPT-J, GPT-Neo or any other model with less than 21 billion parameters. submitted by /u/Nilaier_Music [link] [comments]  ( 86 min )
    Hey, I've kicked off weekly Discord meetups for Cohere called co:lab friday. This Friday 5 pm CET we will be looking at article recommender demo built with Cohere API. Come join me https://discord.gg/6tpzwNENgd
    submitted by /u/techn0_cratic [link] [comments]  ( 91 min )
    AGI/AI in FinTech Workshop
    submitted by /u/akolonin [link] [comments]  ( 91 min )
    AI Dream 1hour EPIC 1000 Subscribers Celebration!
    submitted by /u/LordPewPew777 [link] [comments]  ( 86 min )
    In this iteration: an amazing new model taking sketches and text to generate images and learn more about the risks behind powerful models like Dalle 2!
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 86 min )
    Anyone know what these slightly strange videos are about on youtube... something to do with AI?
    I was googling around as I'm starting to get interested in AI and these videos came up in the search, I clicked on it despite thinking it was going to be low value but am now intrigued as to what they are for! Any Ideas? https://www.youtube.com/watch?v=RfNtuHQ42v8 submitted by /u/timjwes [link] [comments]  ( 86 min )
    No-Code AI: Integrate the NLP Cloud API Into A Bubble.io App
    Hello, Thanks to the rise of no-code platforms like Bubble.io and the creation of brand new cutting-edge AI models based on Transformers, like GPT-3, GPT-J, GPT-NeoX, Bart, and more, it is now possible to create advanced AI applications without writing a single line of code. I just made an article that shows how to connect the NLP Cloud API to a Bubble.io application in order to perform advanced AI operations like summarization, paraphrasing, NER, question answering, blog post generation, product description creation, and much more: https://nlpcloud.com/no-code-ai-integrate-nlp-cloud-api-into-bubble-io.html I hope that you will find this tuto useful! Please don't hesitate to comment! Julien submitted by /u/juliensalinas [link] [comments]  ( 91 min )
    0.5 ETH 1:1 NFT - Full 3D Animation + Music - AremStudio Collection on Foundation - Thank you for support🙏🏾 🤍
    submitted by /u/aremstudio [link] [comments]  ( 91 min )
    A tool that asks you to provide a sentence and gives you back SQL query on it.
    submitted by /u/mergisi [link] [comments]  ( 86 min )
    Genesis of Existence | Quantum Atomic Nebulae | 300 Subs Celebratory Video! | 4K UHD | 24 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 86 min )
    The police of the future. Hyper-skilled AI Agents
    submitted by /u/the_anonymizer [link] [comments]  ( 86 min )
    Well that's weird
    submitted by /u/Aditya-aka-Ishu [link] [comments]  ( 91 min )
    Death in spring
    submitted by /u/Hacknaut [link] [comments]  ( 86 min )
  • Open

    [P] AI Upscaler for movies with M1 support
    I played for awhile with some smaller AIs myself. I just saw that some people used pretrained networks like ESRGAN to upscale whole movies (mostly LOTR). Does somebody know a project like this one which is optimized/utilizes the M1 (especially the neural engine) which can be easily used to play around with some own .mp4 files? submitted by /u/Hard_Veur [link] [comments]  ( 87 min )
    [D] Minqi Jiang, UCL, on environment and curriculum design for general RL agents
    Here is a podcast episode with Minqi Jiang where we discuss RL environment and curriculum design, open-endedness, emergent communication, and much more! submitted by /u/thejashGI [link] [comments]  ( 87 min )
    [R] Beyond neural scaling laws: beating power law scaling via data pruning - Meta AI
    Paper: https://arxiv.org/abs/2206.14486 Abstract: Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning. However, these improvements through scaling alone require considerable costs in compute and energy. Here we focus on the scaling of error with dataset size and show how both in theory and practice we can break beyond power law scaling and reduce it to exponential scaling instead if we have access to a high-quality data pruning metric that ranks the order in which training examples should be discarded to achieve any pruned dataset size. We then test this new exponential scaling prediction with pruned dataset size empirically, and indeed observe better than power law scaling performance on ResNets trained on CIFAR-10, SVHN, and ImageNet. Given the importance of finding high-quality pruning metrics, we perform the first large-scale benchmarking study of ten different data pruning metrics on ImageNet. We find most existing high performing metrics scale poorly to ImageNet, while the best are computationally intensive and require labels for every image. We therefore developed a new simple, cheap and scalable self-supervised pruning metric that demonstrates comparable performance to the best supervised metrics. Overall, our work suggests that the discovery of good data-pruning metrics may provide a viable path forward to substantially improved neural scaling laws, thereby reducing the resource costs of modern deep learning. https://preview.redd.it/aii70mn86sc91.jpg?width=1126&format=pjpg&auto=webp&s=4a6dfd713384c0016d5433feba7028f8173e3347 https://preview.redd.it/lqj66on86sc91.jpg?width=1118&format=pjpg&auto=webp&s=7795db10157b2fb59b8778725643ad7ef3f47462 submitted by /u/Singularian2501 [link] [comments]  ( 89 min )
    [N] OpenAI blog post "DALL·E Now Available in Beta". DALL-E 2 is a text-to-image system. Pricing details are included. Commercial usage is now allowed.
    OpenAI blog post. How DALL·E Credits Work. Links to DALL-E Content policy and Terms of use, along with older archived versions. submitted by /u/Wiskkey [link] [comments]  ( 90 min )
    [N] ICML 2022 WiFi
    The ICML 2022 WiFi network is "ICML" and the password is "conference". Just in case anyone needs it! submitted by /u/noblepaldamar [link] [comments]  ( 88 min )
    [P] Fetch the Top ML tweets of the week!
    ​ 🍰 Slice of ML 🍰 Hi folks, We recently built out a nice CLI that allows you to fetch the top ML tweets of the day/week! You can see a demo in the video above. You can check out how we built it here & check out the repo here! submitted by /u/BlockDesigns [link] [comments]  ( 87 min )
    [D] Image to Image Translation with Vanilla style GANs?
    Has any work been done on modifying vanilla style GANs for tasks resembling typical image to image translation? Clarifying a bit, by "vanilla" I mean a typical GAN architecture like DCGAN with simply one generator and discriminator (as opposed to say CycleGAN). And by "image to image" I mean, say, feeding the generator an image as input instead of a latent vector. Now of course, that's not how traditional GANs are set up. That latent vector generally gets transformed into something with many channels, so you can't just plug an image in as input and be on your way. But there's nothing to say you couldn't change the architecture slightly or map the image through some transformer to fit it. Imagining the case then where we had images from two similar domains (perhaps bald faces and faces with hair), it would seem to me that such a generator might learn to create the new domain from such an input (given enough images). My intuition then is it could do much the same as what's found in attempts like CycleGAN, but without the constraint that the output image resembles the input. The output may be realistic to the new domain, but there's no guarantee that the generator didn't hallucinate away most of the input features to get it there (but maybe it could be similar!) since this is an unpaired image scenario. Application might be more in the art side of things but wanted to know if this had been explored yet? submitted by /u/jshkk [link] [comments]  ( 89 min )
    [P] Getting Started with MLflow Model Registry
    TLDR; MLflow Model Registry allows you to keep track of different Machine Learning models and their versions, as well as tracking their changes, stages and artifacts. Link to the post: https://mlopshowto.com/keeping-your-machine-learning-models-on-the-right-track-getting-started-with-mlflow-part-2-bbc980a1f8dc The Companion Github Repo for this post contains a quickstarter project showcasing some of the capabilities of MLflow Model Registry submitted by /u/j0selit0342 [link] [comments]  ( 87 min )
    [D] Storing and dockerizing ML artefacts
    I am trying to think through my overall approach creating ML models and then pulling those models into containers for serving. Right now the overall approach is a little messy as there are a few various approaches being followed. What I would like to achieve is one way to store model artefacts and then pull those into dockers. It would be nice to build an ergonomic API around the underlying solution so it felt a little like using hugging face model zoo or similar. I would appreciate input on how people are solving this today! submitted by /u/ydennisy [link] [comments]  ( 88 min )
    [D] ACL Rolling Review June 2022
    Hi, did anyone receive reviews and meta-reviews for their ARR June submissions? We were told that the author's response is open before July 20, but none of our four submissions have received any reviews up until now. submitted by /u/bunsenfeng [link] [comments]  ( 87 min )
    [P] Canopy cover estimation from aerial images
    Hi, I would like to develop a model to estimate canopy cover from aerial images. Below are three examples from different growth stages: Canopy Cover - May, Canopy Cover - June, Canopy Cover - July. So far, I have been thinking about three possible approaches: classification, object detection, and segmentation. In the case of classification, every image as a whole would be rated from 1 to 10 depending on the occurrence of the rows with missing cover (10 - all rows are perfect, 1 - all rows have empty spaces). The issue, in this case, is rating subjectivity - it is hard to determine the objective criteria to assign a class to a whole image. In the case of object detection, the idea is to rotate the images and then detect every row and for every row detect blank space. Compared to classification, it is possible to objectively determine what is a blank space, and then for every row, it would be possible to calculate the percentage of good coverage. In the case of segmentation, the annotation is too time-consuming and thus I believe it is better to consider other options. I would appreciate any comments or suggestions about the possible approach to solve this challenge. submitted by /u/ThickDoctor007 [link] [comments]  ( 109 min )
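The per-row coverage percentage described in the post can be sketched as follows. This is only an illustration under stated assumptions: it presumes some earlier step (detection or segmentation, not shown) has already produced a binary mask in which each list is one crop row after rotation, with 1 for canopy and 0 for blank space. The function names and the `min_gap` threshold are hypothetical.

```python
def row_coverage(mask):
    """Percentage of covered pixels for each crop row.

    mask: list of rows, each a list of 0/1 values (1 = canopy, 0 = blank).
    """
    return [100.0 * sum(row) / len(row) for row in mask]


def rows_with_gaps(mask, min_gap=3):
    """Indices of rows that contain a run of at least min_gap blank pixels."""
    flagged = []
    for i, row in enumerate(mask):
        run = longest = 0
        for px in row:
            run = run + 1 if px == 0 else 0  # extend or reset the blank run
            longest = max(longest, run)
        if longest >= min_gap:
            flagged.append(i)
    return flagged


# Toy mask: three rows of eight pixels each.
mask = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 1, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 1],
]
coverage = row_coverage(mask)
gaps = rows_with_gaps(mask)
print(coverage, gaps)
```

Separating total coverage from contiguous gaps may be useful here, since a row with scattered single-pixel holes and a row with one long empty stretch can have the same percentage.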
    [D] Software resources for Meta Reinforcement Learning
    For those of you working on Meta Reinforcement Learning, do you implement all your algorithms from scratch? Do you clone a github repo and modify it according to your needs? Or what do you do? I also found the python library learn2learn. It looks great, but I haven't tried it and it seems it's not being maintained. Also I find the documentation is relatively poor. Does anyone have experience using this library? I was attempting to implement MAML with Vanilla Policy Gradient, but I have not been able to get it to work. I know the original MAML paper used TRPO methods instead of vanilla policy gradient; but I have seen other papers such as this one that trained MAML with VPG. On that note, for the most experienced and willing to share: what are some of the tips/tricks you have for someone starting research on Meta RL? I am familiar with the theory, but I am having a hard time with implementations. submitted by /u/carlml [link] [comments]  ( 88 min )
    [R] Towards Geometric Deep Learning III: First Geometric Architectures (Blog Post)
    Geometric Deep Learning approaches a broad class of ML problems from the perspectives of symmetry and invariance, providing a common blueprint for neural network architectures as diverse as CNNs, GNNs, and Transformers. In a new series of posts, we study how these ideas have taken us from ancient Greece to convolutional neural networks. Blog post link. submitted by /u/hardmaru [link] [comments]  ( 87 min )
    [P] Fastest and most accurate version of the Exponential Smoothing (ETS) Algorithm for Python
    Recently, the Nixtla team released a new version of ETS for Python. The implementation, optimized using numba, is 400% faster than StatsModels and 1.5x faster than R's, with improved accuracy and robustness. With the Ray integration of StatsForecast for distributed computing, the ETS can fit 1,000,000 series in under 5 min for non-seasonal data and 25 minutes for seasonal data. The ETS algorithm is especially suited for data with seasonality and trend. ETS computes a weighted average over all observations in the input time series dataset as its prediction. In contrast to moving average methods with constant weights, ETS weights exponentially decrease over time, capturing long-term dependencies while prioritizing new observations. Please star ⭐️ the repo if you like it. :) https://github.com/Nixtla/statsforecast ​ https://preview.redd.it/lsc0igzdomc91.png?width=1700&format=png&auto=webp&s=341f0a69225a917edd4a15f37416b4b52f0a4726 https://preview.redd.it/uocfwrzfomc91.png?width=1574&format=png&auto=webp&s=eee58870c325dee50c6e544178f2ae04a546d2e2 submitted by /u/fedegarzar [link] [comments]  ( 91 min )
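The exponentially decreasing weights described in the post can be seen in the simplest member of the ETS family, simple exponential smoothing with no trend or seasonal component. The sketch below is not the StatsForecast implementation and does not use its API; the function names, the fixed smoothing parameter `alpha`, and the choice to initialise the level with the first observation are all assumptions made for illustration.

```python
def simple_exp_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing.

    The level starts at the first observation and is updated as
    level = alpha * y + (1 - alpha) * level for each later observation.
    """
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def observation_weights(n, alpha):
    """Weights on the n most recent observations, newest first.

    Ignores the initial level, whose weight is (1 - alpha) ** (len(series) - 1).
    """
    return [alpha * (1 - alpha) ** k for k in range(n)]


forecast = simple_exp_smoothing([10.0, 12.0, 11.0, 13.0], alpha=0.5)
weights = observation_weights(4, alpha=0.5)
print(forecast, weights)
```

A larger `alpha` puts more weight on recent observations; full ETS models add trend and seasonal states and estimate the parameters from data rather than fixing them.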
    [D] Best practice and tips & tricks to write scientific papers in LaTeX, with figures generated in Python, rapid prototyping in jupyter, and running larger experiments on supercomputers.
    I've been using Jupyter, and honestly, it is pretty good for rapid prototyping however it falls apart when I try to follow proper software engineering practices such as abstraction and encapsulation. Downloading a jupyter notebook as python code is honestly so dirty and there is a bunch of metadata that is not necessary. For the longest time, it was also not practical to upload a jupyter notebook to a version control system. I also find that I spend a lot of energy transferring files, scaling up, and moving my codebase over to a supercomputer to run more extensive experiments with more trials. So, I'm wondering what the best practices are for researchers following proper software engineering practices while scaling up to run larger experiments with more trials and keeping track of log files, model zoos, transferring figures to latex etc... submitted by /u/Studyr3ddit [link] [comments]  ( 94 min )
  • Open

    New Robot Eye Improves Computer Vision | New Nvidia Quantum Computing Platform | Soft Robot Heart | AI Diagnoses Fetal Ultrasound Birth Defects
    submitted by /u/tohelpyou88 [link] [comments]  ( 91 min )
    Learning material/courses to begin building neural networks?
    Sorry if this has been answered before (I'm sure it has but I'm having a difficult time finding an answer). If I'm wanting to learn to build neural networks, is there an appropriate path to this? I've seen many recommend "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems" by Geron so I've purchased this book. I'm not sure what courses or concepts I need to have a grasp on before working through the material in this book (Python? Statistics? Calc?) Thank you for any help or links to a good learning path to tackle neural networks. submitted by /u/redditTee123 [link] [comments]  ( 86 min )
  • Open

    Ratio test counterexample
    Given a sequence a1, a2, a3, … let L be the limit of the ratio of consecutive terms: $L = \lim_{n\to\infty} |a_{n+1}/a_n|$. Then the series converges if L < 1 and diverges if L > 1. However, that’s not the full story. Here is an example from Ernesto Cesàro (1859–1906) that shows the ratio test to be more subtle than […] Ratio test counterexample first appeared on John D. Cook.  ( 4 min )
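As a quick numerical illustration of the ordinary ratio test (not of Cesàro's example, which is elided in this excerpt), the sketch below computes consecutive ratios for the series with terms n / 2^n. The choice of series and the helper name `consecutive_ratios` are assumptions made for illustration.

```python
def consecutive_ratios(term, n_max):
    """Return |a_{n+1} / a_n| for n = 1, ..., n_max - 1."""
    return [abs(term(n + 1) / term(n)) for n in range(1, n_max)]


def a(n):
    """Terms of the illustrative series: a_n = n / 2**n."""
    return n / 2 ** n


ratios = consecutive_ratios(a, 60)
partial_sum = sum(a(n) for n in range(1, 61))
print(ratios[0], ratios[-1], partial_sum)
```

The ratios start at 1 and settle towards 1/2, so L < 1 and the test predicts convergence; the partial sums approach 2, the known value of this series.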
    Whittaker and Watson
    Whittaker and Watson’s analysis textbook is a true classic. My only complaint about the book is that the typesetting is poor. I said years ago that I wish someone would redo the book in LaTeX and touch it up a bit. I found out while writing my previous post that in fact someone has done […] Whittaker and Watson first appeared on John D. Cook.  ( 5 min )
  • Open

    Minqi Jiang, UCL, on environment and curriculum design for general RL agents
    Here is a podcast episode with Minqi Jiang where we discuss RL environment and curriculum design, open-endedness, emergent communication, and much more! submitted by /u/thejashGI [link] [comments]  ( 118 min )
    Why can't my agent learn as optimally after giving it a new initialization position?
    So I'm training a robot to walk in simulation - things were going great, peaking at like 70m traveled in 40 seconds. Then I reoriented the joint positions of the legs and reassigned the frames of reference for each joint (e.g., made each leg section perpendicular/parallel to the others and set the new positions to 0 degrees) so it would be easier to calibrate the physical robot in the future. However, even with a brand new random policy, my agent is completely unable to match its former optimal reward, and is even struggling to learn at all. How is this possible? I'm not changing anything super fundamental about the robot - in theory the robot should still be able to move about like before, just with different joint angles because of the different frame of reference. submitted by /u/TryLettingGo [link] [comments]  ( 87 min )
    How I trained a neural network to play my mobile game
    The Game I recently wrote a little mobile game as an experiment. The game is a match 3 type of game (think Bejeweled or Candy Crush) where levels get harder gradually until it's game over and you have to start again at the first level. The Observation Space Except for some initial tutorial levels, the core of the game is a board of 7x7 tiles each containing one of the 5 basic colors or one of 7 special items. The number of possible board states lies somewhere between 10³⁴ and 10⁵². Game board with 7x7 tiles and 5 basic colors The Action Space The goal of the game is to group three or more of the same color together in order to create a match. By matching more than three at once you get special items with specific behavior (e.g. area damage on activation). By allowing an agent to swipe …  ( 100 min )
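    The quoted range for the number of board states can be sanity-checked with a short calculation. This is one reading of the numbers, not the author's stated derivation: it assumes the lower bound counts 49 tiles with 5 colors each and the upper bound counts 49 tiles with 5 + 7 = 12 possible contents each, ignoring any rules that make some boards unreachable.

```python
import math

TILES = 7 * 7          # board size given in the post
COLORS = 5             # basic colors
SPECIAL_ITEMS = 7      # special items

# Assumed counting: every tile chosen independently.
lower = COLORS ** TILES
upper = (COLORS + SPECIAL_ITEMS) ** TILES

lower_exp = math.log10(lower)
upper_exp = math.log10(upper)
print(f"colors only:        about 10^{lower_exp:.1f}")
print(f"colors and specials: about 10^{upper_exp:.1f}")
```

    Under these assumptions the exponents land between 34 and 35 and between 52 and 53, in line with the orders of magnitude quoted in the post.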
    A newbie gym environment problem
    Could anyone help me please? I am working on the following toy problem. I have four buttons. Each button has a cost when it is pressed and also a probability of winning the game. I want to learn which button to press. To do this I am trying to make a gym environment. Here is my attempt: https://bpa.st/QTWA

        import numpy as np
        from random import random
        from gym import Env
        from gym.spaces import Box, Discrete

        # 4 buttons to start with
        class ButtonsEnv(Env):
            def __init__(self):
                # Which button we can press
                self.action_space = Discrete(4)
                # The state
                low = np.array([0]*4)
                high = np.array([1]*4)
                self.observation_space = Box(low, high, dtype=np.float32)
                # Set start state
                self.state = [0]*4
                self.probabilities = [random() for _ in range(4)]
                self.costs = [random() for _ in range(4)]
                # Set episode length
                self.episode_length = 60
                self.reward = 0

            def step(self, action):
                self.reward = -self.costs[action]
                if random() < self.probabilities[action]:
                    self.state[action] = 1
                    done = True
                else:
                    done = False
                # Reduce episode length by 1 second
                self.episode_length -= 1
                # Apply temperature noise
                #self.state += random.randint(-1,1)
                # Set placeholder for info
                info = {}
                # Return step information
                return np.array(self.state, dtype=np.float32), self.reward, done, info

            def render(self):
                # Implement viz
                print(self.state)

            def reset(self):
                # Reset state
                self.state = [0]*4
                # Reset episode length
                self.episode_length = 60
                self.reward = 0
                return np.array(self.state, dtype=np.float32)

    I don't understand the different roles that observation_space and state are playing. Which of these would be used by any RL algorithm?

    In my code the state doesn't provide any information about which button to press next. That is entirely in self.costs and self.probabilities.

    Can anyone help me rewrite this properly please? submitted by /u/wiggyhat [link] [comments]  ( 87 min )
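    One possible way to read the question: in the Gym convention, `observation_space` only declares the shape and range of what `reset` and `step` return, while the state is whatever the environment keeps internally. The sketch below is a dependency-free imitation of that interface, not the poster's code and not the Gym API itself. The class name `ButtonsEnvSketch`, the choice to expose costs and probabilities in the observation, and the helper `cheapest_expected_button` are all assumptions made for illustration.

```python
import random


class ButtonsEnvSketch:
    """Gym-style reset/step interface for the four-button toy problem."""

    def __init__(self, n_buttons=4, max_steps=60, seed=None):
        self.rng = random.Random(seed)
        self.max_steps = max_steps
        self.probabilities = [self.rng.random() for _ in range(n_buttons)]
        self.costs = [self.rng.random() for _ in range(n_buttons)]
        self.steps_left = max_steps

    def _observation(self):
        # Assumed design: show the agent the per-button costs and win
        # probabilities. A matching Gym declaration would be a Box with
        # low 0, high 1, and 2 * n_buttons entries.
        return list(self.costs) + list(self.probabilities)

    def reset(self):
        self.steps_left = self.max_steps
        return self._observation()

    def step(self, action):
        reward = -self.costs[action]          # same reward as in the post
        won = self.rng.random() < self.probabilities[action]
        self.steps_left -= 1
        # End the episode on a win or when the step budget runs out.
        done = won or self.steps_left <= 0
        return self._observation(), reward, done, {"won": won}


def cheapest_expected_button(costs, probabilities):
    """Index minimizing cost / probability, ignoring the step limit."""
    ratios = [c / p if p > 0 else float("inf")
              for c, p in zip(costs, probabilities)]
    return ratios.index(min(ratios))


env = ButtonsEnvSketch(seed=0)
obs = env.reset()
action = cheapest_expected_button(env.costs, env.probabilities)
obs, reward, done, info = env.step(action)
print(action, reward, done, info)
```

    The helper relies on a simple argument: if a button wins with probability p on each press, the expected number of presses until a win is 1/p, so pressing it repeatedly costs about cost / p in expectation. A learning agent that only sees rewards would have to discover this ordering by trial, which is why the observation here carries the numbers explicitly.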
    Reinforcement Learning: How to train an agent to learn a general strategy to escape a maze?
    Hello, I am new to the reinforcement learning field. "Escaping maze" is one of the games people use to train and test their algorithms. In almost all blogs that I have seen, the maze is created once, and an agent is trained to escape that maze in the shortest possible time (or to optimize any other reward metric). After that, testing is also conducted for the same maze that was used in training. If my understanding up to this point is correct, then the following questions arise. 1) Isn't the resultant policy overfitted where the agent learns to solve only one particular maze? 2) I have seen some general strategies (like following the wall) that can be used to escape at least simple mazes (i.e., the maze does not have short-cuts via bridges or circular paths). Can we not train the agent to learn such a generalized strategy that is independent of the exact maze structure? I agree that the agent won't take the shortest path for all mazes, but at least it can be trained to escape the general maze structure. Shouldn't RL strive to learn general policy? I realize that a lot of RL-trained policies can deal with stochasticity related to an environment. But here, I am interested in obtaining a general policy that is not related to exact environment structure? Since the RL agent learns from its experience, I feel obtaining such a policy should be possible. Any comments/suggestions are welcome. Thanks! submitted by /u/CoffeeBean05 [link] [comments]  ( 88 min )
    What are the first 5 papers I should read on RL?
    I'm new to RL and would love to read (beyond the practical work I'm doing) about the most important discoveries/results from the field. What papers should I read? Thank you! submitted by /u/hcrx [link] [comments]  ( 89 min )
  • Open

    Build taxonomy-based contextual targeting using AWS Media Intelligence and Hugging Face BERT
    As new data privacy regulations like GDPR (General Data Protection Regulation, 2018) have come into effect, customers are under increased pressure to monetize media assets while abiding by the new rules. Monetizing media while respecting privacy regulations requires the ability to automatically extract granular metadata from assets like text, images, video, and audio files at […]  ( 10 min )
  • Open

    DALL·E Now Available in Beta
    We’ll invite 1 million people from our waitlist over the coming weeks. Users can create with DALL·E using free credits that refill every month, and buy additional credits in 115-generation increments for $15. DALL·E, the AI system that  ( 4 min )
  • Open

    DeepSpeed Compression: A composable library for extreme compression and zero-cost quantization
    Large-scale models are revolutionizing deep learning and AI research, driving major improvements in language understanding, generating creative texts, multi-lingual translation and many more. But despite their remarkable capabilities, the models’ large size creates latency and cost constraints that hinder the deployment of applications on top of them. In particular, increased inference time and memory consumption […] The post DeepSpeed Compression: A composable library for extreme compression and zero-cost quantization appeared first on Microsoft Research.  ( 16 min )
  • Open

    4 Ways AI is Shaping the Future of Interactive Games
    Gaming has moved from a niche sector to the mainstream. Games have become a part of everyday lexicon like never before, and the technological progress evident within game UIs has played a role. The gaming landscape is highly diverse. The post 4 Ways AI is Shaping the Future of Interactive Games appeared first on Data Science Central.  ( 19 min )
    6 Reasons Why Today’s Physical Security Teams Can’t Rely on Walkie-Talkie Radios
    Introduction Portable radio communication devices like walkie-talkie radios have supported security services for ages. Radio communication first gained traction during World War 1 when the military used Walkie-Talkie Radios exclusively to stay connected with their troops. Cut to today, we see security agents who are in charge of protecting people or property, using walkie-talkie radios… The post 6 Reasons Why Today’s Physical Security Teams Can’t Rely on Walkie-Talkie Radios appeared first on Data Science Central.  ( 19 min )
    DSC Weekly 19 July 2022: From Knowledge Graphs to Transformation as a Service
    Announcements Achieving endpoint visibility to ward off the threat of a breach has never been more important than it is in the age of data proliferation and hybrid workplaces. Multiple endpoints and locations heighten that risk, making it essential for CISOs and IT security teams to overcome common challenges. Find out how organizations can reach… The post DSC Weekly 19 July 2022: From Knowledge Graphs to Transformation as a Service appeared first on Data Science Central.  ( 22 min )
  • Open

    Lucid Motors’ Mike Bell on Software-Defined Innovation for the Luxury EV Brand
    AI and electric vehicle technology breakthroughs are transforming the automotive industry. These developments pave the way for new innovators, attracting technical prowess and design philosophies from Silicon Valley. Mike Bell, senior vice president of digital at Lucid Motors, sees continuous innovation coupled with over-the-air updates as key to designing sustainable, award-winning intelligent vehicles that provide […] The post Lucid Motors’ Mike Bell on Software-Defined Innovation for the Luxury EV Brand appeared first on NVIDIA Blog.  ( 4 min )
  • Open

    A technique to improve both fairness and accuracy in artificial intelligence
    Methods that make a machine-learning model’s predictions more accurate overall can reduce accuracy for underrepresented subgroups. A new approach can help.  ( 7 min )
  • Open

    Image Augmentation with Keras Preprocessing Layers and tf.image
    When we work on a machine learning problem related to images, we not only need to collect some images as training data but also need to employ augmentation to create variations in the images. This is especially true for more complex object recognition problems. There are many ways to do image augmentation. You may use some […] The post Image Augmentation with Keras Preprocessing Layers and tf.image appeared first on Machine Learning Mastery.  ( 27 min )
  • Open

    GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks. (arXiv:2202.00211v3 [cs.LG] UPDATED)
    Recovering global rankings from pairwise comparisons has wide applications from time synchronization to sports team ranking. Pairwise comparisons corresponding to matches in a competition can be construed as edges in a directed graph (digraph), whose nodes represent e.g. competitors with an unknown rank. In this paper, we introduce neural networks into the ranking recovery problem by proposing the so-called GNNRank, a trainable GNN-based framework with digraph embedding. Moreover, new objectives are devised to encode ranking upsets/violations. The framework involves a ranking score estimation approach, and adds an inductive bias by unfolding the Fiedler vector computation of the graph constructed from a learnable similarity matrix. Experimental results on extensive data sets show that our methods attain competitive and often superior performance against baselines, as well as showing promising transfer ability. Codes and preprocessed data are at: https://github.com/SherylHYX/GNNRank.  ( 2 min )
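    To make the notion of a ranking upset concrete, here is a minimal sketch that counts how often a match result contradicts a given set of scores. It is a plain counting metric written for illustration, not the differentiable objective used in GNNRank. The function name `upset_fraction`, the competitors, and the scores are hypothetical.

```python
def upset_fraction(matches, scores):
    """Share of matches won by the competitor with the lower score.

    matches: list of (winner, loser) pairs, i.e. directed edges.
    scores:  dict mapping competitor -> estimated ranking score.
    """
    if not matches:
        return 0.0
    upsets = sum(1 for winner, loser in matches
                 if scores[winner] < scores[loser])
    return upsets / len(matches)


# Hypothetical data: C beating A contradicts the scores below.
matches = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "C")]
scores = {"A": 3.0, "B": 2.0, "C": 1.0}
result = upset_fraction(matches, scores)
print(result)
```

    A ranking method that fits the comparisons well should produce scores with a low upset fraction on the observed matches.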
    Uncertainty Minimization for Personalized Federated Semi-Supervised Learning. (arXiv:2205.02438v2 [cs.LG] UPDATED)
    Since federated learning (FL) was introduced as a decentralized learning technique with privacy preservation, statistical heterogeneity of distributed data has remained the main obstacle to achieving robust performance and stable convergence in FL applications. Model personalization methods have been studied to overcome this problem. However, existing approaches mainly assume fully labeled data, which is unrealistic in practice because labeling requires expertise. The primary issue caused by the partially labeled condition is that clients with insufficient labeled data can suffer from unfair performance gains because they lack adequate insight into the local distribution to customize the global model. To tackle this problem, 1) we propose a novel personalized semi-supervised learning paradigm that allows partially labeled or unlabeled clients to seek labeling assistance from data-related clients (helper agents), thereby enhancing their perception of local data; 2) based on this paradigm, we design an uncertainty-based data-relation metric to ensure that selected helpers provide trustworthy pseudo labels instead of misleading the local training; 3) to mitigate the network overload introduced by helper searching, we further develop a helper selection protocol that achieves efficient communication with negligible performance sacrifice. Experiments show that our proposed method obtains superior performance and more stable convergence than related works with partially labeled data, especially in highly heterogeneous settings.  ( 3 min )
    FLAIR: Federated Learning Annotated Image Repository. (arXiv:2207.08869v1 [cs.LG])
    Cross-device federated learning is an emerging machine learning (ML) paradigm where a large population of devices collectively train an ML model while the data remains on the devices. This research field has a unique set of practical challenges, and to systematically make advances, new datasets curated to be compatible with this paradigm are needed. Existing federated learning benchmarks in the image domain do not accurately capture the scale and heterogeneity of many real-world use cases. We introduce FLAIR, a challenging large-scale annotated image dataset for multi-label classification suitable for federated learning. FLAIR has 429,078 images from 51,414 Flickr users and captures many of the intricacies typically encountered in federated learning, such as heterogeneous user data and a long-tailed label distribution. We implement multiple baselines in different learning setups for different tasks on this dataset. We believe FLAIR can serve as a challenging benchmark for advancing the state of the art in federated learning. Dataset access and the code for the benchmark are available at https://github.com/apple/ml-flair.  ( 2 min )
    Human-to-Robot Imitation in the Wild. (arXiv:2207.09450v1 [cs.RO])
    We approach the problem of learning by watching humans in the wild. While traditional approaches in Imitation and Reinforcement Learning are promising for learning in the real world, they are either sample inefficient or are constrained to lab settings. Meanwhile, there has been a lot of success in processing passive, unstructured human data. We propose tackling this problem via an efficient one-shot robot learning algorithm, centered around learning from a third-person perspective. We call our method WHIRL: In-the-Wild Human Imitating Robot Learning. WHIRL extracts a prior over the intent of the human demonstrator, using it to initialize our agent's policy. We introduce an efficient real-world policy learning scheme that improves using interactions. Our key contributions are a simple sampling-based policy optimization approach, a novel objective function for aligning human and robot videos as well as an exploration method to boost sample efficiency. We show one-shot generalization and success in real-world settings, including 20 different manipulation tasks in the wild. Videos and talk at https://human2robot.github.io  ( 2 min )
    Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs. (arXiv:2107.03815v2 [cs.CV] UPDATED)
    In this paper, we propose a Collaboration of Experts (CoE) framework to pool together the expertise of multiple networks towards a common aim. Each expert is an individual network with expertise on a unique portion of the dataset, which enhances the collective capacity. Given a sample, an expert is selected by the delegator, which simultaneously outputs a rough prediction to support early termination. To fulfill this framework, we propose three modules to impel each model to play its role, namely weight generation module (WGM), label generation module (LGM) and variance calculation module (VCM). Our method achieves the state-of-the-art performance on ImageNet, 80.7% top-1 accuracy with 194M FLOPs. Combined with PWLU activation function and CondConv, CoE further achieves the accuracy of 80.0% with only 100M FLOPs for the first time. More importantly, our method is hardware friendly and achieves a 3-6x speedup compared with some existing conditional computation approaches.  ( 2 min )
    Using Neural Networks by Modelling Semi-Active Shock Absorber. (arXiv:2207.09141v1 [eess.SY])
    The steadily increasing number of on-board automotive control systems requires new approaches to their digital mapping that improve functionality in terms of adaptability and robustness and enable easier on-line software updates. As many recent studies conclude, methods applying neural networks (NN) can be good candidates for digital twin (DT) tools in automotive control system design, for example, for controller parameterization and condition monitoring. However, an NN-based DT requires an adequate amount of data for training and design. In this regard, the paper presents an approach that demonstrates how regression tasks can be efficiently handled by modeling a semi-active shock absorber within the DT framework. The approach is based on adapting time series augmentation techniques to stationary data, which increases the variance of the latter. Such a solution provides a basis for elaborating further data engineering methods for the data preparation of sophisticated databases.  ( 2 min )
    Bayesian Generational Population-Based Training. (arXiv:2207.09405v1 [cs.LG])
    Reinforcement learning (RL) offers the potential for training generally capable agents that can interact autonomously in the real world. However, one key limitation is the brittleness of RL algorithms to core hyperparameters and network architecture choice. Furthermore, non-stationarities such as evolving training data and increased agent complexity mean that different hyperparameters and architectures may be optimal at different points of training. This motivates AutoRL, a class of methods seeking to automate these design choices. One prominent class of AutoRL methods is Population-Based Training (PBT), which has led to impressive performance in several large scale settings. In this paper, we introduce two innovations in PBT-style methods. First, we employ trust-region based Bayesian Optimization, enabling full coverage of the high-dimensional mixed hyperparameter search space. Second, we show that using a generational approach, we can also learn both architectures and hyperparameters jointly on-the-fly in a single training run. Leveraging the new highly parallelizable Brax physics engine, we show that these innovations lead to large performance gains, significantly outperforming the tuned baseline while learning entire configurations on the fly. Code is available at https://github.com/xingchenwan/bgpbt.  ( 2 min )
    Data Science and Machine Learning in Education. (arXiv:2207.09060v1 [physics.ed-ph])
    The growing role of data science (DS) and machine learning (ML) in high-energy physics (HEP) is well established and pertinent given the complex detectors, large data sets, and sophisticated analyses at the heart of HEP research. Moreover, exploiting symmetries inherent in physics data has inspired physics-informed ML as a vibrant sub-field of computer science research. HEP researchers benefit greatly from widely available materials for use in education, training and workforce development. They are also contributing to these materials and providing software to DS/ML-related fields. Increasingly, physics departments are offering courses at the intersection of DS, ML and physics, often using curricula developed by HEP researchers and involving open software and data used in HEP. In this white paper, we explore synergies between HEP research and DS/ML education, discuss opportunities and challenges at this intersection, and propose community activities that will be mutually beneficial.  ( 2 min )
    Learning inducing points and uncertainty on molecular data. (arXiv:2207.07654v2 [physics.chem-ph] UPDATED)
    Uncertainty control and scalability to large datasets are the two main issues for the deployment of Gaussian process models into the autonomous material and chemical space exploration pipelines. One way to address both of these issues is by introducing the latent inducing variables and choosing the right approximation for the marginal log-likelihood objective. Here, we show that variational learning of the inducing points in the high-dimensional molecular descriptor space significantly improves both the prediction quality and uncertainty estimates on test configurations from a sample molecular dynamics dataset. Additionally, we show that inducing points can learn to represent the configurations of the molecules of different types that were not present within the initialization set of inducing points. Among several evaluated approximate marginal log-likelihood objectives, we show that the predictive log-likelihood provides both the predictive quality comparable to the exact Gaussian process model and excellent uncertainty control. Finally, we comment on whether a machine learning model makes predictions by interpolating the molecular configurations in high-dimensional descriptor space. We show that despite our intuition, and even for densely sampled molecular dynamics datasets, most of the predictions are done in the extrapolation regime.  ( 2 min )
    PACTran: PAC-Bayesian Metrics for Estimating the Transferability of Pretrained Models to Classification Tasks. (arXiv:2203.05126v2 [cs.LG] UPDATED)
    With the increasing abundance of pretrained models in recent years, the problem of selecting the best pretrained checkpoint for a particular downstream classification task has been gaining increased attention. Although several methods have recently been proposed to tackle the selection problem (e.g. LEEP, H-score), these methods resort to applying heuristics that are not well motivated by learning theory. In this paper we present PACTran, a theoretically grounded family of metrics for pretrained model selection and transferability measurement. We first show how to derive PACTran metrics from the optimal PAC-Bayesian bound under the transfer learning setting. We then empirically evaluate three metric instantiations of PACTran on a number of vision tasks (VTAB) as well as a language-and-vision (OKVQA) task. An analysis of the results shows PACTran is a more consistent and effective transferability measure compared to existing selection methods.  ( 2 min )
    Generalization Bounds via Convex Analysis. (arXiv:2202.04985v3 [stat.ML] UPDATED)
    Since the celebrated works of Russo and Zou (2016,2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and the output, given that the loss of any fixed hypothesis has a subgaussian tail. In this work, we generalize this result beyond the standard choice of Shannon's mutual information to measure the dependence between the input and the output. Our main result shows that it is indeed possible to replace the mutual information by any strongly convex function of the joint input-output distribution, with the subgaussianity condition on the losses replaced by a bound on an appropriately chosen norm capturing the geometry of the dependence measure. This allows us to derive a range of generalization bounds that are either entirely new or strengthen previously known ones. Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance, which are respectively applicable for heavy-tailed loss distributions and highly smooth loss functions. Our analysis is entirely based on elementary tools from convex analysis by tracking the growth of a potential function associated with the dependence measure and the loss function.  ( 3 min )
    Implicit Gradient Regularization. (arXiv:2009.11162v3 [cs.LG] UPDATED)
    Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients. We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization. We confirm empirically that implicit gradient regularization biases gradient descent toward flat minima, where test errors are small and solutions are robust to noisy parameter perturbations. Furthermore, we demonstrate that the implicit gradient regularization term can be used as an explicit regularizer, allowing us to control this gradient regularization directly. More broadly, our work indicates that backward error analysis is a useful theoretical approach to the perennial question of how learning rate, model size, and parameter regularization interact to determine the properties of overparameterized models optimized with gradient descent.  ( 2 min )
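    The idea of using the gradient-norm term as an explicit regularizer can be sketched on a toy problem. The example below is an illustration under stated assumptions, not the paper's experiment: it uses a two-parameter quadratic loss with hand-picked curvatures, and the coefficient `mu` is a free value chosen here rather than the one derived in the paper. It minimizes loss(theta) + mu * ||grad loss(theta)||^2 by plain gradient descent.

```python
CURVATURES = [1.0, 10.0]   # assumed toy curvatures, one flat and one sharp


def loss(theta):
    """Quadratic loss 0.5 * sum(c_i * theta_i ** 2)."""
    return 0.5 * sum(c * t * t for c, t in zip(CURVATURES, theta))


def loss_grad(theta):
    """Gradient of the quadratic loss: c_i * theta_i."""
    return [c * t for c, t in zip(CURVATURES, theta)]


def penalized_grad(theta, mu):
    """Gradient of loss(theta) + mu * ||loss_grad(theta)||^2.

    For this quadratic the penalty is mu * sum((c_i * theta_i) ** 2),
    whose gradient is 2 * mu * c_i ** 2 * theta_i.
    """
    return [c * t + 2.0 * mu * c * c * t for c, t in zip(CURVATURES, theta)]


def gradient_descent(grad_fn, theta, step, n_steps):
    """Run n_steps of fixed-step gradient descent from theta."""
    for _ in range(n_steps):
        g = grad_fn(theta)
        theta = [t - step * gi for t, gi in zip(theta, g)]
    return theta


start = [1.0, 1.0]
plain = gradient_descent(loss_grad, start, step=0.01, n_steps=10)
penalized = gradient_descent(lambda th: penalized_grad(th, 0.01),
                             start, step=0.01, n_steps=10)
print("plain:    ", plain)
print("penalized:", penalized)
```

    Because the penalty grows with the square of the curvature, the sharp direction is pushed toward zero more strongly than the flat one. On this quadratic the minimizer itself does not move, so the sketch only shows the form of the extra gradient term, not the flat-minima effect reported in the abstract.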
    Unrolled algorithms for group synchronization. (arXiv:2207.09418v1 [eess.SP])
    The group synchronization problem involves estimating a collection of group elements from noisy measurements of their pairwise ratios. This task is a key component in many computational problems, including the molecular reconstruction problem in single-particle cryo-electron microscopy (cryo-EM). The standard methods to estimate the group elements are based on iteratively applying linear and non-linear operators. Motivated by the structural similarity to deep neural networks, we adopt the concept of algorithm unrolling, where training data is used to optimize the algorithm. We design unrolled algorithms for several group synchronization instances, including synchronization over the group of 3-D rotations: the synchronization problem in cryo-EM. We also apply a similar approach to the multi-reference alignment problem. We show by numerical experiments that the unrolling strategy outperforms existing synchronization algorithms in a wide variety of scenarios.  ( 2 min )
    Lazy Estimation of Variable Importance for Large Neural Networks. (arXiv:2207.09097v1 [stat.ML])
    As opaque predictive models increasingly impact many areas of modern life, interest in quantifying the importance of a given input variable for making a specific prediction has grown. Recently, there has been a proliferation of model-agnostic methods to measure variable importance (VI) that analyze the difference in predictive power between a full model trained on all variables and a reduced model that excludes the variable(s) of interest. A bottleneck common to these methods is the estimation of the reduced model for each variable (or subset of variables), which is an expensive process that often does not come with theoretical guarantees. In this work, we propose a fast and flexible method for approximating the reduced model with important inferential guarantees. We replace the need for fully retraining a wide neural network by a linearization initialized at the full model parameters. By adding a ridge-like penalty to make the problem convex, we prove that when the ridge penalty parameter is sufficiently large, our method estimates the variable importance measure with an error rate of $O(\frac{1}{\sqrt{n}})$ where $n$ is the number of training samples. We also show that our estimator is asymptotically normal, enabling us to provide confidence bounds for the VI estimates. We demonstrate through simulations that our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example.  ( 3 min )
    $\ell_\infty$-Robustness and Beyond: Unleashing Efficient Adversarial Training. (arXiv:2112.00378v2 [cs.LG] UPDATED)
    Neural networks are vulnerable to adversarial attacks: adding well-crafted, imperceptible perturbations to their input can modify their output. Adversarial training is one of the most effective approaches in training robust models against such attacks. However, it is much slower than vanilla training of neural networks since it needs to construct adversarial examples for the entire training data at every iteration, hampering its effectiveness. Recently, Fast Adversarial Training (FAT) was proposed that can obtain robust models efficiently. However, the reasons behind its success are not fully understood, and more importantly, it can only train robust models for $\ell_\infty$-bounded attacks as it uses FGSM during training. In this paper, by leveraging the theory of coreset selection, we show how selecting a small subset of training data provides a general, more principled approach toward reducing the time complexity of robust training. Unlike existing methods, our approach can be adapted to a wide variety of training objectives, including TRADES, $\ell_p$-PGD, and Perceptual Adversarial Training (PAT). Our experimental results indicate that our approach speeds up adversarial training by 2-3 times while experiencing a slight reduction in the clean and robust accuracy.  ( 3 min )
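    For readers unfamiliar with FGSM, the single-step attack mentioned above perturbs an input by eps times the sign of the loss gradient with respect to that input, which keeps the perturbation inside an l-infinity ball of radius eps. The sketch below applies it to a tiny logistic model. The weights, input, and eps are hypothetical values picked for illustration, and this is not the coreset-based training method proposed in the paper.

```python
import math


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def logistic_loss(w, x, y):
    """Loss log(1 + exp(-y * w.x)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))


def fgsm(w, x, y, eps):
    """One FGSM step: x + eps * sign(gradient of the loss w.r.t. x)."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # d loss / d x_i = -y * w_i / (1 + exp(margin))
    scale = -y / (1.0 + math.exp(margin))
    grad = [scale * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


w = [2.0, -1.0]   # hypothetical model weights
x = [1.0, 1.0]    # hypothetical clean input
y = 1             # true label
x_adv = fgsm(w, x, y, eps=0.1)
clean_loss = logistic_loss(w, x, y)
adv_loss = logistic_loss(w, x_adv, y)
print(x_adv, clean_loss, adv_loss)
```

    Each coordinate moves by exactly eps in the direction that increases the loss, so the adversarial loss is higher than the clean loss while the perturbation stays within the eps bound. Adversarial training repeats this kind of construction for the training data at every iteration, which is the cost the abstract refers to.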
    Flexible learning of quantum states with generative query neural networks. (arXiv:2202.06804v2 [quant-ph] UPDATED)
    Deep neural networks are a powerful tool for the characterization of quantum states. Existing networks are typically trained with experimental data gathered from the specific quantum state that needs to be characterized. But is it possible to train a neural network offline and to make predictions about quantum states other than the ones used for the training? Here we introduce a model of network that can be trained with classically simulated data from a fiducial set of states and measurements, and can later be used to characterize quantum states that share structural similarities with the states in the fiducial set. With little guidance of quantum physics, the network builds its own data-driven representation of quantum states, and then uses it to predict the outcome statistics of quantum measurements that have not been performed yet. The state representation produced by the network can also be used for tasks beyond the prediction of outcome statistics, including clustering of quantum states and identification of different phases of matter. Our network model provides a flexible approach that can be applied to online learning scenarios, where predictions must be generated as soon as experimental data become available, and to blind learning scenarios where the learner has only access to an encrypted description of the quantum hardware.  ( 3 min )
    Generalizing Goal-Conditioned Reinforcement Learning with Variational Causal Reasoning. (arXiv:2207.09081v1 [cs.LG])
    As a pivotal component to attaining generalizable solutions in human intelligence, reasoning provides great potential for reinforcement learning (RL) agents' generalization towards varied goals by summarizing part-to-whole arguments and discovering cause-and-effect relations. However, how to discover and represent causalities remains a huge gap that hinders the development of causal RL. In this paper, we augment Goal-Conditioned RL (GCRL) with Causal Graph (CG), a structure built upon the relation between objects and events. We novelly formulate the GCRL problem into variational likelihood maximization with CG as latent variables. To optimize the derived objective, we propose a framework with theoretical performance guarantees that alternates between two steps: using interventional data to estimate the posterior of CG; using CG to learn generalizable models and interpretable policies. Due to the lack of public benchmarks that verify generalization capability under reasoning, we design nine tasks and then empirically show the effectiveness of the proposed method against five baselines on these tasks. Further theoretical analysis shows that our performance improvement is attributed to the virtuous cycle of causal discovery, transition modeling, and policy training, which aligns with the experimental evidence in extensive ablation studies.  ( 2 min )
    SafeDrug: Dual Molecular Graph Encoders for Recommending Effective and Safe Drug Combinations. (arXiv:2105.02711v2 [cs.LG] UPDATED)
    Medication recommendation is an essential task of AI for healthcare. Existing works focused on recommending drug combinations for patients with complex health conditions solely based on their electronic health records. Thus, they have the following limitations: (1) some important data, such as drug molecule structures, have not been utilized in the recommendation process; (2) drug-drug interactions (DDI) are modeled implicitly, which can lead to sub-optimal results. To address these limitations, we propose a DDI-controllable drug recommendation model named SafeDrug to leverage drugs' molecule structures and model DDIs explicitly. SafeDrug is equipped with a global message passing neural network (MPNN) module and a local bipartite learning module to fully encode the connectivity and functionality of drug molecules. SafeDrug also has a controllable loss function to control DDI levels in the recommended drug combinations effectively. On a benchmark dataset, SafeDrug is shown to reduce DDI by a relative 19.43% and to improve Jaccard similarity between recommended and actually prescribed drug combinations by 2.88% over previous approaches. Moreover, SafeDrug also requires far fewer parameters than previous deep learning-based approaches, leading to about 14% faster training and around a 2x speed-up in inference.  ( 3 min )
    GATE: Gated Additive Tree Ensemble for Tabular Classification and Regression. (arXiv:2207.08548v2 [cs.LG] UPDATED)
    We propose a novel high-performance, parameter- and computation-efficient deep learning architecture for tabular data, Gated Additive Tree Ensemble (GATE). GATE uses a gating mechanism, inspired by GRU, as a feature representation learning unit with an in-built feature selection mechanism. We combine it with an ensemble of differentiable, non-linear decision trees, re-weighted with simple self-attention to predict our desired output. We demonstrate that GATE is a competitive alternative to SOTA approaches like GBDTs, NODE, FT Transformers, etc. by experiments on several public datasets (both classification and regression). The code will be uploaded as soon as the paper comes out of review.  ( 2 min )
    Federated Learning Aggregation: New Robust Algorithms with Guarantees. (arXiv:2205.10864v2 [stat.ML] UPDATED)
    Federated Learning has been recently proposed for distributed model training at the edge. The principle of this approach is to aggregate models learned on distributed clients to obtain a new more general "average" model (FedAvg). The resulting model is then redistributed to clients for further training. To date, the most popular federated learning algorithm uses coordinate-wise averaging of the model parameters for aggregation. In this paper, we carry out a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework. From this, we derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses. Moreover, we go beyond the assumptions introduced in theory, by evaluating the performance of these strategies and by comparing them with that of FedAvg in classification tasks in both the IID and the Non-IID framework without additional hypotheses.  ( 2 min )
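The coordinate-wise averaging that the abstract attributes to FedAvg can be sketched as below. This is a minimal illustration and not the paper's code: the function name `fed_avg`, the flat-list parameter layout, and the optional per-client weights (for example, local sample counts) are assumptions introduced here. The loss-dependent aggregation rules the authors derive are not reproduced.

```python
from typing import List, Optional, Sequence


def fed_avg(
    client_params: Sequence[Sequence[float]],
    client_weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Coordinate-wise (optionally weighted) average of client parameter vectors.

    client_params: one flat parameter vector per client, all of equal length.
    client_weights: hypothetical per-client weights (e.g. sample counts);
        uniform weights are used when omitted.
    """
    if not client_params:
        raise ValueError("need at least one client")
    dim = len(client_params[0])
    if any(len(p) != dim for p in client_params):
        raise ValueError("all clients must share the same parameter shape")
    if client_weights is None:
        client_weights = [1.0] * len(client_params)
    total = float(sum(client_weights))
    # Average each coordinate independently across clients.
    return [
        sum(w * p[k] for w, p in zip(client_weights, client_params)) / total
        for k in range(dim)
    ]


if __name__ == "__main__":
    print(fed_avg([[1.0, 2.0], [3.0, 4.0]]))
    print(fed_avg([[1.0, 2.0], [3.0, 4.0]], client_weights=[1, 3]))
```

In a full round, the server would send the averaged vector back to the clients for further local training, as described in the abstract.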
    A label-efficient two-sample test. (arXiv:2111.08861v5 [cs.LG] UPDATED)
    Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are easily measured whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework in service of performing an effective two-sample test with only a small number of sample label queries: first, a classifier is trained with samples uniformly labeled to model the posterior probabilities of the labels; second, a novel query scheme dubbed "bimodal query" is used to query labels of samples from both classes; and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments performed on several datasets demonstrate that the proposed test controls the Type I error and has decreased Type II error relative to uniform querying and certainty-based querying. Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Label-Efficient-Two-Sample.
    VoloGAN: Adversarial Domain Adaptation for Synthetic Depth Data. (arXiv:2207.09204v1 [cs.CV])
    We present VoloGAN, an adversarial domain adaptation network that translates synthetic RGB-D images of a high-quality 3D model of a person into RGB-D images that could be generated with a consumer depth sensor. This system is especially useful for generating large amounts of training data for single-view 3D reconstruction algorithms that replicate real-world capture conditions, as it can imitate the style of different sensor types for the same high-end 3D model database. The network uses a CycleGAN framework with a U-Net architecture for the generator and a discriminator inspired by SIV-GAN. We use different optimizers and learning rate schedules to train the generator and the discriminator. We further construct a loss function that considers image channels individually and, among other metrics, evaluates the structural similarity. We demonstrate that CycleGANs can be used to apply adversarial domain adaptation of synthetic 3D data to train a volumetric video generator model with only a few training samples.
    MLGOPerf: An ML Guided Inliner to Optimize Performance. (arXiv:2207.08389v2 [cs.PL] UPDATED)
    For the past 25 years, we have witnessed an extensive application of Machine Learning to the compiler space, notably to the selection and phase-ordering problems. However, few works have been upstreamed into state-of-the-art compilers, i.e., LLVM, to seamlessly integrate the former into the optimization pipeline of a compiler to be readily deployed by the user. MLGO was among the first of such projects and it only strives to reduce the code size of a binary with an ML-based Inliner using Reinforcement Learning. This paper presents MLGOPerf, the first end-to-end framework capable of optimizing performance using LLVM's ML-Inliner. It employs a secondary ML model to generate rewards used for training a retargeted Reinforcement Learning agent, previously used as the primary model by MLGO. It does so by predicting the post-inlining speedup of a function under analysis, and it enables a fast training framework for the primary model which otherwise wouldn't be practical. The experimental results show MLGOPerf is able to gain up to 1.8% and 2.2% with respect to LLVM's optimization at O3 when trained for performance on SPEC CPU2006 and Cbench benchmarks, respectively. Furthermore, the proposed approach provides up to 26% increased opportunities to autotune code regions for our benchmarks, which can be translated into an additional 3.7% speedup value.
    Minimum Description Length Control. (arXiv:2207.08258v2 [cs.LG] UPDATED)
    We propose a novel framework for multitask reinforcement learning based on the minimum description length (MDL) principle. In this approach, which we term MDL-control (MDL-C), the agent learns the common structure among the tasks with which it is faced and then distills it into a simpler representation which facilitates faster convergence and generalization to new tasks. In doing so, MDL-C naturally balances adaptation to each task with epistemic uncertainty about the task distribution. We motivate MDL-C via formal connections between the MDL principle and Bayesian inference, derive theoretical performance guarantees, and demonstrate MDL-C's empirical effectiveness on both discrete and high-dimensional continuous control tasks.
    Deep learning generates custom-made logistic regression models for explaining how breast cancer subtypes are classified. (arXiv:2001.06988v2 [cs.LG] UPDATED)
    Differentiating the intrinsic subtypes of breast cancer is crucial for deciding the best treatment strategy. Deep learning can predict the subtypes from genetic information more accurately than conventional statistical methods, but to date, deep learning has not been directly utilized to examine which genes are associated with which subtypes. To clarify the mechanisms embedded in the intrinsic subtypes, we developed an explainable deep learning model called a point-wise linear (PWL) model that generates a custom-made logistic regression for each patient. Logistic regression, which is familiar to both physicians and medical informatics researchers, allows us to analyze the importance of the feature variables, and the PWL model harnesses these practical abilities of logistic regression. In this study, we show that analyzing breast cancer subtypes is clinically beneficial for patients and one of the best ways to validate the capability of the PWL model. First, we trained the PWL model with RNA-seq data to predict PAM50 intrinsic subtypes and applied it to the 41/50 genes of PAM50 through the subtype prediction task. Second, we developed a deep enrichment analysis method to reveal the relationships between the PAM50 subtypes and the copy numbers of breast cancer. Our findings showed that the PWL model utilized genes relevant to the cell cycle-related pathways. These preliminary successes in breast cancer subtype analysis demonstrate the potential of our analysis strategy to clarify the mechanisms underlying breast cancer and improve overall clinical outcomes.
    Actor-Critic based Improper Reinforcement Learning. (arXiv:2207.09090v1 [cs.LG])
    We consider an improper reinforcement learning setting where a learner is given $M$ base controllers for an unknown Markov decision process, and wishes to combine them optimally to produce a potentially new controller that can outperform each of the base ones. This can be useful in tuning across controllers, learnt possibly in mismatched or simulated environments, to obtain a good controller for a given target environment with relatively few trials. Towards this, we propose two algorithms: (1) a Policy Gradient-based approach; and (2) an algorithm that can switch between a simple Actor-Critic (AC) based scheme and a Natural Actor-Critic (NAC) scheme depending on the available information. Both algorithms operate over a class of improper mixtures of the given controllers. For the first case, we derive convergence rate guarantees assuming access to a gradient oracle. For the AC-based approach we provide convergence rate guarantees to a stationary point in the basic AC case and to a global optimum in the NAC case. Numerical results on (i) the standard control theoretic benchmark of stabilizing a cartpole; and (ii) a constrained queueing task show that our improper policy optimization algorithm can stabilize the system even when the base policies at its disposal are unstable.
    Is Vertical Logistic Regression Privacy-Preserving? A Comprehensive Privacy Analysis and Beyond. (arXiv:2207.09087v1 [cs.CR])
    We consider vertical logistic regression (VLR) trained with mini-batch gradient descent -- a setting which has attracted growing interest among industries and proven to be useful in a wide range of applications including finance and medical research. We provide a comprehensive and rigorous privacy analysis of VLR in a class of open-source Federated Learning frameworks, where the protocols might differ between one another, yet a procedure of obtaining local gradients is implicitly shared. We first consider the honest-but-curious threat model, in which the detailed implementation of the protocol is neglected and only the shared procedure is assumed, which we abstract as an oracle. We find that even under this general setting, single-dimension feature and label can still be recovered from the other party under suitable constraints of batch size, thus demonstrating the potential vulnerability of all frameworks following the same philosophy. Then we look into a popular instantiation of the protocol based on Homomorphic Encryption (HE). We propose an active attack that significantly weakens the constraints on batch size in the previous analysis via generating and compressing auxiliary ciphertext. To address the privacy leakage within the HE-based protocol, we develop a simple-yet-effective countermeasure based on Differential Privacy (DP), and provide both utility and privacy guarantees for the updated algorithm. Finally, we empirically verify the effectiveness of our attack and defense on benchmark datasets. Altogether, our findings suggest that all vertical federated learning frameworks that solely depend on HE might contain severe privacy risks, and DP, which has already demonstrated its power in horizontal federated learning, can also play a crucial role in the vertical setting, especially when coupled with HE or secure multi-party computation (MPC) techniques.
    Multi-parametric Analysis for Mixed Integer Linear Programming: An Application to Transmission Planning and Congestion Control. (arXiv:2207.09325v1 [math.OC])
    Enhancing existing transmission lines is a useful tool to combat transmission congestion and guarantee transmission security under increasing demand and growing renewable generation. This study concerns the selection of lines whose capacity should be expanded, and by how much, from the perspective of the independent system operator (ISO), to minimize the system cost subject to transmission line constraints and electricity generation and demand balance conditions, while incorporating ramp-up and startup ramp rates, shutdown ramp rates, ramp-down rate limits and minimum up and minimum down times. For that purpose, we develop the ISO unit commitment and economic dispatch model and cast it as a multi-parametric analysis of a mixed integer linear programming (MILP) problem with right-hand-side uncertainty. We first relax the binary variables to continuous variables and employ the Lagrange method and Karush-Kuhn-Tucker conditions to obtain optimal solutions (optimal decision variables and objective function) and critical regions associated with active and inactive constraints. Further, we extend the traditional branch and bound method for the large-scale MILP problem by determining the upper bound of the problem at each node, then comparing the difference between the upper and lower bounds and reaching the approximate optimal solution within the decision makers' tolerated error range. In addition, the objective function's first derivative with respect to the parameters of each line is used to inform the selection of lines to ease congestion and maximize social welfare. Finally, the amount of capacity upgrade is chosen by balancing the cost-reduction rate of the objective function with respect to the parameters against the cost of the line upgrade. Our findings are supported by numerical simulation and provide transmission line planners with decision-making guidance.
    Regret Minimization with Noisy Observations. (arXiv:2207.09435v1 [cs.DS])
    In a typical optimization problem, the task is to pick one of a number of options with the lowest cost or the highest value. In practice, these cost/value quantities often come through processes such as measurement or machine learning, which are noisy, with quantifiable noise distributions. To take these noise distributions into account, one approach is to assume a prior for the values, use it to build a posterior, and then apply standard stochastic optimization to pick a solution. However, in many practical applications, such prior distributions may not be available. In this paper, we study such scenarios using a regret minimization model. In our model, the task is to pick the highest one out of $n$ values. The values are unknown and chosen by an adversary, but can be observed through noisy channels, where additive noises are stochastically drawn from known distributions. The goal is to minimize the regret of our selection, defined as the expected difference between the highest and the selected value on the worst-case choices of values. We show that the naïve algorithm of picking the highest observed value has regret arbitrarily worse than the optimum, even when $n = 2$ and the noises are unbiased in expectation. On the other hand, we propose an algorithm which gives a constant-approximation to the optimal regret for any $n$. Our algorithm is conceptually simple, computationally efficient, and requires only minimal knowledge of the noise distributions.
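To make the regret definition concrete, the sketch below computes the exact expected regret of the naïve rule (pick the highest observed value) on one fixed instance with discrete, independent noise. It is an illustration under assumptions made here: the function name, the discrete noise representation, and the tie-breaking rule are hypothetical. It evaluates a single instance rather than the worst case over adversarial values, and it does not implement the paper's proposed algorithm.

```python
from itertools import product
from typing import List, Sequence, Tuple

# A discrete noise distribution: list of (noise_value, probability) pairs.
Noise = Sequence[Tuple[float, float]]


def naive_expected_regret(values: Sequence[float], noise_dists: Sequence[Noise]) -> float:
    """Exact expected regret of picking the option with the highest noisy observation.

    Ties between observations are broken in favour of the lowest index
    (an arbitrary choice made for this sketch).
    """
    best = max(values)
    regret = 0.0
    # Enumerate every joint noise outcome; noises are assumed independent.
    for outcome in product(*noise_dists):
        prob = 1.0
        observed: List[float] = []
        for value, (noise, p) in zip(values, outcome):
            prob *= p
            observed.append(value + noise)
        pick = max(range(len(values)), key=lambda i: observed[i])
        regret += prob * (best - values[pick])
    return regret


if __name__ == "__main__":
    # Option 0: true value 1, observed exactly.
    # Option 1: true value 0, zero-mean noise of +2 or -2 with equal probability.
    values = [1.0, 0.0]
    noises = [[(0.0, 1.0)], [(2.0, 0.5), (-2.0, 0.5)]]
    print(naive_expected_regret(values, noises))  # 0.5
```

Here the worse option looks best half of the time even though its noise is unbiased, so the naïve rule loses 1 with probability 1/2 on this instance.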
    Finite-Sample Maximum Likelihood Estimation of Location. (arXiv:2206.02348v2 [math.ST] UPDATED)
    We consider 1-dimensional location estimation, where we estimate a parameter $\lambda$ from $n$ samples $\lambda + \eta_i$, with each $\eta_i$ drawn i.i.d. from a known distribution $f$. For fixed $f$ the maximum-likelihood estimate (MLE) is well-known to be optimal in the limit as $n \to \infty$: it is asymptotically normal with variance matching the Cramér-Rao lower bound of $\frac{1}{n\mathcal{I}}$, where $\mathcal{I}$ is the Fisher information of $f$. However, this bound does not hold for finite $n$, or when $f$ varies with $n$. We show for arbitrary $f$ and $n$ that one can recover a similar theory based on the Fisher information of a smoothed version of $f$, where the smoothing radius decays with $n$.
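For reference, the quantities named in the abstract can be written out as follows. These are the standard textbook definitions for a location family with a differentiable density $f$, not formulas taken from the paper; the symbols $x_i$ and $\theta$ are introduced here for illustration.

```latex
% Samples and the maximum-likelihood estimate of the location parameter
x_i = \lambda + \eta_i, \qquad
\hat{\lambda}_{\mathrm{MLE}} = \arg\max_{\theta} \sum_{i=1}^{n} \log f(x_i - \theta)

% Fisher information of f for the location parameter, and the asymptotic variance
\mathcal{I} = \int \frac{\bigl(f'(x)\bigr)^{2}}{f(x)} \, dx, \qquad
\operatorname{Var}\bigl(\hat{\lambda}_{\mathrm{MLE}}\bigr) \approx \frac{1}{n\,\mathcal{I}}
\quad (n \to \infty)

% Example: Gaussian noise f = N(0, \sigma^2)
\mathcal{I} = \frac{1}{\sigma^{2}}, \qquad
\hat{\lambda}_{\mathrm{MLE}} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad
\operatorname{Var}\bigl(\hat{\lambda}_{\mathrm{MLE}}\bigr) = \frac{\sigma^{2}}{n}
```

In the Gaussian case the bound is attained exactly for every $n$; the abstract concerns general $f$, where this need not happen at finite $n$.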
    Signed Network Embedding with Application to Simultaneous Detection of Communities and Anomalies. (arXiv:2207.09324v1 [cs.SI])
    Signed networks are frequently observed in real life with additional sign information associated with each edge, yet such information has been largely ignored in existing network models. This paper develops a unified embedding model for signed networks to disentangle the intertwined balance structure and anomaly effect, which can greatly facilitate the downstream analysis, including community detection, anomaly detection, and network inference. The proposed model captures both balance structure and anomaly effect through a low rank plus sparse matrix decomposition, which are jointly estimated via a regularized formulation. Its theoretical guarantees are established in terms of asymptotic consistency and finite-sample probability bounds for network embedding, community detection and anomaly detection. The advantage of the proposed embedding model is also demonstrated through extensive numerical experiments on both synthetic networks and an international relation network.
    Long-term Reproducibility for Neural Architecture Search. (arXiv:2207.04821v2 [cs.LG] UPDATED)
    It is a sad reflection of modern academia that code is often ignored after publication -- there is no academic 'kudos' for bug fixes / maintenance. Code is often unavailable or, if available, contains bugs, is incomplete, or relies on out-of-date / unavailable libraries. This has a significant impact on reproducibility and general scientific progress. Neural Architecture Search (NAS) is no exception to this, with some prior work in reproducibility. However, we argue that these do not consider long-term reproducibility issues. We therefore propose a checklist for long-term NAS reproducibility. We evaluate our checklist against common NAS approaches along with proposing how we can retrospectively make these approaches more long-term reproducible.
    A coherence parameter characterizing generative compressed sensing with Fourier measurements. (arXiv:2207.09340v1 [cs.IT])
    In Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees in the setting where the measurement matrix is Gaussian and the signal structure is the range of a generative neural network (GNN). The problem of compressed sensing with GNNs has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. We move beyond the subgaussian assumption, to measurement matrices that are derived by sampling uniformly at random rows of a unitary matrix (including subsampled Fourier measurements as a special case). Specifically, we prove the first known restricted isometry guarantee for generative compressed sensing with subsampled isometries, and provide recovery bounds with nearly order-optimal sample complexity, addressing an open problem of Scarlett et al. (2022, p. 10). Recovery efficacy is characterized by the coherence, a new parameter, which measures the interplay between the range of the network and the measurement matrix. Our approach relies on subspace counting arguments and ideas central to high-dimensional probability. Furthermore, we propose a regularization strategy for training GNNs to have favourable coherence with the measurement operator. We provide compelling numerical simulations that support this regularized training strategy: our strategy yields low coherence networks that require fewer measurements for signal recovery. This, together with our theoretical results, supports coherence as a natural quantity for characterizing generative compressed sensing with subsampled isometries.
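The measurement model described above, rows of a unitary matrix sampled uniformly at random with subsampled Fourier measurements as a special case, can be sketched as follows. This is only an illustration of that sampling step: the function names are hypothetical, sampling without replacement is an assumption made here, and the generative network, the coherence parameter, and the recovery procedure are not implemented.

```python
import cmath
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[complex]]


def unitary_dft(n: int) -> Matrix:
    """n x n discrete Fourier transform matrix, scaled by 1/sqrt(n) so it is unitary."""
    scale = 1.0 / math.sqrt(n)
    return [
        [scale * cmath.exp(-2j * math.pi * j * k / n) for k in range(n)]
        for j in range(n)
    ]


def subsample_rows(u: Matrix, m: int, rng: random.Random) -> Tuple[List[int], Matrix]:
    """Pick m distinct rows of u uniformly at random (without replacement: an assumption)."""
    idx = sorted(rng.sample(range(len(u)), m))
    return idx, [u[i] for i in idx]


def measure(a: Matrix, x: Sequence[complex]) -> List[complex]:
    """Apply the subsampled measurement matrix to a signal x."""
    return [sum(a_jk * x_k for a_jk, x_k in zip(row, x)) for row in a]


if __name__ == "__main__":
    rng = random.Random(0)
    n, m = 8, 3
    rows, a = subsample_rows(unitary_dft(n), m, rng)
    signal = [1.0] + [0.0] * (n - 1)  # unit impulse
    print(rows)
    print(measure(a, signal))
```

For the unit impulse every Fourier measurement has magnitude $1/\sqrt{n}$, which gives a quick sanity check on the scaling.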
    Algorithm and System Co-design for Efficient Subgraph-based Graph Representation Learning. (arXiv:2202.13538v2 [cs.LG] UPDATED)
    Subgraph-based graph representation learning (SGRL) has been recently proposed to deal with some fundamental challenges encountered by canonical graph neural networks (GNNs), and has demonstrated advantages in many important data science applications such as link, relation and motif prediction. However, current SGRL approaches suffer from scalability issues since they require extracting subgraphs for each training or test query. Recent solutions that scale up canonical GNNs may not apply to SGRL. Here, we propose a novel framework SUREL for scalable SGRL by co-designing the learning algorithm and its system support. SUREL adopts walk-based decomposition of subgraphs and reuses the walks to form subgraphs, which substantially reduces the redundancy of subgraph extraction and supports parallel computation. Experiments over six homogeneous, heterogeneous and higher-order graphs with millions of nodes and edges demonstrate the effectiveness and scalability of SUREL. In particular, compared to SGRL baselines, SUREL achieves 10$\times$ speed-up with comparable or even better prediction performance; while compared to canonical GNNs, SUREL achieves 50% prediction accuracy improvement.
    FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs. (arXiv:2207.08630v2 [cs.CV] UPDATED)
    Data-Efficient GANs (DE-GANs), which aim to learn generative models with a limited amount of training data, encounter several challenges in generating high-quality samples. Since data augmentation strategies have largely alleviated the training instability, how to further improve the generative performance of DE-GANs has become a hotspot. Recently, contrastive learning has shown great potential for increasing the synthesis quality of DE-GANs, yet related principles are not well explored. In this paper, we revisit and compare different contrastive learning strategies in DE-GANs, and identify that (i) the current bottleneck of generative performance is the discontinuity of latent space; (ii) compared to other contrastive learning strategies, Instance-perturbation works towards latent space continuity, which brings the major improvement to DE-GANs. Based on these observations, we propose FakeCLR, which only applies contrastive learning on perturbed fake samples, and devise three related training techniques: Noise-related Latent Augmentation, Diversity-aware Queue, and Forgetting Factor of Queue. Our experimental results establish a new state of the art on both few-shot generation and limited-data generation. On multiple datasets, FakeCLR acquires more than 15% FID improvement compared to existing DE-GANs. Code is available at https://github.com/iceli1007/FakeCLR.
    A Prospective Approach for Human-to-Human Interaction Recognition from Wi-Fi Channel Data using Attention Bidirectional Gated Recurrent Neural Network with GUI Application Implementation. (arXiv:2202.08146v3 [cs.LG] UPDATED)
    Recent advances in 5G wireless technology and socioeconomic transformation have brought a paradigm shift in sensor applications. Wi-Fi signal demonstrates a strong correlation between its temporal variation and body movements, which can be leveraged to recognize human activity. In this article, we demonstrate the cognitive ability of a device-free mutual human-to-human interaction recognition method based on the time scale Wi-Fi channel state information. The mutual activities examined are steady-state, approaching, departing, handshaking, high-five, hugging, kicking (left-leg), kicking (right-leg), pointing (left-hand), pointing (right-hand), punching (left-hand), punching (right-hand), and pushing. We explore and propose a Self-Attention furnished Bidirectional Gated Recurrent Neural Network model to classify 13 human-to-human mutual interaction types from the time-series data. Our proposed model can recognize mutual interactions of a two-subject pair with a maximum benchmark accuracy of 94%. This has been expanded for ten subject pairs, which secured a benchmark accuracy of 88% with improved classification around the interaction-transition region. Also, an executable graphical user interface (GUI) is developed, using the PyQt5 Python module, to subsequently display the overall mutual human-interaction recognition procedure in real-time. Finally, we conclude with a brief discourse regarding the possible solutions to the handicaps that resulted in curtailments observed during the study. Such Wi-Fi channel perturbation pattern analysis is believed to be an efficient, economical and privacy-friendly approach to be potentially utilized in mutual human-interaction recognition for indoor activity monitoring, surveillance systems, smart health monitoring systems and independent assisted living.
    A Classification of $G$-invariant Shallow Neural Networks. (arXiv:2205.09219v3 [cs.LG] UPDATED)
    When trying to fit a deep neural network (DNN) to a $G$-invariant target function with respect to a group $G$, it only makes sense to constrain the DNN to be $G$-invariant as well. However, there can be many different ways to do this, thus raising the problem of "$G$-invariant neural architecture design": What is the optimal $G$-invariant architecture for a given problem? Before we can consider the optimization problem itself, we must understand the search space, the architectures in it, and how they relate to one another. In this paper, we take a first step towards this goal; we prove a theorem that gives a classification of all $G$-invariant single-hidden-layer or "shallow" neural network ($G$-SNN) architectures with ReLU activation for any finite orthogonal group $G$. The proof is based on a correspondence of every $G$-SNN to a signed permutation representation of $G$ acting on the hidden neurons. The classification is equivalently given in terms of the first cohomology classes of $G$, thus admitting a topological interpretation. Based on a code implementation, we enumerate the $G$-SNN architectures for some example groups $G$ and visualize their structure. We draw the network morphisms between the enumerated architectures that can be leveraged during neural architecture search (NAS). Finally, we prove that architectures corresponding to inequivalent cohomology classes in a given cohomology ring coincide in function space only when their weight matrices are zero, and we discuss the implications of this in the context of NAS.
    Unsupervised Ground Metric Learning using Wasserstein Singular Vectors. (arXiv:2102.06278v3 [stat.ML] UPDATED)
    Defining meaningful distances between samples in a dataset is a fundamental problem in machine learning. Optimal Transport (OT) lifts a distance between features (the "ground metric") to a geometrically meaningful distance between samples. However, there is usually no straightforward choice of ground metric. Supervised ground metric learning approaches exist but require labeled data. In the absence of labels, only ad-hoc ground metrics remain. Unsupervised ground metric learning is thus a fundamental problem to enable data-driven applications of OT. In this paper, we propose for the first time a canonical answer by simultaneously computing an OT distance between samples and between features of a dataset. These distance matrices emerge naturally as positive singular vectors of the function mapping ground metrics to OT distances. We provide criteria to ensure the existence and uniqueness of these singular vectors. We then introduce scalable computational methods to approximate them in high-dimensional settings, using stochastic approximation and entropic regularization. Finally, we showcase Wasserstein Singular Vectors on a single-cell RNA-sequencing dataset.
    Causal Balancing for Domain Generalization. (arXiv:2206.05263v2 [cs.LG] UPDATED)
    While machine learning models rapidly advance the state-of-the-art on various real-world tasks, out-of-domain (OOD) generalization remains a challenging problem given the vulnerability of these models to spurious correlations. We propose a causally-motivated balanced mini-batch sampling strategy to transform the observed training distribution to a balanced distribution that is free of spurious correlations. We argue that the Bayes optimal classifier trained on such a balanced distribution is minimax optimal across a diverse enough environment space. We also provide an identifiability guarantee of the latent variable model of the proposed underlying data generation process with invariant causal mechanisms, given a sufficient number of training environments. Experiments are conducted on three domain generalization datasets, demonstrating empirically that our balanced mini-batch sampling strategy improves the performance of four different established domain generalization model baselines compared to the random mini-batch sampling strategy.
    How do Quadratic Regularizers Prevent Catastrophic Forgetting: The Role of Interpolation. (arXiv:2102.02805v4 [cs.LG] UPDATED)
    Catastrophic forgetting undermines the effectiveness of deep neural networks (DNNs) in scenarios such as continual learning and lifelong learning. While several methods have been proposed to tackle this problem, there is limited work explaining why these methods work well. This paper aims to better explain a popular technique for avoiding catastrophic forgetting: quadratic regularization. We show that quadratic regularizers prevent forgetting of past tasks by interpolating current and previous values of model parameters at every training iteration. Over multiple training iterations, this interpolation operation reduces the learning rates of more important model parameters, thereby minimizing their movement. Our analysis also reveals two drawbacks of quadratic regularization: (a) dependence of parameter interpolation on training hyperparameters, which often leads to training instability and (b) assignment of lower importance to deeper layers, which are generally where forgetting occurs in DNNs. Via a simple modification to the order of operations, we show these drawbacks can be easily avoided, resulting in 6.2% higher average accuracy at 4.5% lower average forgetting. We confirm the robustness of our results by training over 2000 models in different settings. Code available at https://github.com/EkdeepSLubana/QRforgetting
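The interpolation view mentioned in the abstract can be checked with a small algebraic sketch. It assumes plain gradient descent on the task loss plus a penalty of the form (lam / 2) * (theta - theta_prev)^2 with a single scalar strength `lam`; these names and the scalar-importance simplification are assumptions made here, and the paper's per-parameter importances and its proposed modification to the order of operations are not reproduced.

```python
from typing import List, Sequence


def regularized_step(
    theta: Sequence[float],
    theta_prev: Sequence[float],
    grad: Sequence[float],
    lr: float,
    lam: float,
) -> List[float]:
    """One gradient step on: task_loss + (lam / 2) * ||theta - theta_prev||^2.

    grad is the gradient of the task loss alone, evaluated at theta.
    """
    return [
        t - lr * (g + lam * (t - tp))
        for t, tp, g in zip(theta, theta_prev, grad)
    ]


def interpolated_step(
    theta: Sequence[float],
    theta_prev: Sequence[float],
    grad: Sequence[float],
    lr: float,
    lam: float,
) -> List[float]:
    """The same update rewritten as interpolation followed by a plain gradient step."""
    alpha = lr * lam  # interpolation weight; depends on the learning rate
    return [
        (1.0 - alpha) * t + alpha * tp - lr * g
        for t, tp, g in zip(theta, theta_prev, grad)
    ]


if __name__ == "__main__":
    theta, theta_prev, grad = [0.5, -1.0], [0.0, 2.0], [0.1, -0.3]
    print(regularized_step(theta, theta_prev, grad, lr=0.1, lam=2.0))
    print(interpolated_step(theta, theta_prev, grad, lr=0.1, lam=2.0))
```

Both functions return the same values up to floating-point rounding. The interpolation weight `alpha = lr * lam` depends on the learning rate, which is consistent with drawback (a) in the abstract.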
    Patch-level Representation Learning for Self-supervised Vision Transformers. (arXiv:2206.07990v3 [cs.CV] UPDATED)
    Recent self-supervised learning (SSL) methods have shown impressive results in learning visual representations from unlabeled images. This paper aims to improve their performance further by utilizing the architectural advantages of the underlying neural network, as the current state-of-the-art visual pretext tasks for SSL do not enjoy the benefit, i.e., they are architecture-agnostic. In particular, we focus on Vision Transformers (ViTs), which have gained much attention recently as a better architectural choice, often outperforming convolutional networks for various visual tasks. The unique characteristic of ViT is that it takes a sequence of disjoint patches from an image and processes patch-level representations internally. Inspired by this, we design a simple yet effective visual pretext task, coined SelfPatch, for learning better patch-level representations. To be specific, we enforce invariance against each patch and its neighbors, i.e., each patch treats similar neighboring patches as positive samples. Consequently, training ViTs with SelfPatch learns more semantically meaningful relations among patches (without using human-annotated labels), which can be beneficial, in particular, to downstream tasks of a dense prediction type. Despite its simplicity, we demonstrate that it can significantly improve the performance of existing SSL methods for various visual tasks, including object detection and semantic segmentation. Specifically, SelfPatch significantly improves the recent self-supervised ViT, DINO, by achieving +1.3 AP on COCO object detection, +1.2 AP on COCO instance segmentation, and +2.9 mIoU on ADE20K semantic segmentation.
    A Comparative Survey of Deep Active Learning. (arXiv:2203.13450v3 [cs.LG] UPDATED)
    While deep learning (DL) is data-hungry and usually relies on extensive labeled data to deliver good performance, Active Learning (AL) reduces labeling costs by selecting a small proportion of samples from unlabeled data for labeling and training. Therefore, Deep Active Learning (DAL) has risen as a feasible solution for maximizing model performance under a limited labeling cost/budget in recent years. Although abundant methods of DAL have been developed and various literature reviews conducted, the performance evaluation of DAL methods under fair comparison settings is not yet available. Our work intends to fill this gap. In this work, we construct a DAL toolkit, DeepAL+, by re-implementing 19 highly-cited DAL methods. We survey and categorize DAL-related works and construct comparative experiments across frequently used datasets and DAL algorithms. Additionally, we explore some factors (e.g., batch size, number of epochs in the training process) that influence the efficacy of DAL, which provides better references for researchers to design their DAL experiments or carry out DAL-related applications.
    Learnable Mixed-precision and Dimension Reduction Co-design for Low-storage Activation. (arXiv:2207.07931v2 [eess.IV] UPDATED)
    Recently, deep convolutional neural networks (CNNs) have achieved many eye-catching results. However, deploying CNNs on resource-constrained edge devices is limited by the memory bandwidth available for transmitting large intermediate data during inference, i.e., activations. Existing research utilizes mixed-precision and dimension reduction to reduce computational complexity but pays less attention to its application for activation compression. To further exploit the redundancy in activations, we propose a learnable mixed-precision and dimension reduction co-design system, which separates channels into groups and allocates specific compression policies according to their importance. In addition, the proposed dynamic searching technique enlarges the search space and finds the optimal bit-width allocation automatically. Our experimental results show that the proposed methods improve accuracy by 3.54%/1.27% and save 0.18/2.02 bits per value over existing mixed-precision methods on ResNet18 and MobileNetv2, respectively.
    Semi-supervised Predictive Clustering Trees for (Hierarchical) Multi-label Classification. (arXiv:2207.09237v1 [cs.LG])
    Semi-supervised learning (SSL) is a common approach to learning predictive models using not only labeled examples, but also unlabeled examples. While SSL for the simple tasks of classification and regression has received a lot of attention from the research community, this is not properly investigated for complex prediction tasks with structurally dependent variables. This is the case of multi-label classification and hierarchical multi-label classification tasks, which may require additional information, possibly coming from the underlying distribution in the descriptive space provided by unlabeled examples, to better face the challenging task of predicting simultaneously multiple class labels. In this paper, we investigate this aspect and propose a (hierarchical) multi-label classification method based on semi-supervised learning of predictive clustering trees. We also extend the method towards ensemble learning and propose a method based on the random forest approach. Extensive experimental evaluation conducted on 23 datasets shows significant advantages of the proposed method and its extension with respect to their supervised counterparts. Moreover, the method preserves interpretability and reduces the time complexity of classical tree-based models.
    Study of the performance and scalability of federated learning for medical imaging with intermittent clients. (arXiv:2207.08581v2 [cs.LG] UPDATED)
    Federated learning is a data decentralization privacy-preserving technique used to perform machine or deep learning in a secure way. In this paper we present theoretical aspects about federated learning, such as the presentation of an aggregation operator, different types of federated learning, and issues to be taken into account in relation to the distribution of data from the clients, together with the exhaustive analysis of a use case where the number of clients varies. Specifically, a use case of medical image analysis is proposed, using chest X-ray images obtained from an open data repository. In addition to the advantages related to privacy, improvements in predictions (in terms of accuracy and area under the curve) and reduction of execution times will be studied with respect to the classical case (the centralized approach). Different clients will be simulated from the training data, selected in an unbalanced manner, i.e., they do not all have the same amount of data. The results of considering three or ten clients are presented and compared with each other and against the centralized case. Two approaches will be analyzed for the case of intermittent clients, as in a real scenario some clients may leave the training, and some new ones may enter the training. The evolution of the results for the test set in terms of accuracy, area under the curve and execution time is shown as the number of clients into which the original data is divided increases. Finally, improvements and future work in the field are proposed.
    Robust outlier detection by de-biasing VAE likelihoods. (arXiv:2108.08760v3 [cs.LG] UPDATED)
    Deep networks often make confident yet incorrect predictions when tested with outlier data that is far removed from their training distributions. Likelihoods computed by deep generative models (DGMs) are a candidate metric for outlier detection with unlabeled data. Yet, previous studies have shown that DGM likelihoods are unreliable and can be easily biased by simple transformations to input data. Here, we examine outlier detection with variational autoencoders (VAEs), among the simplest of DGMs. We propose novel analytical and algorithmic approaches to ameliorate key biases with VAE likelihoods. Our bias corrections are sample-specific, computationally inexpensive, and readily computed for various decoder visible distributions. Next, we show that a well-known image pre-processing technique -- contrast stretching -- extends the effectiveness of bias correction to further improve outlier detection. Our approach achieves state-of-the-art accuracies with nine grayscale and natural image datasets, and demonstrates significant advantages -- both with speed and performance -- over four recent, competing approaches. In summary, lightweight remedies suffice to achieve robust outlier detection with VAEs.
    Adversarial Training Improves Joint Energy-Based Generative Modelling. (arXiv:2207.08950v1 [cs.LG])
    We propose a novel framework for generative modelling using hybrid energy-based models. In our method we combine the interpretable input gradients of a robust classifier with Langevin dynamics for sampling. Using adversarial training, we improve not only the training stability but also the robustness and generative modelling of joint energy-based models.
    Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks. (arXiv:2207.08629v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) tend to suffer from high computation costs due to the exponentially increasing scale of graph data and the number of model parameters, which restricts their utility in practical applications. To this end, some recent works focus on sparsifying GNNs with the lottery ticket hypothesis (LTH) to reduce inference costs while maintaining performance levels. However, the LTH-based methods suffer from two major drawbacks: 1) they require exhaustive and iterative training of dense models, resulting in an extremely large training computation cost, and 2) they only trim graph structures and model parameters but ignore the node feature dimension, where significant redundancy exists. To overcome the above limitations, we propose a comprehensive graph gradual pruning framework termed CGP. This is achieved by designing a during-training graph pruning paradigm to dynamically prune GNNs within one training process. Unlike LTH-based methods, the proposed CGP approach requires no re-training, which significantly reduces the computation costs. Furthermore, we design a co-sparsifying strategy to comprehensively trim all three core elements of GNNs: graph structures, node features, and model parameters. Meanwhile, aiming at refining the pruning operation, we introduce a regrowth process into our CGP framework, in order to re-establish the pruned but important connections. The proposed CGP is evaluated by using a node classification task across 6 GNN architectures, including shallow models (GCN and GAT), shallow-but-deep-propagation models (SGC and APPNP), and deep models (GCNII and ResGCN), on a total of 14 real-world graph datasets, including large-scale graph datasets from the challenging Open Graph Benchmark. Experiments reveal that our proposed strategy greatly improves both training and inference efficiency while matching or even exceeding the accuracy of existing methods.
    Time Is MattEr: Temporal Self-supervision for Video Transformers. (arXiv:2207.09067v1 [cs.CV])
    Understanding temporal dynamics of video is an essential aspect of learning better video representations. Recently, transformer-based architectural designs have been extensively explored for video tasks due to their capability to capture long-term dependency of input sequences. However, we found that these Video Transformers are still biased to learn spatial dynamics rather than temporal ones, and debiasing the spurious correlation is critical for their performance. Based on the observations, we design simple yet effective self-supervised tasks for video models to learn temporal dynamics better. Specifically, for debiasing the spatial bias, our method learns the temporal order of video frames as extra self-supervision and enforces the randomly shuffled frames to have low-confidence outputs. Also, our method learns the temporal flow direction of video tokens among consecutive frames for enhancing the correlation toward temporal dynamics. Under various video action recognition tasks, we demonstrate the effectiveness of our method and its compatibility with state-of-the-art Video Transformers.
    Lightweight Automated Feature Monitoring for Data Streams. (arXiv:2207.08640v2 [cs.LG] UPDATED)
    Monitoring the behavior of automated real-time stream processing systems has become one of the most relevant problems in real world applications. Such systems have grown in complexity relying heavily on high dimensional input data, and data hungry Machine Learning (ML) algorithms. We propose a flexible system, Feature Monitoring (FM), that detects data drifts in such data sets, with a small and constant memory footprint and a small computational cost in streaming applications. The method is based on a multi-variate statistical test and is data driven by design (full reference distributions are estimated from the data). It monitors all features that are used by the system, while providing an interpretable features ranking whenever an alarm occurs (to aid in root cause analysis). The computational and memory lightness of the system results from the use of Exponential Moving Histograms. In our experimental study, we analyze the system's behavior with its parameters and, more importantly, show examples where it detects problems that are not directly related to a single feature. This illustrates how FM eliminates the need to add custom signals to detect specific types of problems and that monitoring the available space of features is often enough.
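A constant-memory histogram with exponential forgetting can be sketched as follows. This is an illustrative guess at the general idea of an exponential moving histogram, not the FM system's implementation: the fixed bin edges, the `decay` parameter, and the class name are all assumptions.

```python
import bisect


class ExponentialMovingHistogram:
    """Fixed-size histogram whose bin masses decay as new values arrive."""

    def __init__(self, edges, decay):
        self.edges = list(edges)  # sorted bin edges, len = number of bins + 1
        self.decay = decay        # in (0, 1); closer to 1 means longer memory
        self.counts = [0.0] * (len(self.edges) - 1)

    def _bin_index(self, value):
        # Values outside the edge range are clamped to the first or last bin.
        index = bisect.bisect_right(self.edges, value) - 1
        return min(max(index, 0), len(self.counts) - 1)

    def update(self, value):
        # Down-weight all history, then add mass for the new observation.
        self.counts = [count * self.decay for count in self.counts]
        self.counts[self._bin_index(value)] += 1.0 - self.decay

    def frequencies(self):
        total = sum(self.counts)
        return [count / total for count in self.counts] if total else list(self.counts)


histogram = ExponentialMovingHistogram(edges=[0.0, 1.0, 2.0, 3.0], decay=0.5)
for observation in (0.5, 2.5):
    histogram.update(observation)
print(histogram.frequencies())
```

Memory use is fixed by the number of bins regardless of stream length, which matches the small, constant footprint the abstract describes. A drift test would then compare these frequencies against a reference histogram; that step is not shown here.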
    Can You Fool AI by Doing a 180? – A Case Study on Authorship Analysis of Texts by Arata Osada. (arXiv:2207.09085v1 [cs.CL])
    This paper is our attempt at answering a twofold question covering the areas of ethics and authorship analysis. Firstly, since the methods used for performing authorship analysis imply that an author can be recognized by the content he or she creates, we were interested in finding out whether it would be possible for an author identification system to correctly attribute works to authors if in the course of years they have undergone a major psychological transition. Secondly, and from the point of view of the evolution of an author's ethical values, we checked what it would mean if the authorship attribution system encounters difficulties in detecting single authorship. We set out to answer those questions through performing a binary authorship analysis task using a text classifier based on a pre-trained transformer model and a baseline method relying on conventional similarity metrics. For the test set, we chose works of Arata Osada, a Japanese educator and specialist in the history of education, with half of them being books written before World War II and the other half in the 1950s, in between which he underwent a transformation in terms of political opinions. As a result, we were able to confirm that in the case of texts authored by Arata Osada in a time span of more than 10 years, while the classification accuracy drops by a large margin and is substantially lower than for texts by other non-fiction writers, confidence scores of the predictions remain at a similar level as in the case of a shorter time span, indicating that the classifier was in many instances tricked into deciding that texts written over a time span of multiple years were actually written by two different people, which in turn leads us to believe that such a change can affect authorship analysis, and that historical events have great impact on a person's ethical outlook as expressed in their writings.
    A Unifying Causal Framework for Analyzing Dataset Shift-stable Learning Algorithms. (arXiv:1905.11374v5 [stat.ML] UPDATED)
    Recent interest in the external validity of prediction models (i.e., the problem of different train and test distributions, known as dataset shift) has produced many methods for finding predictive distributions that are invariant to dataset shifts and can be used for prediction in new, unseen environments. However, these methods consider different types of shifts and have been developed under disparate frameworks, making it difficult to theoretically analyze how solutions differ with respect to stability and accuracy. Taking a causal graphical view, we use a flexible graphical representation to express various types of dataset shifts. Given a known graph of the data generating process, we show that all invariant distributions correspond to a causal hierarchy of graphical operators which disable the edges in the graph that are responsible for the shifts. The hierarchy provides a common theoretical underpinning for understanding when and how stability to shifts can be achieved, and in what ways stable distributions can differ. We use it to establish conditions for minimax optimal performance across environments, and derive new algorithms that find optimal stable distributions. Using this new perspective, we empirically demonstrate that there is a tradeoff between minimax and average performance.
    VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer. (arXiv:2203.04099v2 [cs.SD] UPDATED)
    This paper presents an audio-visual approach for voice separation which produces state-of-the-art results at a low latency in two scenarios: speech and singing voice. The model is based on a two-stage network. Motion cues are obtained with a lightweight graph convolutional network that processes face landmarks. Then, both audio and motion features are fed to an audio-visual transformer which produces a fairly good estimation of the isolated target source. In a second stage, the predominant voice is enhanced with an audio-only network. We present different ablation studies and comparisons to state-of-the-art methods. Finally, we explore the transferability of models trained for speech separation in the task of singing voice separation. The demos, code, and weights are available at https://ipcv.github.io/VoViT/
    Deep equilibrium networks are sensitive to initialization statistics. (arXiv:2207.09432v1 [cs.LG])
    Deep equilibrium networks (DEQs) are a promising way to construct models which trade off memory for compute. However, theoretical understanding of these models is still lacking compared to traditional networks, in part because of the repeated application of a single set of weights. We show that DEQs are sensitive to the higher order statistics of the matrix families from which they are initialized. In particular, initializing with orthogonal or symmetric matrices allows for greater stability in training. This gives us a practical prescription for initializations which allow for training with a broader range of initial weight scales.
    GAP: Differentially Private Graph Neural Networks with Aggregation Perturbation. (arXiv:2203.00949v2 [cs.LG] UPDATED)
    In this paper, we study the problem of learning Graph Neural Networks (GNNs) with Differential Privacy (DP). We propose a novel differentially private GNN based on Aggregation Perturbation (GAP), which adds stochastic noise to the GNN's aggregation function to statistically obfuscate the presence of a single edge (edge-level privacy) or a single node and all its adjacent edges (node-level privacy). Tailored to the specifics of private learning, GAP's new architecture is composed of three separate modules: (i) the encoder module, where we learn private node embeddings without relying on the edge information; (ii) the aggregation module, where we compute noisy aggregated node embeddings based on the graph structure; and (iii) the classification module, where we train a neural network on the private aggregations for node classification without further querying the graph edges. GAP's major advantage over previous approaches is that it can benefit from multi-hop neighborhood aggregations, and guarantees both edge-level and node-level DP not only for training, but also at inference with no additional costs beyond the training's privacy budget. We analyze GAP's formal privacy guarantees using Rényi DP and conduct empirical experiments over three real-world graph datasets. We demonstrate that GAP offers significantly better accuracy-privacy trade-offs than state-of-the-art DP-GNN approaches and naive MLP-based baselines.
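The aggregation module described above (noisy aggregated node embeddings) can be illustrated with a small sketch. It is not GAP's implementation: the choice of Gaussian noise, the unit-norm scaling used to bound each neighbor's contribution, and all function names are assumptions, and the calibration of `sigma` to a privacy budget is not shown.

```python
import math
import random


def normalize(vector):
    """Scale a vector to unit L2 norm so one neighbor's contribution is bounded."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def noisy_sum_aggregation(embeddings, neighbors, sigma, rng):
    """Sum each node's neighbor embeddings, then add Gaussian noise per coordinate."""
    dim = len(embeddings[0])
    unit = [normalize(vector) for vector in embeddings]
    aggregated = []
    for node in range(len(embeddings)):
        total = [0.0] * dim
        for neighbor in neighbors[node]:
            for k in range(dim):
                total[k] += unit[neighbor][k]
        aggregated.append([value + rng.gauss(0.0, sigma) for value in total])
    return aggregated


# Hypothetical 3-node graph: node 0 is connected to nodes 1 and 2.
embeddings = [[3.0, 4.0], [0.0, 2.0], [1.0, 0.0]]
neighbors = [[1, 2], [0], [0]]
print(noisy_sum_aggregation(embeddings, neighbors, sigma=1.0, rng=random.Random(0)))
```

Because the noise is added once to the aggregate, a downstream classifier can train on the perturbed outputs without querying the edges again, as in module (iii) above.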
    A Hybrid Recommender System for Recommending Smartphones to Prospective Customers. (arXiv:2105.12876v2 [cs.IR] UPDATED)
    Recommender Systems are a subclass of machine learning systems that employ sophisticated information filtering strategies to reduce the search time and suggest the most relevant items to any particular user. Hybrid recommender systems combine multiple recommendation strategies in different ways to benefit from their complementary advantages. Some hybrid recommender systems have combined collaborative filtering and content-based approaches to build systems that are more robust. In this paper, we propose a hybrid recommender system, which combines Alternating Least Squares (ALS) based collaborative filtering with deep learning to enhance recommendation performance as well as overcome the limitations associated with the collaborative filtering approach, especially concerning its cold start problem. In essence, we use the outputs from ALS (collaborative filtering) to influence the recommendations from a Deep Neural Network (DNN), which combines characteristic, contextual, structural and sequential information, in a big data processing framework. We have conducted several experiments in testing the efficacy of the proposed hybrid architecture in recommending smartphones to prospective customers and compared its performance with other open-source recommenders. The results have shown that the proposed system has outperformed several existing hybrid recommender systems.
    Do Not Sleep on Linear Models: Simple and Interpretable Techniques Outperform Deep Learning for Sleep Scoring. (arXiv:2207.07753v2 [stat.ML] UPDATED)
    Over the last few years, research in automatic sleep scoring has mainly focused on developing increasingly complex deep learning architectures. However, recently these approaches achieved only marginal improvements, often at the expense of requiring more data and more expensive training procedures. Despite all these efforts and their satisfactory performance, automatic sleep staging solutions are not widely adopted in a clinical context yet. We argue that most deep learning solutions for sleep scoring are limited in their real-world applicability as they are hard to train, deploy, and reproduce. Moreover, these solutions lack interpretability and transparency, which are often key to increasing adoption rates. In this work, we revisit the problem of sleep stage classification using classical machine learning. Results show that state-of-the-art performance can be achieved with a conventional machine learning pipeline consisting of preprocessing, feature extraction, and a simple machine learning model. In particular, we analyze the performance of a linear model and a non-linear (gradient boosting) model. Our approach surpasses the state-of-the-art (that uses the same data) on two public datasets: Sleep-EDF SC-20 (MF1 0.810) and Sleep-EDF ST (MF1 0.795), while achieving competitive results on Sleep-EDF SC-78 (MF1 0.775) and MASS SS3 (MF1 0.817). We show that, for the sleep stage scoring task, the expressiveness of an engineered feature vector is on par with the internally learned representations of deep learning models. This observation opens the door to clinical adoption, as a representative feature vector makes it possible to leverage both the interpretability and successful track record of traditional machine learning models.
    Machine Learning in Orbit Estimation: a Survey. (arXiv:2207.08993v1 [astro-ph.EP])
    Since the late '50s, when the first artificial satellite was launched, the number of resident space objects (RSOs) has steadily increased. It is estimated that around 1 million objects larger than 1 cm are currently orbiting the Earth, with only 30,000, larger than 10 cm, presently being tracked. To avert a chain reaction of collisions, termed Kessler Syndrome, it is indispensable to accurately track and predict the orbits of space debris and satellites alike. Current physics-based methods have errors on the order of kilometres for 7-day predictions, which is insufficient when considering space debris that is mostly smaller than 1 meter. Typically, this failure is due to uncertainty around the state of the space object at the beginning of the trajectory, forecasting errors in environmental conditions such as atmospheric drag, as well as specific unknown characteristics such as mass or geometry of the RSO. Leveraging data-driven techniques, namely machine learning, the orbit prediction accuracy can be enhanced: by deriving unmeasured objects' characteristics, improving the modelling of non-conservative forces' effects, and by the superior abstraction capacity that Deep Learning models have of modelling highly complex non-linear systems. In this survey, we provide an overview of the current work being done in this field.
    MoEC: Mixture of Expert Clusters. (arXiv:2207.09094v1 [cs.CL])
    Sparse Mixture of Experts (MoE) has received great interest due to its promising scaling capability with affordable computational overhead. MoE converts dense layers into sparse experts, and utilizes a gated routing network to make experts conditionally activated. However, as the number of experts grows, MoE with an outrageous number of parameters suffers from overfitting and sparse data allocation. Such problems are especially severe on tasks with limited data, thus hindering the progress for MoE models to improve performance by scaling up. In this work, we propose Mixture of Expert Clusters (MoEC), a general approach to enable expert layers to learn more diverse and appropriate knowledge by imposing variance-based constraints on the routing stage. We further propose a cluster-level expert dropout strategy specifically designed for the expert cluster structure. Our experiments reveal that MoEC could improve performance on machine translation and natural language understanding tasks, and raise the performance upper bound for scaling up experts under limited data. We also verify that MoEC plays a positive role in mitigating overfitting and sparse data allocation.
    A Convolutional Neural Network Approach to Supernova Time-Series Classification. (arXiv:2207.09440v1 [astro-ph.IM])
    Among the brightest objects in the universe, supernovae (SNe) are powerful explosions marking the end of a star's lifetime. Supernova (SN) type is defined by spectroscopic emission lines, but obtaining spectroscopy is often logistically unfeasible. Thus, the ability to identify SNe by type using time-series image data alone is crucial, especially in light of the increasing breadth and depth of upcoming telescopes. We present a convolutional neural network method for fast supernova time-series classification, with observed brightness data smoothed in both the wavelength and time directions with Gaussian process regression. We apply this method to full duration and truncated SN time-series, to simulate retrospective as well as real-time classification performance. Retrospective classification is used to differentiate cosmologically useful Type Ia SNe from other SN types, and this method achieves >99% accuracy on this task. We are also able to differentiate between 6 SN types with 60% accuracy given only two nights of data and 98% accuracy retrospectively.
    A Unified Single-loop Alternating Gradient Projection Algorithm for Nonconvex-Concave and Convex-Nonconcave Minimax Problems. (arXiv:2006.02032v3 [math.OC] UPDATED)
    Much recent research effort has been directed to the development of efficient algorithms for solving minimax problems with theoretical convergence guarantees due to the relevance of these problems to a few emergent applications. In this paper, we propose a unified single-loop alternating gradient projection (AGP) algorithm for solving smooth nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. AGP employs simple gradient projection steps for updating the primal and dual variables alternately at each iteration. We show that it can find an $\varepsilon$-stationary point of the objective function in $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp., $\mathcal{O}\left( \varepsilon ^{-4} \right)$) iterations under the nonconvex-strongly concave (resp., nonconvex-concave) setting. Moreover, its gradient complexity to obtain an $\varepsilon$-stationary point of the objective function is bounded by $\mathcal{O}\left( \varepsilon ^{-2} \right)$ (resp., $\mathcal{O}\left( \varepsilon ^{-4} \right)$) under the strongly convex-nonconcave (resp., convex-nonconcave) setting. To the best of our knowledge, this is the first time that a simple and unified single-loop algorithm is developed for solving both nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. Moreover, the complexity results for solving the latter (strongly) convex-nonconcave minimax problems have never been obtained before in the literature. Numerical results show the efficiency of the proposed AGP algorithm. Furthermore, we extend the AGP algorithm by presenting a block alternating proximal gradient (BAPG) algorithm for solving more general multi-block nonsmooth nonconvex-(strongly) concave and (strongly) convex-nonconcave minimax problems. We can similarly establish the gradient complexity of the proposed algorithm under these four different settings.
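The single-loop pattern of alternating projected gradient steps on the primal and dual variables can be sketched on a toy problem. This is a simplified projected gradient descent-ascent loop, not the paper's AGP: it omits any regularization terms the algorithm may use, and the objective f(x, y) = 0.5 x^2 + x y - 0.5 y^2, the box constraint [-1, 1], and the step sizes are all assumptions chosen so that the saddle point is (0, 0).

```python
def project(value, low, high):
    """Euclidean projection of a scalar onto the interval [low, high]."""
    return min(max(value, low), high)


def alternating_gradient_projection(x, y, step_x, step_y, iterations):
    """Alternate a projected descent step in x with a projected ascent step in y."""
    for _ in range(iterations):
        grad_x = x + y  # df/dx for f(x, y) = 0.5*x**2 + x*y - 0.5*y**2
        x = project(x - step_x * grad_x, -1.0, 1.0)
        grad_y = x - y  # df/dy, evaluated at the already-updated x
        y = project(y + step_y * grad_y, -1.0, 1.0)
    return x, y


x_final, y_final = alternating_gradient_projection(1.0, -1.0, 0.1, 0.1, 500)
print(x_final, y_final)
```

On this strongly convex-strongly concave toy problem the iterates contract toward the saddle point; the nonconvex settings treated in the paper are harder and are not represented by this example.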
    Incremental Task Learning with Incremental Rank Updates. (arXiv:2207.09074v1 [cs.CV])
    Incremental Task Learning (ITL) is a category of continual learning that seeks to train a single network for multiple tasks (one after another), where training data for each task is only available during the training of that task. Neural networks tend to forget older tasks when they are trained for the newer tasks; this property is often known as catastrophic forgetting. To address this issue, ITL methods use episodic memory, parameter regularization, masking and pruning, or extensible network structures. In this paper, we propose a new incremental task learning framework based on low-rank factorization. In particular, we represent the network weights for each layer as a linear combination of several rank-1 matrices. To update the network for a new task, we learn a rank-1 (or low-rank) matrix and add that to the weights of every layer. We also introduce an additional selector vector that assigns different weights to the low-rank matrices learned for the previous tasks. We show that our approach performs better than the current state-of-the-art methods in terms of accuracy and forgetting. Our method also offers better memory efficiency compared to episodic memory- and mask-based approaches. Our code will be available at https://github.com/CSIPlab/task-increment-rank-update.git
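The weight representation described above, a selector-weighted sum of rank-1 matrices, can be written out for a single layer. The sketch below is only an illustration of that formula under assumed names (`compose_weights`, `factors`, `selector`); how the factors and the selector are trained is not shown and is not taken from the paper's code.

```python
def outer(u, v):
    """Rank-1 matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def compose_weights(factors, selector):
    """Layer weights W = sum_i selector[i] * outer(u_i, v_i)."""
    rows, cols = len(factors[0][0]), len(factors[0][1])
    weights = [[0.0] * cols for _ in range(rows)]
    for (u, v), scale in zip(factors, selector):
        rank_one = outer(u, v)
        for r in range(rows):
            for c in range(cols):
                weights[r][c] += scale * rank_one[r][c]
    return weights


# Hypothetical layer: one factor kept from an earlier task, one learned for the new task.
factors = [([1.0, 0.0], [1.0, 2.0]), ([0.0, 1.0], [3.0, 4.0])]
selector = [2.0, 1.0]  # re-weights the earlier factor for the new task
print(compose_weights(factors, selector))
```

Storing one pair of vectors per task costs rows + cols numbers per layer instead of rows * cols, which is consistent with the memory-efficiency claim in the abstract.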
    Harnessing Interpretable Machine Learning for Holistic Inverse Design of Origami. (arXiv:2204.07235v2 [cond-mat.soft] UPDATED)
    This work harnesses interpretable machine learning methods to address the challenging inverse design problem of origami-inspired systems. We show that a decision tree-random forest method is particularly suitable for fitting origami databases, containing both design features and functional performance, to generate human-understandable decision rules for the inverse design of functional origami. First, the tree method is unique because it can handle complex interactions between categorical features and continuous features, allowing it to compare different origami patterns for a design. Second, this interpretable method can tackle multi-objective problems for designing functional origami with multiple and multi-physical performance targets. Finally, the method can extend existing shape-fitting algorithms for origami to consider non-geometrical performance. The proposed framework enables holistic inverse design of origami, considering both shape and function, to build novel reconfigurable structures for various applications such as metamaterials, deployable structures, soft robots, biomedical devices, and many more.
    On the development of a Bayesian optimisation framework for complex unknown systems. (arXiv:2207.09154v1 [cs.LG])
    Bayesian optimisation provides an effective method to optimise expensive black box functions. It has recently been applied to problems in fluid dynamics. This paper studies and compares common Bayesian optimisation algorithms empirically on a range of synthetic test functions. It investigates the choice of acquisition function and number of training samples, exact calculation of acquisition functions and Monte Carlo based approaches and both single-point and multi-point optimisation. The test functions considered cover a wide selection of challenges and therefore serve as an ideal test bed to understand the performance of Bayesian optimisation and to identify general situations where Bayesian optimisation performs well and poorly. This knowledge can be utilised in applications, including those in fluid dynamics, where objective functions are unknown. The results of this investigation show that the choices to be made are less relevant for relatively simple functions, while optimistic acquisition functions such as Upper Confidence Bound should be preferred for more complex objective functions. Furthermore, results from the Monte Carlo approach are comparable to results from analytical acquisition functions. In instances where the objective function allows parallel evaluations, the multi-point approach offers a quicker alternative, yet it may potentially require more objective function evaluations.
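    For readers unfamiliar with the Upper Confidence Bound acquisition function mentioned above, a minimal sketch follows. It assumes a maximisation problem and that a surrogate model has already supplied a posterior mean and standard deviation at each candidate point; the function names and the default `kappa` are illustrative choices, not taken from the paper.

```python
def ucb(mean, std, kappa=2.0):
    """Upper Confidence Bound: optimistic estimate mean + kappa * std."""
    return mean + kappa * std


def pick_next(candidates, means, stds, kappa=2.0):
    """Return the candidate with the highest UCB score (maximisation assumed)."""
    scores = [ucb(m, s, kappa) for m, s in zip(means, stds)]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


# Hypothetical surrogate predictions at three candidate inputs.
candidates = [0.0, 0.5, 1.0]
means = [1.0, 0.8, 0.2]
stds = [0.05, 0.3, 0.5]

explore = pick_next(candidates, means, stds, kappa=2.0)  # favours uncertainty
exploit = pick_next(candidates, means, stds, kappa=0.0)  # pure posterior mean
```

    A larger `kappa` weights uncertainty more heavily, which is the sense in which UCB is an "optimistic" acquisition function.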
    Abstract Demonstrations and Adaptive Exploration for Efficient and Stable Multi-step Sparse Reward Reinforcement Learning. (arXiv:2207.09243v1 [cs.RO])
    Although Deep Reinforcement Learning (DRL) has been popular in many disciplines including robotics, state-of-the-art DRL algorithms still struggle to learn long-horizon, multi-step and sparse reward tasks, such as stacking several blocks given only a task-completion reward signal. To improve learning efficiency for such tasks, this paper proposes a DRL exploration technique, termed A^2, which integrates two components inspired by human experiences: Abstract demonstrations and Adaptive exploration. A^2 starts by decomposing a complex task into subtasks, and then provides the correct orders of subtasks to learn. During training, the agent explores the environment adaptively, acting more deterministically for well-mastered subtasks and more stochastically for ill-learnt subtasks. Ablation and comparative experiments are conducted on several grid-world tasks and three robotic manipulation tasks. We demonstrate that A^2 can aid popular DRL algorithms (DQN, DDPG, and SAC) to learn more efficiently and stably in these environments.
    Bounding generalization error with input compression: An empirical study with infinite-width networks. (arXiv:2207.09408v1 [cs.LG])
    Estimating the Generalization Error (GE) of Deep Neural Networks (DNNs) is an important task that often relies on the availability of held-out data. The ability to better predict GE based on a single training set may yield overarching DNN design principles that reduce reliance on trial-and-error, along with other performance assessment advantages. In search of a quantity relevant to GE, we investigate the Mutual Information (MI) between the input and final layer representations, using the infinite-width DNN limit to bound MI. An existing input compression-based GE bound is used to link MI and GE. To the best of our knowledge, this represents the first empirical study of this bound. In our attempt to empirically falsify the theoretical bound, we find that it is often tight for best-performing models. Furthermore, it detects randomization of training labels in many cases, reflects test-time perturbation robustness, and works well given only a few training samples. These results are promising given that input compression is broadly applicable where MI can be estimated with confidence.
    Survival Prediction of Brain Cancer with Incomplete Radiology, Pathology, Genomics, and Demographic Data. (arXiv:2203.04419v2 [cs.LG] UPDATED)
    Integrating cross-department multi-modal data (e.g., radiological, pathological, genomic, and clinical data) is ubiquitous in brain cancer diagnosis and survival prediction. To date, such an integration is typically conducted by human physicians (and panels of experts), which can be subjective and semi-quantitative. Recent advances in multi-modal deep learning, however, have opened a door to making such a process more objective and quantitative. Unfortunately, prior work using four modalities for brain cancer survival prediction is limited by a "complete modalities" setting (i.e., with all modalities available). Thus, there are still open questions on how to effectively predict brain cancer survival from incomplete radiological, pathological, genomic, and demographic data (e.g., one or more modalities might not be collected for a patient). For instance, should we use both complete and incomplete data, and more importantly, how should we use those data? To answer the preceding questions, we generalize multi-modal learning on cross-department multi-modal data to a missing data setting. Our contribution is three-fold: 1) We introduce an optimal multi-modal learning with missing data (MMD) pipeline with optimized hardware consumption and computational efficiency; 2) We extend multi-modal learning on radiological, pathological, genomic, and demographic data into missing data scenarios; 3) We collect a large-scale public dataset (with 962 patients) to systematically evaluate glioma tumor survival prediction using four modalities. The proposed method improved the C-index of survival prediction from 0.7624 to 0.8053.
    Don't Forget to Buy Milk: Contextually Aware Grocery Reminder Household Robot. (arXiv:2207.09050v1 [cs.RO])
    Assistive robots operating in household environments would require items to be available in the house to perform assistive tasks. However, when these items run out, the assistive robot must remind its user to buy the missing items. In this paper, we present a computational architecture that can allow a robot to learn personalized contextual knowledge of a household through interactions with its user. The architecture can then use the learned knowledge to make predictions about missing items from the household over a long period of time. The architecture integrates state-of-the-art perceptual learning algorithms, cognitive models of memory encoding and learning, a reasoning module for predicting missing items from the household, and a graphical user interface (GUI) to interact with the user. The architecture is integrated with the Fetch mobile manipulator robot and validated in a large indoor environment with multiple contexts and objects. Our experimental results show that the robot can adapt to an environment by learning contextual knowledge through interactions with its user. The robot can also use the learned knowledge to correctly predict missing items over multiple weeks and it is robust against sensory and perceptual errors.
    CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving. (arXiv:2203.07724v2 [cs.CV] UPDATED)
    Contemporary deep-learning object detection methods for autonomous driving usually assume prefixed categories of common traffic participants, such as pedestrians and cars. Most existing detectors are unable to detect uncommon objects and corner cases (e.g., a dog crossing a street), which may lead to severe accidents in some situations, making the timeline for the real-world application of reliable autonomous driving uncertain. One main reason that impedes the development of truly reliable self-driving systems is the lack of public datasets for evaluating the performance of object detectors on corner cases. Hence, we introduce a challenging dataset named CODA that exposes this critical problem of vision-based detectors. The dataset consists of 1500 carefully selected real-world driving scenes, each containing four object-level corner cases (on average), spanning more than 30 object categories. On CODA, the performance of standard object detectors trained on large-scale autonomous driving datasets significantly drops to no more than 12.8% in mAR. Moreover, we experiment with the state-of-the-art open-world object detector and find that it also fails to reliably identify the novel objects in CODA, suggesting that a robust perception system for autonomous driving is probably still far from reach. We expect our CODA dataset to facilitate further research in reliable detection for real-world autonomous driving. Our dataset will be released at https://coda-dataset.github.io.
    Neural Greedy Pursuit for Feature Selection. (arXiv:2207.09390v1 [cs.LG])
    We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem. The features are selected one by one sequentially, in an iterative loss minimization procedure. We use neural networks as predictors in the algorithm to compute the loss and hence, we refer to our method as neural greedy pursuit (NGP). NGP is efficient in selecting $N$ features when $N \ll P$, and it provides a notion of feature importance in a descending order following the sequential selection procedure. We experimentally show that NGP provides better performance than several feature selection methods such as DeepLIFT and Drop-one-out loss. In addition, we experimentally show a phase transition behavior in which perfect selection of all $N$ features without false positives is possible when the training data size exceeds a threshold.
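    The sequential selection loop described above can be sketched generically. In this illustration the predictor is abstracted behind a hypothetical `subset_loss` callable (in the paper it would be the loss of a neural network trained on the candidate subset); the toy loss and the `IMPORTANCE` table below are made up purely to exercise the loop.

```python
from typing import Callable, List, Sequence


def greedy_select(num_features: int, n_select: int,
                  subset_loss: Callable[[Sequence[int]], float]) -> List[int]:
    """Pick features one at a time, each time adding the one that gives the
    lowest loss when combined with the features already selected."""
    selected: List[int] = []
    remaining = set(range(num_features))
    for _ in range(n_select):
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = min(sorted(remaining), key=lambda j: subset_loss(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy stand-in for "train a predictor on this subset and return its loss".
IMPORTANCE = {0: 5.0, 3: 2.0, 7: 1.0}


def toy_loss(subset: Sequence[int]) -> float:
    return sum(IMPORTANCE.values()) - sum(IMPORTANCE.get(j, 0.0) for j in subset)


order = greedy_select(num_features=10, n_select=3, subset_loss=toy_loss)
print(order)  # features in the order they were selected
```

    The order of selection doubles as a ranking, which matches the abstract's remark that the procedure yields feature importance in descending order. Each round evaluates every remaining feature, so the loop needs roughly N times P loss evaluations, which is why it is attractive when N is much smaller than P.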
    Data-Centric Epidemic Forecasting: A Survey. (arXiv:2207.09370v1 [cs.LG])
    The COVID-19 pandemic has brought forth the importance of epidemic forecasting for decision makers in multiple domains, ranging from public health to the economy as a whole. While forecasting epidemic progression is frequently conceptualized as being analogous to weather forecasting, it has some key differences and remains a non-trivial task. The spread of diseases is subject to multiple confounding factors spanning human behavior, pathogen dynamics, weather and environmental conditions. Research interest has been fueled by the increased availability of rich data sources capturing previously unobservable facets and also by initiatives from government public health and funding agencies. This has resulted, in particular, in a spate of work on 'data-centered' solutions which have shown potential in enhancing our forecasting capabilities by leveraging non-traditional data sources as well as recent innovations in AI and machine learning. This survey delves into various data-driven methodological and practical advancements and introduces a conceptual framework to navigate through them. First, we enumerate the large number of epidemiological datasets and novel data streams that are relevant to epidemic forecasting, capturing various factors like symptomatic online surveys, retail and commerce, mobility, genomics data and more. Next, we discuss methods and modeling paradigms focusing on the recent data-driven statistical and deep-learning based methods as well as on the novel class of hybrid models that combine domain knowledge of mechanistic models with the effectiveness and flexibility of statistical approaches. We also discuss experiences and challenges that arise in real-world deployment of these forecasting systems including decision-making informed by forecasts. Finally, we highlight some challenges and open problems found across the forecasting pipeline.
    Evaluating State of the Art, Forecasting Ensembles- and Meta-learning Strategies for Model Fusion. (arXiv:2203.03279v3 [cs.LG] UPDATED)
    Techniques of hybridisation and ensemble learning are popular model fusion techniques for improving the predictive power of forecasting methods. With limited research that investigates combining these two promising approaches, this paper focuses on the utility of the Exponential-Smoothing-Recurrent Neural Network (ES-RNN) in the pool of base models for different ensembles. We compare against some state of the art ensembling techniques and arithmetic model averaging as a benchmark. We experiment with the M4 forecasting data set of 100,000 time-series, and the results show that the Feature-based Forecast Model Averaging (FFORMA), on average, is the best technique for late data fusion with the ES-RNN. However, considering the M4's Daily subset of data, stacking was the only successful ensemble at dealing with the case where all base model performances are similar. Our experimental results indicate that we attain state of the art forecasting results compared to N-BEATS as a benchmark. We conclude that model averaging is a more robust ensemble than model selection and stacking strategies. Further, the results show that gradient boosting is superior for implementing ensemble learning strategies.
    Robust Training of Neural Networks Using Scale Invariant Architectures. (arXiv:2202.00980v2 [cs.LG] UPDATED)
    In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models. However, the use of adaptivity not only comes at the cost of extra memory but also raises the fundamental question: can non-adaptive methods like SGD enjoy similar benefits? In this paper, we provide an affirmative answer to this question by proposing to achieve both robust and memory-efficient training via the following general recipe: (1) modify the architecture and make it scale invariant, i.e. the scale of the parameters doesn't affect the output of the network, (2) train with SGD and weight decay, and optionally (3) clip the global gradient norm proportional to weight norm multiplied by $\sqrt{\tfrac{2\lambda}{\eta}}$, where $\eta$ is the learning rate and $\lambda$ is the weight decay. We show that this general approach is robust to rescaling of parameters and loss by proving that its convergence only depends logarithmically on the scale of initialization and loss, whereas the standard SGD might not even converge for many initializations. Following our recipe, we design a scale invariant version of BERT, called SIBERT, which when trained simply by vanilla SGD achieves performance comparable to BERT trained by adaptive methods like Adam on downstream tasks.
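    Steps (2) and (3) of the recipe can be illustrated with a small sketch. It assumes the clipping threshold is $c \cdot \|w\| \cdot \sqrt{2\lambda/\eta}$ with a proportionality constant `c` that the abstract does not specify (set to 1 here), and that weight decay is applied as an additive term in the SGD step; all function names are hypothetical.

```python
import math


def l2_norm(vec):
    return math.sqrt(sum(x * x for x in vec))


def clip_gradient(grad, weights, lr, weight_decay, c=1.0):
    """Rescale grad so its norm is at most c * ||w|| * sqrt(2 * wd / lr).
    The constant c is an assumption, not a value from the paper."""
    max_norm = c * l2_norm(weights) * math.sqrt(2.0 * weight_decay / lr)
    grad_norm = l2_norm(grad)
    if grad_norm == 0.0 or grad_norm <= max_norm:
        return list(grad)
    scale = max_norm / grad_norm
    return [g * scale for g in grad]


def sgd_step(weights, grad, lr, weight_decay):
    """One SGD step with weight decay applied after clipping the gradient."""
    clipped = clip_gradient(grad, weights, lr, weight_decay)
    return [w - lr * (g + weight_decay * w) for w, g in zip(weights, clipped)]


weights = [3.0, 4.0]                      # ||w|| = 5
clipped = clip_gradient([6.0, 8.0], weights, lr=0.5, weight_decay=0.01)
small = clip_gradient([0.3, 0.4], weights, lr=0.5, weight_decay=0.01)
new_weights = sgd_step(weights, [6.0, 8.0], lr=0.5, weight_decay=0.01)
```

    With these numbers the threshold is 5 times sqrt(0.04), i.e. about 1, so the first gradient (norm 10) is scaled down while the second (norm 0.5) passes through unchanged.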
    Adversarial Bandits with Knapsacks. (arXiv:1811.11881v8 [cs.DS] UPDATED)
    We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. This is a considerably harder problem, compared to both the stochastic version and the "classic" adversarial bandits, in that regret minimization is no longer feasible. Instead, the objective is to minimize the competitive ratio: the ratio of the benchmark reward to the algorithm's reward. We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new perspective on the stochastic version of the problem. We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version and use it as a subroutine to solve the latter.
    MCTensor: A High-Precision Deep Learning Library with Multi-Component Floating-Point. (arXiv:2207.08867v1 [cs.LG])
    In this paper, we introduce MCTensor, a library based on PyTorch for providing general-purpose and high-precision arithmetic for DL training. MCTensor is used in the same way as PyTorch Tensor: we implement multiple basic, matrix-level computation operators and NN modules for MCTensor with an identical PyTorch interface. Our algorithms achieve high precision computation and also benefit from heavily-optimized PyTorch floating-point arithmetic. We evaluate MCTensor arithmetic against PyTorch native arithmetic for a series of tasks, where models using MCTensor in float16 would match or outperform the PyTorch model with float32 or float64 precision.
    Beyond Transmitting Bits: Context, Semantics, and Task-Oriented Communications. (arXiv:2207.09353v1 [cs.IT])
    Communication systems to date primarily aim at reliably communicating bit sequences. Such an approach provides efficient engineering designs that are agnostic to the meanings of the messages or to the goal that the message exchange aims to achieve. Next generation systems, however, can be potentially enriched by folding message semantics and goals of communication into their design. Further, these systems can be made cognizant of the context in which communication exchange takes place, providing avenues for novel design insights. This tutorial summarizes the efforts to date, starting from its early adaptations, semantic-aware and task-oriented communications, covering the foundations, algorithms and potential implementations. The focus is on approaches that utilize information theory to provide the foundations, as well as the significant role of learning in semantics and task-aware communications.
    SphereFed: Hyperspherical Federated Learning. (arXiv:2207.09413v1 [cs.LG])
    Federated Learning aims at training a global model from multiple decentralized devices (i.e. clients) without exchanging their private local data. A key challenge is the handling of non-i.i.d. (independent identically distributed) data across multiple clients that may induce disparities of their local features. We introduce the Hyperspherical Federated Learning (SphereFed) framework to address the non-i.i.d. issue by constraining learned representations of data points to be on a unit hypersphere shared by clients. Specifically, all clients learn their local representations by minimizing the loss with respect to a fixed classifier whose weights span the unit hypersphere. After federated training in improving the global model, this classifier is further calibrated with a closed-form solution by minimizing a mean squared loss. We show that the calibration solution can be computed efficiently and distributedly without direct access of local data. Extensive experiments indicate that our SphereFed approach is able to improve the accuracy of multiple existing federated learning algorithms by a considerable margin (up to 6% on challenging datasets) with enhanced computation and communication efficiency across datasets and model architectures.
    XG-BoT: An Explainable Deep Graph Neural Network for Botnet Detection and Forensics. (arXiv:2207.09088v1 [cs.CR])
    In this paper, we propose XG-BoT, an explainable deep graph neural network model for botnet node detection. The proposed model is mainly composed of a botnet detector and an explainer for automatic forensics. The XG-BoT detector can effectively detect malicious botnet nodes in large-scale networks. Specifically, it utilizes a grouped reversible residual connection with a graph isomorphism network to learn expressive node representations from the botnet communication graphs. The explainer in XG-BoT can perform automatic network forensics by highlighting suspicious network flows and related botnet nodes. We evaluated XG-BoT on real-world, large-scale botnet network graphs. Overall, XG-BoT is able to outperform the state-of-the-art in terms of evaluation metrics. In addition, we show that the XG-BoT explainer can generate useful explanations based on GNNExplainer for automatic network forensics.
    Identity Testing for High-Dimensional Distributions via Entropy Tensorization. (arXiv:2207.09102v1 [cs.DS])
    We present improved algorithms and matching statistical and computational lower bounds for the problem of identity testing $n$-dimensional distributions. In the identity testing problem, we are given as input an explicit distribution $\mu$, an $\varepsilon>0$, and access to a sampling oracle for a hidden distribution $\pi$. The goal is to distinguish whether the two distributions $\mu$ and $\pi$ are identical or are at least $\varepsilon$-far apart. When there is only access to full samples from the hidden distribution $\pi$, it is known that exponentially many samples may be needed, and hence previous works have studied identity testing with additional access to various conditional sampling oracles. We consider here a significantly weaker conditional sampling oracle, called the Coordinate Oracle, and provide a fairly complete computational and statistical characterization of the identity testing problem in this new model. We prove that if an analytic property known as approximate tensorization of entropy holds for the visible distribution $\mu$, then there is an efficient identity testing algorithm for any hidden $\pi$ that uses $\tilde{O}(n/\varepsilon)$ queries to the Coordinate Oracle. Approximate tensorization of entropy is a classical tool for proving optimal mixing time bounds of Markov chains for high-dimensional distributions, and recently has been established for many families of distributions via spectral independence. We complement our algorithmic result for identity testing with a matching $\Omega(n/\varepsilon)$ statistical lower bound for the number of queries under the Coordinate Oracle. We also prove a computational phase transition: for sparse antiferromagnetic Ising models over $\{+1,-1\}^n$, in the regime where approximate tensorization of entropy fails, there is no efficient identity testing algorithm unless RP=NP.
    FedX: Unsupervised Federated Learning with Cross Knowledge Distillation. (arXiv:2207.09158v1 [cs.CV])
    This paper presents FedX, an unsupervised federated learning framework. Our model learns unbiased representation from decentralized and heterogeneous local data. It employs a two-sided knowledge distillation with contrastive learning as a core component, allowing the federated system to function without requiring clients to share any data features. Furthermore, its adaptable architecture can be used as an add-on module for existing unsupervised algorithms in federated settings. Experiments show that our model improves performance significantly (1.58--5.52pp) on five unsupervised algorithms.
    Similarity of Pre-trained and Fine-tuned Representations. (arXiv:2207.09225v1 [cs.LG])
    In transfer learning, only the last part of the networks - the so-called head - is often fine-tuned. Representation similarity analysis shows that the most significant change still occurs in the head even if all weights are updatable. However, recent results from few-shot learning have shown that representation change in the early layers, which are mostly convolutional, is beneficial, especially in the case of cross-domain adaption. In our paper, we find out whether that also holds true for transfer learning. In addition, we analyze the change of representation in transfer learning, both during pre-training and fine-tuning, and find out that pre-trained structure is unlearned if not usable.
    Theseus: A Library for Differentiable Nonlinear Optimization. (arXiv:2207.09442v1 [cs.RO])
    We present Theseus, an efficient application-agnostic open source library for differentiable nonlinear least squares (DNLS) optimization built on PyTorch, providing a common framework for end-to-end structured learning in robotics and vision. Existing DNLS implementations are application specific and do not always incorporate many ingredients important for efficiency. Theseus is application-agnostic, as we illustrate with several example applications that are built using the same underlying differentiable components, such as second-order optimizers, standard cost functions, and Lie groups. For efficiency, Theseus incorporates support for sparse solvers, automatic vectorization, batching, GPU acceleration, and gradient computation with implicit differentiation and direct loss minimization. We perform an extensive performance evaluation in a set of applications, demonstrating significant efficiency gains and better scalability when these features are incorporated. Project page: https://sites.google.com/view/theseus-ai
    Sufficient Statistic Memory AMP. (arXiv:2112.15327v3 [cs.IT] UPDATED)
    Approximate message passing (AMP) type algorithms have been widely used in the signal reconstruction of certain large random linear systems. A key feature of the AMP-type algorithms is that their dynamics can be correctly described by state evolution. However, the state evolution is not necessarily convergent. To solve the convergence problem of the state evolution of AMP-type algorithms in principle, this paper proposes a memory AMP (MAMP) under a sufficient statistic condition, named sufficient statistic MAMP (SS-MAMP). We show that the covariance matrices of SS-MAMP are L-banded and convergent. Given an arbitrary MAMP, we can construct an SS-MAMP by damping, which not only ensures the convergence of the state evolution, but also preserves the orthogonality, i.e., its dynamics can be correctly described by state evolution. As a byproduct, we prove that the Bayes-optimal orthogonal/vector AMP (BO-OAMP/VAMP) is an SS-MAMP. As a result, we reveal two interesting properties of BO-OAMP/VAMP for large systems: 1) the covariance matrices are L-banded and are convergent, and 2) damping and memory are not needed (i.e., do not bring performance improvement). As an example, we construct a sufficient statistic Bayes-optimal MAMP (SS-BO-MAMP) whose state evolution converges to the minimum (i.e., Bayes-optimal) mean square error (MSE) predicted by replica methods. In addition, the MSE of SS-BO-MAMP is not worse than that of the original BO-MAMP. Finally, simulations are provided to verify the theoretical results.
    Green, Quantized Federated Learning over Wireless Networks: An Energy-Efficient Design. (arXiv:2207.09387v1 [cs.LG])
    In this paper, a green, quantized FL framework, which represents data with a finite precision level in both local training and uplink transmission, is proposed. Here, the finite precision level is captured through the use of quantized neural networks (QNNs) that quantize weights and activations in fixed-precision format. In the considered FL model, each device trains its QNN and transmits a quantized training result to the base station. Energy models for the local training and the transmission with quantization are rigorously derived. To minimize the energy consumption and the number of communication rounds simultaneously, a multi-objective optimization problem is formulated with respect to the number of local iterations, the number of selected devices, and the precision levels for both local training and transmission while ensuring convergence under a target accuracy constraint. To solve this problem, the convergence rate of the proposed FL system is analytically derived with respect to the system control variables. Then, the Pareto boundary of the problem is characterized to provide efficient solutions using the normal boundary inspection method. Design insights on balancing the tradeoff between the two objectives are drawn from using the Nash bargaining solution and analyzing the derived convergence rate. Simulation results show that the proposed FL framework can reduce energy consumption until convergence by up to 52% compared to a baseline FL algorithm that represents data with full precision.
    The Implicit Bias of Gradient Descent on Separable Data. (arXiv:1710.10345v5 [stat.ML] UPDATED)
    We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
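    The claimed convergence in direction can be observed on a tiny example. The sketch below runs plain gradient descent on the mean logistic loss for a hand-made separable dataset with no bias term; the dataset, step size, and step counts are illustrative assumptions. Each row of `Z` is a label-times-input vector, and for this particular dataset the hard-margin SVM direction is (1, 0), since the constraints w.z >= 1 are met with smallest norm at w = (1, 0).

```python
import math

# Rows are y_i * x_i for a separable toy dataset (homogeneous predictor).
# (1, 1) and (1, -1) are the support vectors; (3, 2) lies well inside.
Z = [(1.0, 1.0), (1.0, -1.0), (3.0, 2.0)]


def gd_direction(steps, lr=0.1):
    """Run gradient descent on the mean logistic loss, return w / ||w||."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for z in Z:
            margin = w[0] * z[0] + w[1] * z[1]
            # d/dm log(1 + exp(-m)) = -1 / (1 + exp(m))
            coeff = -1.0 / (1.0 + math.exp(margin))
            grad[0] += coeff * z[0] / len(Z)
            grad[1] += coeff * z[1] / len(Z)
        w = [w[0] - lr * grad[0], w[1] - lr * grad[1]]
    norm = math.hypot(w[0], w[1])
    return [w[0] / norm, w[1] / norm]


early = gd_direction(1)       # direction after a single step
late = gd_direction(20000)    # direction after many steps
print(early, late)
```

    After one step the iterate points along the average of the data, which is pulled toward the non-support point (3, 2); after many steps the first component of the normalised iterate approaches 1, i.e. the max-margin direction, even though the norm of `w` itself keeps growing.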
    Online Dynamics Learning for Predictive Control with an Application to Aerial Robots. (arXiv:2207.09344v1 [cs.RO])
    In this work, we consider the task of improving the accuracy of dynamic models for model predictive control (MPC) in an online setting. Even though prediction models can be learned and applied to model-based controllers, these models are often learned offline. In this offline setting, training data is first collected and a prediction model is learned through an elaborated training procedure. After the model is trained to a desired accuracy, it is then deployed in a model predictive controller. However, since the model is learned offline, it does not adapt to disturbances or model errors observed during deployment. To improve the adaptiveness of the model and the controller, we propose an online dynamics learning framework that continually improves the accuracy of the dynamic model during deployment. We adopt knowledge-based neural ordinary differential equations (KNODE) as the dynamic models, and use techniques inspired by transfer learning to continually improve the model accuracy. We demonstrate the efficacy of our framework with a quadrotor robot, and verify the framework in both simulations and physical experiments. Results show that the proposed approach is able to account for disturbances that are possibly time-varying, while maintaining good trajectory tracking performance.
    Deeply-Learned Generalized Linear Models with Missing Data. (arXiv:2207.08911v1 [stat.ML])
    Deep Learning (DL) methods have dramatically increased in popularity in recent years, with significant growth in their application to supervised learning problems in the biomedical sciences. However, the greater prevalence and complexity of missing data in modern biomedical datasets present significant challenges for DL methods. Here, we provide a formal treatment of missing data in the context of deeply learned generalized linear models, a supervised DL architecture for regression and classification problems. We propose a new architecture, \textit{dlglm}, that is one of the first to be able to flexibly account for both ignorable and non-ignorable patterns of missingness in input features and response at training time. We demonstrate through statistical simulation that our method outperforms existing approaches for supervised learning tasks in the presence of missing not at random (MNAR) missingness. We conclude with a case study of a Bank Marketing dataset from the UCI Machine Learning Repository, in which we predict whether clients subscribed to a product based on phone survey data.
    Over-the-Air Federated Edge Learning with Hierarchical Clustering. (arXiv:2207.09232v1 [cs.LG])
    We examine federated learning (FL) with over-the-air (OTA) aggregation, where mobile users (MUs) aim to reach a consensus on a global model with the help of a parameter server (PS) that aggregates the local gradients. In OTA FL, MUs train their models using local data at every training round and transmit their gradients simultaneously using the same frequency band in an uncoded fashion. Based on the received signal of the superposed gradients, the PS performs a global model update. While the OTA FL has a significantly decreased communication cost, it is susceptible to adverse channel effects and noise. Employing multiple antennas at the receiver side can reduce these effects, yet the path-loss is still a limiting factor for users located far away from the PS. To ameliorate this issue, in this paper, we propose a wireless-based hierarchical FL scheme that uses intermediate servers (ISs) to form clusters at the areas where the MUs are more densely located. Our scheme utilizes OTA cluster aggregations for the communication of the MUs with their corresponding IS, and OTA global aggregations from the ISs to the PS. We present a convergence analysis for the proposed algorithm, and show through numerical evaluations of the derived analytical expressions and experimental results that utilizing ISs results in a faster convergence and a better performance than the OTA FL alone while using less transmit power. We also validate the results on the performance using different number of cluster iterations with different datasets and data distributions. We conclude that the best choice of cluster aggregations depends on the data distribution among the MUs and the clusters.
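    The two-level aggregation described above can be sketched under strongly idealised assumptions: unit channel gains (no fading or path loss), additive Gaussian receiver noise, and equal weighting of clusters. The function names and the toy gradients are hypothetical, and the sketch omits the multiple cluster iterations per global round that the paper studies.

```python
import random


def ota_aggregate(gradients, noise_std=0.0, rng=None):
    """Simulate over-the-air aggregation: simultaneous uncoded transmissions
    add up in the channel, the receiver sees sum + noise and divides by the
    number of senders. Assumes ideal unit channel gains."""
    rng = rng or random.Random(0)
    dim = len(gradients[0])
    superposed = [sum(g[k] for g in gradients) + rng.gauss(0.0, noise_std)
                  for k in range(dim)]
    return [s / len(gradients) for s in superposed]


def hierarchical_round(clusters, noise_std=0.0, rng=None):
    """Users -> intermediate servers (one per cluster) -> parameter server."""
    rng = rng or random.Random(0)
    cluster_estimates = [ota_aggregate(c, noise_std, rng) for c in clusters]
    return ota_aggregate(cluster_estimates, noise_std, rng)


# Two clusters of two mobile users each, with 2-dimensional gradients.
clusters = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
noiseless = hierarchical_round(clusters)                # plain averaging
noisy = hierarchical_round(clusters, noise_std=0.1)     # seeded noise
```

    With `noise_std=0` the result reduces to the average of the cluster averages, which makes it easy to check the plumbing before adding channel effects.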
    Riemannian Stochastic Gradient Method for Nested Composition Optimization. (arXiv:2207.09350v1 [math.OC])
    This work considers optimization of composition of functions in a nested form over Riemannian manifolds where each function contains an expectation. This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning or model customization in meta-learning. The standard Riemannian stochastic gradient methods for non-compositional optimization cannot be directly applied, as stochastic approximation of inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $O(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer function and stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithms for problems with multi-level nested compositional structures, with the same complexity of $O(\epsilon^{-2})$ for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated over a policy evaluation problem in reinforcement learning.
    EVE: Environmental Adaptive Neural Network Models for Low-power Energy Harvesting System. (arXiv:2207.09258v1 [cs.LG])
    IoT devices are increasingly being implemented with neural network models to enable smart applications. Energy harvesting (EH) technology that harvests energy from the ambient environment is a promising alternative to batteries for powering those devices due to the low maintenance cost and wide availability of the energy sources. However, the power provided by the energy harvester is low and has an intrinsic drawback of instability since it varies with the ambient environment. This paper proposes EVE, an automated machine learning (autoML) co-exploration framework to search for desired multi-models with shared weights for energy harvesting IoT devices. Those shared models incur significantly reduced memory footprint with different levels of model sparsity, latency, and accuracy to adapt to the environmental changes. An efficient on-device implementation architecture is further developed to efficiently execute each model on device. A run-time model extraction algorithm is proposed that retrieves an individual model with negligible overhead when a specific model mode is triggered. Experimental results show that the neural network models generated by EVE are on average 2.5X faster than the baseline models without pruning and shared weights.
    Deep Sequence Models for Text Classification Tasks. (arXiv:2207.08880v1 [cs.CL])
    The exponential growth of data generated on the Internet in the current information age is a driving force for the digital economy. Extraction of information is the major value of accumulated big data. Machine learning algorithms that depend on statistical analysis and hand-engineered rules are overwhelmed by the vast complexities inherent in human languages. Natural Language Processing (NLP) is equipping machines to understand these diverse and complicated human languages. Text Classification is an NLP task which automatically identifies patterns based on predefined or undefined labeled sets. Common text classification applications include information retrieval, news topic modeling, theme extraction, sentiment analysis, and spam detection. In texts, some sequences of words depend on the previous or next word sequences to make full meaning; this is a challenging dependency task that requires the machine to be able to store some previous important information to impact future meaning. Sequence models such as RNN, GRU, and LSTM are a breakthrough for tasks with long-range dependencies. As such, we applied these models to Binary and Multi-class classification. Results generated were excellent, with most of the models performing within the range of 80% to 94%. However, this result is not exhaustive, as we believe there is room for improvement if machines are to compete with humans.
    Metadata Representations for Queryable ML Model Zoos. (arXiv:2207.09315v1 [cs.LG])
    Machine learning (ML) practitioners and organizations are building model zoos of pre-trained models, containing metadata describing properties of the ML models and datasets that are useful for reporting, auditing, reproducibility, and interpretability purposes. The metadata is currently not standardised; its expressivity is limited; and there is no interoperable way to store and query it. Consequently, model search, reuse, comparison, and composition are hindered. In this paper, we advocate for standardized ML model metadata representation and management, proposing a toolkit to help practitioners manage and query that metadata.
    Heterogeneous Treatment Effect with Trained Kernels of the Nadaraya-Watson Regression. (arXiv:2207.09139v1 [cs.LG])
    A new method for estimating the conditional average treatment effect is proposed in the paper. It is called TNW-CATE (the Trainable Nadaraya-Watson regression for CATE) and is based on the assumption that the number of controls is rather large whereas the number of treatments is small. TNW-CATE uses the Nadaraya-Watson regression for predicting outcomes of patients from the control and treatment groups. The main idea behind TNW-CATE is to train kernels of the Nadaraya-Watson regression by using a weight sharing neural network of a specific form. The network is trained on controls, and it replaces standard kernels with a set of neural subnetworks with shared parameters such that every subnetwork implements the trainable kernel, but the whole network implements the Nadaraya-Watson estimator. The network memorizes how the feature vectors are located in the feature space. The proposed approach is similar to transfer learning when domains of source and target data are similar, but tasks are different. Various numerical simulation experiments illustrate TNW-CATE and compare it with the well-known T-learner, S-learner and X-learner for several types of the control and treatment outcome functions. The code of the proposed algorithms implementing TNW-CATE is available at https://github.com/Stasychbr/TNW-CATE.
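    For orientation, the classical Nadaraya-Watson estimator that TNW-CATE builds on can be sketched as a kernel-weighted average of observed outcomes. The sketch below assumes a fixed Gaussian kernel and one-dimensional features; it is not the trainable kernel of the paper, and the names `gaussian_kernel`, `nadaraya_watson`, and `bandwidth` are illustrative.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Unnormalized Gaussian kernel; the constant cancels in the ratio below."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x_query, xs, ys, bandwidth=1.0):
    """Predict the outcome at x_query as a kernel-weighted mean of ys."""
    weights = [gaussian_kernel((x_query - x) / bandwidth) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total
```

    A plug-in treatment effect estimate at a point would then be the difference between this prediction fitted on the treatment group and the one fitted on the control group; in TNW-CATE the fixed kernel is replaced by subnetworks with shared parameters.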
    Formal Algorithms for Transformers. (arXiv:2207.09238v1 [cs.LG])
    This document aims to be a self-contained, mathematically precise overview of transformer architectures and algorithms (*not* results). It covers what transformers are, how they are trained, what they are used for, their key architectural components, and a preview of the most prominent models. The reader is assumed to be familiar with basic ML terminology and simpler neural network architectures such as MLPs.
    A-SFS: Semi-supervised Feature Selection based on Multi-task Self-supervision. (arXiv:2207.09061v1 [cs.LG])
    Feature selection is an important process in machine learning. It builds an interpretable and robust model by selecting the features that contribute the most to the prediction target. However, most mature feature selection algorithms, including supervised and semi-supervised, fail to fully exploit the complex potential structure between features. We believe that these structures are very important for the feature selection process, especially when labels are lacking and data is noisy. To this end, we innovatively introduce a deep learning-based self-supervised mechanism into feature selection problems, namely batch-Attention-based Self-supervision Feature Selection (A-SFS). Firstly, a multi-task self-supervised autoencoder is designed to uncover the hidden structure among features with the support of two pretext tasks. Guided by the integrated information from the multi-self-supervised learning model, a batch-attention mechanism is designed to generate feature weights according to batch-based feature selection patterns to alleviate the impacts introduced by a handful of noisy data. This method is compared to 14 major strong benchmarks, including LightGBM and XGBoost. Experimental results show that A-SFS achieves the highest accuracy in most datasets. Furthermore, this design significantly reduces the reliance on labels, with only 1/10 of the labeled data needed to achieve the same performance as state-of-the-art baselines. Results show that A-SFS is also the most robust to noisy and missing data.
    Robustar: Interactive Toolbox Supporting Precise Data Annotation for Robust Vision Learning. (arXiv:2207.08944v1 [cs.CV])
    We introduce the initial release of our software Robustar, which aims to improve the robustness of vision classification machine learning models through a data-driven perspective. Building upon the recent understanding that a machine learning model's lack of robustness stems from its tendency to learn spurious features, we aim to solve this problem at its root, from the data perspective, by removing the spurious features from the data before training. In particular, we introduce software that helps users better prepare the data for training image classification models by allowing them to annotate the spurious features at the pixel level of images. To facilitate this process, our software also leverages recent advances to help identify potential images and pixels worthy of attention and to continue the training with newly annotated data. Our software is hosted at the GitHub Repository https://github.com/HaohanWang/Robustar.
    Adaptive Learning for the Resource-Constrained Classification Problem. (arXiv:2207.09196v1 [cs.LG])
    Resource-constrained classification tasks are common in real-world applications such as allocating tests for disease diagnosis, hiring decisions when filling a limited number of positions, and defect detection in manufacturing settings under a limited inspection budget. Typical classification algorithms treat the learning process and the resource constraints as two separate and sequential tasks. Here we design an adaptive learning approach that considers resource constraints and learning jointly by iteratively fine-tuning misclassification costs. Via a structured experimental study using a publicly available data set, we evaluate a decision tree classifier that utilizes the proposed approach. The adaptive learning approach performs significantly better than alternative approaches, especially for difficult classification problems in which the performance of common approaches may be unsatisfactory. We envision the adaptive learning approach as an important addition to the repertoire of techniques for handling resource-constrained classification problems.
    Towards Learning Self-Organized Criticality of Rydberg Atoms using Graph Neural Networks. (arXiv:2207.08927v1 [physics.atom-ph])
    Self-Organized Criticality (SOC) is a ubiquitous dynamical phenomenon believed to be responsible for the emergence of universal scale-invariant behavior in many, seemingly unrelated systems, such as forest fires, virus spreading or atomic excitation dynamics. SOC describes the buildup of large-scale and long-range spatio-temporal correlations as a result of only local interactions and dissipation. The simulation of SOC dynamics is typically based on Monte-Carlo (MC) methods, which are however numerically expensive and do not scale beyond certain system sizes. We investigate the use of Graph Neural Networks (GNNs) as an effective surrogate model to learn the dynamics operator for a paradigmatic SOC system, inspired by an experimentally accessible physics example: driven Rydberg atoms. To this end, we generalize existing GNN simulation approaches to predict dynamics for the internal state of the node. We show that we can accurately reproduce the MC dynamics as well as generalize along the two important axes of particle number and particle density. This paves the way to model much larger systems beyond the limits of traditional MC methods. While the exact system is inspired by the dynamics of Rydberg atoms, the approach is quite general and can readily be applied to other systems.
    Discovering novel systemic biomarkers in photos of the external eye. (arXiv:2207.08998v1 [eess.IV])
    External eye photos were recently shown to reveal signs of diabetic retinal disease and elevated HbA1c. In this paper, we evaluate if external eye photos contain information about additional systemic medical conditions. We developed a deep learning system (DLS) that takes external eye photos as input and predicts multiple systemic parameters, such as those related to the liver (albumin, AST); kidney (eGFR estimated using the race-free 2021 CKD-EPI creatinine equation, the urine ACR); bone & mineral (calcium); thyroid (TSH); and blood count (Hgb, WBC, platelets). Development leveraged 151,237 images from 49,015 patients with diabetes undergoing diabetic eye screening in 11 sites across Los Angeles county, CA. Evaluation focused on 9 pre-specified systemic parameters and leveraged 3 validation sets (A, B, C) spanning 28,869 patients with and without diabetes undergoing eye screening in 3 independent sites in Los Angeles County, CA, and the greater Atlanta area, GA. We compared against baseline models incorporating available clinicodemographic variables (e.g. age, sex, race/ethnicity, years with diabetes). Relative to the baseline, the DLS achieved statistically significant superior performance at detecting AST>36, calcium=300, and WBC=300 and Hgb<11 by 7.3-13.2%. Our findings provide further evidence that external eye photos contain important biomarkers of systemic health spanning multiple organ systems. Further work is needed to investigate whether and how these biomarkers can be translated into clinical impact.
    Easy Batch Normalization. (arXiv:2207.08940v1 [cs.LG])
    It has been shown that adversarial examples improve object recognition. But what about their opposite side, easy examples? Easy examples are samples that the machine learning model classifies correctly with high confidence. In our paper, we take the first step toward exploring the potential benefits of using easy examples in the training procedure of neural networks. We propose to use an auxiliary batch normalization for easy examples to improve standard and robust accuracy.
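    The definition of an easy example given above can be made concrete with a small sketch. It assumes the model outputs class probabilities and that "high confidence" means exceeding a fixed cutoff; the function name `is_easy` and the default `threshold=0.9` are assumptions for illustration, not values from the paper.

```python
def is_easy(probs, label, threshold=0.9):
    """Return True if the sample is classified correctly with high confidence.

    probs: predicted class probabilities for one sample.
    label: index of the true class.
    threshold: hypothetical confidence cutoff (an assumption).
    """
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted == label and probs[predicted] >= threshold


def split_batch(batch_probs, labels, threshold=0.9):
    """Partition sample indices into easy and remaining ones."""
    easy, rest = [], []
    for index, (probs, label) in enumerate(zip(batch_probs, labels)):
        (easy if is_easy(probs, label, threshold) else rest).append(index)
    return easy, rest
```

    Under this reading, the indices in `easy` would be the ones routed through the auxiliary batch normalization, while the remaining samples use the main one.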
    Balanced Contrastive Learning for Long-Tailed Visual Recognition. (arXiv:2207.09052v1 [cs.CV])
    Real-world data typically follow a long-tailed distribution, where a few majority categories occupy most of the data while most minority categories contain a limited number of samples. Classification models minimizing cross-entropy struggle to represent and classify the tail classes. Although the problem of learning unbiased classifiers has been well studied, methods for representing imbalanced data are under-explored. In this paper, we focus on representation learning for imbalanced data. Recently, supervised contrastive learning (SCL) has shown promising performance on balanced data. However, through our theoretical analysis, we find that for long-tailed data, it fails to form a regular simplex, which is an ideal geometric configuration for representation learning. To correct the optimization behavior of SCL and further improve the performance of long-tailed visual recognition, we propose a novel loss for balanced contrastive learning (BCL). Compared with SCL, we have two improvements in BCL: class-averaging, which balances the gradient contribution of negative classes; class-complement, which allows all classes to appear in every mini-batch. The proposed balanced contrastive learning (BCL) method satisfies the condition of forming a regular simplex and assists the optimization of cross-entropy. Equipped with BCL, the proposed two-branch framework can obtain a stronger feature representation and achieve competitive performance on long-tailed benchmark datasets such as CIFAR-10-LT, CIFAR-100-LT, ImageNet-LT, and iNaturalist2018. Our code is available at https://github.com/FlamieZhu/BCL.
    A Deep Reinforcement Learning Approach for Finding Non-Exploitable Strategies in Two-Player Atari Games. (arXiv:2207.08894v1 [cs.LG])
    This paper proposes novel, end-to-end deep reinforcement learning algorithms for learning two-player zero-sum Markov games. Our objective is to find the Nash Equilibrium policies, which are free from exploitation by adversarial opponents. Distinct from prior efforts on finding Nash equilibria in extensive-form games such as Poker, which feature tree-structured transition dynamics and discrete state space, this paper focuses on Markov games with general transition dynamics and continuous state space. We propose (1) Nash DQN algorithm, which integrates DQN with a Nash finding subroutine for the joint value functions; and (2) Nash DQN Exploiter algorithm, which additionally adopts an exploiter for guiding agent's exploration. Our algorithms are the practical variants of theoretical algorithms which are guaranteed to converge to Nash equilibria in the basic tabular setting. Experimental evaluation on both tabular examples and two-player Atari games demonstrates the robustness of the proposed algorithms against adversarial opponents, as well as their advantageous performance over existing methods.
    FactGraph: Evaluating Factuality in Summarization with Semantic Graph Representations. (arXiv:2204.06508v2 [cs.CL] UPDATED)
    Despite recent improvements in abstractive summarization, most current approaches generate summaries that are not factually consistent with the source document, severely restricting their trust and usage in real-world applications. Recent works have shown promising improvements in factuality error identification using text or dependency arc entailments; however, they do not consider the entire semantic graph simultaneously. To this end, we propose FactGraph, a method that decomposes the document and the summary into structured meaning representations (MR), which are more suitable for factuality evaluation. MRs describe core semantic concepts and their relations, aggregating the main content in both document and summary in a canonical form, and reducing data sparsity. FactGraph encodes such graphs using a graph encoder augmented with structure-aware adapters to capture interactions among the concepts based on the graph connectivity, along with text representations using an adapter-based text encoder. Experiments on different benchmarks for evaluating factuality show that FactGraph outperforms previous approaches by up to 15%. Furthermore, FactGraph improves performance on identifying content verifiability errors and better captures subsentence-level factual inconsistencies.
    Towards Trustworthy Healthcare AI: Attention-Based Feature Learning for COVID-19 Screening With Chest Radiography. (arXiv:2207.09312v1 [eess.IV])
    Building trustworthy AI models is especially important in regulated areas such as healthcare. In tackling COVID-19, previous work uses convolutional neural networks as the backbone architecture, which have been shown to be prone to over-caution and overconfidence in making decisions, rendering them less trustworthy -- a crucial flaw in the context of medical imaging. In this study, we propose a feature learning approach using Vision Transformers, which use an attention-based mechanism, and examine the representation learning capability of Transformers as a new backbone architecture for medical imaging. Through the task of classifying COVID-19 chest radiographs, we investigate whether generalization capabilities benefit solely from Vision Transformers' architectural advances. Quantitative and qualitative evaluations are conducted on the trustworthiness of the models, through the use of "trust score" computation and a visual explainability technique. We conclude that the attention-based feature learning approach is promising in building trustworthy deep learning models for healthcare.
    Indoor Localization for Personalized Ambient Assisted Living of Multiple Users in Multi-Floor Smart Environments. (arXiv:2207.09025v1 [cs.AI])
    This paper presents a multifunctional interdisciplinary framework that makes four scientific contributions towards the development of personalized ambient assisted living, with a specific focus on addressing the different and dynamic needs of the diverse aging population in the future of smart living environments. First, it presents a probabilistic reasoning-based mathematical approach to model all possible forms of user interactions for any activity arising from the user diversity of multiple users in such environments. Second, it presents a system that uses this approach with a machine learning method to model individual user profiles and user-specific user interactions for detecting the dynamic indoor location of each specific user. Third, to address the need to develop highly accurate indoor localization systems for increased trust, reliance, and seamless user acceptance, the framework introduces a novel methodology where two boosting approaches, Gradient Boosting and the AdaBoost algorithm, are integrated and used on a decision tree-based learning model to perform indoor localization. Fourth, the framework introduces two novel functionalities to provide semantic context to indoor localization in terms of detecting each user's floor-specific location as well as tracking whether a specific user was located inside or outside a given spatial region in a multi-floor-based indoor setting. These novel functionalities of the proposed framework were tested on a dataset of localization-related Big Data collected from 18 different users who navigated in 3 buildings consisting of 5 floors and 254 indoor spatial regions. The results show that this approach of indoor localization for personalized AAL that models each specific user always achieves higher accuracy as compared to the traditional approach of modeling an average user.
    A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics. (arXiv:2207.09304v1 [math.PR])
    We establish a sharp uniform-in-time error estimate for the Stochastic Gradient Langevin Dynamics (SGLD), which is a popular sampling algorithm. Under mild assumptions, we obtain a uniform-in-time $O(\eta^2)$ bound for the KL-divergence between the SGLD iteration and the Langevin diffusion, where $\eta$ is the step size (or learning rate). Our analysis is also valid for varying step sizes. Based on this, we are able to obtain an $O(\eta)$ bound for the distance between the SGLD iteration and the invariant distribution of the Langevin diffusion, in terms of Wasserstein or total variation distances.
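    As a reminder of the algorithm being analyzed, a minimal SGLD sketch follows. It uses the standard update $\theta_{k+1} = \theta_k - \eta\,\widetilde{\nabla U}(\theta_k) + \sqrt{2\eta}\,\xi_k$ with $\xi_k \sim N(0, 1)$, applied to a toy one-dimensional target $\exp(-U)$ with $U(\theta) = \theta^2/2$. The toy target, the additive gradient noise model, and all function and parameter names are assumptions made for illustration.

```python
import math
import random


def sgld_step(theta, stochastic_grad, eta, xi):
    """One SGLD update: a gradient step of size eta plus sqrt(2*eta)-scaled noise."""
    return theta - eta * stochastic_grad + math.sqrt(2.0 * eta) * xi


def run_sgld(num_steps=50000, eta=0.05, grad_noise_std=1.0, seed=0):
    """Run SGLD on U(theta) = theta^2 / 2, whose target is a standard normal."""
    rng = random.Random(seed)
    theta = 0.0
    samples = []
    for _ in range(num_steps):
        # The exact gradient of U is theta; the added Gaussian term mimics
        # the error of a mini-batch gradient estimate (an assumption).
        grad = theta + rng.gauss(0.0, grad_noise_std)
        theta = sgld_step(theta, grad, eta, rng.gauss(0.0, 1.0))
        samples.append(theta)
    return samples
```

    For this toy chain the stationary variance works out to $(2 + \eta\sigma^2)/(2 - \eta)$, where $\sigma$ is `grad_noise_std`, so the sampling bias vanishes as $\eta \to 0$; this only illustrates the step-size dependence and is not a check of the paper's bounds.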
    Multi-step domain adaptation by adversarial attack to $\mathcal{H} \Delta \mathcal{H}$-divergence. (arXiv:2207.08948v1 [cs.LG])
    Adversarial examples are transferable between different models. In our paper, we propose to use this property for multi-step domain adaptation. In unsupervised domain adaptation settings, we demonstrate that replacing the source domain with adversarial examples to $\mathcal{H} \Delta \mathcal{H}$-divergence can improve source classifier accuracy on the target domain. Our method can be connected to most domain adaptation techniques. We conducted a range of experiments and achieved improvement in accuracy on Digits and Office-Home datasets.
    Residual and Attentional Architectures for Vector-Symbols. (arXiv:2207.08953v1 [cs.LG])
    Vector-symbolic architectures (VSAs) provide methods for computing which are highly flexible and carry unique advantages. Concepts in VSAs are represented by 'symbols,' long vectors of values which utilize properties of high-dimensional spaces to represent and manipulate information. In this new work, we combine the efficiency of the operations provided within the framework of the Fourier Holographic Reduced Representation (FHRR) VSA with the power of deep networks to construct novel VSA-based residual and attention-based neural network architectures. Using an attentional FHRR architecture, we demonstrate that the same network architecture can address problems from different domains (image classification and molecular toxicity prediction) by encoding different information into the network's inputs, similar to the Perceiver model. This demonstrates a novel application of VSAs and a potential path to implementing state-of-the-art neural models on neuromorphic hardware.
    Analyzing Bagging Methods for Language Models. (arXiv:2207.09099v1 [cs.CL])
    Modern language models leverage increasingly large numbers of parameters to achieve performance on natural language understanding tasks. Ensembling these models in specific configurations for downstream tasks shows even further performance improvements. In this paper, we perform an analysis of bagging language models and compare single language models to bagged ensembles that are roughly equivalent in terms of final model size. We explore an array of model bagging configurations for natural language understanding tasks with final ensemble sizes ranging from 300M parameters to 1.5B parameters and determine that our ensembling methods are at best roughly equivalent to single LM baselines. We note other positive effects of bagging and pruning in specific scenarios according to findings in our experiments, such as variance reduction and minor performance improvements.
    When Deep Classifiers Agree: Analyzing Correlations between Learning Order and Image Statistics. (arXiv:2105.08997v2 [cs.LG] UPDATED)
    Although a plethora of architectural variants for deep classification has been introduced over time, recent works have found empirical evidence towards similarities in their training process. It has been hypothesized that neural networks converge not only to similar representations, but also exhibit a notion of empirical agreement on which data instances are learned first. Following in the latter works' footsteps, we define a metric to quantify the relationship between such classification agreement over time, and posit that the agreement phenomenon can be mapped to core statistics of the investigated dataset. We empirically corroborate this hypothesis across the CIFAR10, Pascal, ImageNet and KTH-TIPS2 datasets. Our findings indicate that agreement seems to be independent of specific architectures, training hyper-parameters or labels, but follows an ordering according to image statistics.
    Revealing the CO2 emission reduction of ridesplitting and its determinants based on real-world data. (arXiv:2204.00777v2 [cs.LG] UPDATED)
    Ridesplitting, which is a form of pooled ridesourcing service, has great potential to alleviate the negative impacts of ridesourcing on the environment. However, most existing studies only explored its theoretical environmental benefits based on optimization models and simulations. By contrast, this study aims to reveal the real-world emission reduction of ridesplitting and its determinants based on the observed data of ridesourcing in Chengdu, China. Integrating the trip data with the COPERT model, this study calculates the CO2 emissions of shared rides (ridesplitting) and their substituted single rides (regular ridesourcing) to estimate the CO2 emission reduction of each ridesplitting trip. The results show that not all ridesplitting trips reduce emissions from ridesourcing in the real world. The CO2 emission reduction rate of ridesplitting varies from trip to trip, averaging at 43.15g/km. Then, interpretable machine learning models, gradient boosting machines, are applied to explore the relationship between the CO2 emission reduction rate of ridesplitting and its determinants. Based on the SHapley Additive exPlanations (SHAP) method, the overlap rate and detour rate of shared rides are identified to be the most important factors that determine the CO2 emission reduction rate of ridesplitting. Increasing the overlap rate, the number of shared rides, average speed, and ride distance ratio while decreasing the detour rate, actual trip distance, and ride distance gap can increase the CO2 emission reduction rate of ridesplitting. In addition, nonlinear effects and interactions of the determinants are examined through the partial dependence plots. To sum up, this study provides a scientific method for the government and ridesourcing companies to better assess and optimize the environmental benefits of ridesplitting.
    Multi-view hierarchical Variational AutoEncoders with Factor Analysis latent space. (arXiv:2207.09185v1 [cs.LG])
    Real-world databases are complex; they usually present redundancy and shared correlations between heterogeneous and multiple representations of the same data. Thus, exploiting and disentangling shared information between views is critical. For this purpose, recent studies often fuse all views into a shared nonlinear complex latent space, but they lose interpretability. To overcome this limitation, here we propose a novel method to combine multiple Variational AutoEncoders (VAE) architectures with a Factor Analysis latent space (FA-VAE). Concretely, we use a VAE to learn a private representation of each heterogeneous view in a continuous latent space. Then, we model the shared latent space by projecting every private variable to a low-dimensional latent space using a linear projection matrix. Thus, we create an interpretable hierarchical dependency between private and shared information. This way, the novel model is able to simultaneously: (i) learn from multiple heterogeneous views, (ii) obtain an interpretable hierarchical shared space, and (iii) perform transfer learning between generative models.
    Explainability of deep vision-based autonomous driving systems: Review and challenges. (arXiv:2101.05307v2 [cs.CV] UPDATED)
    This survey reviews explainability methods for vision-based self-driving systems trained with behavior cloning. The concept of explainability has several facets and the need for explainability is strong in driving, a safety-critical application. Gathering contributions from several research fields, namely computer vision, deep learning, autonomous driving, explainable AI (X-AI), this survey tackles several points. First, it discusses definitions, context, and motivation for gaining more interpretability and explainability from self-driving systems, as well as the challenges that are specific to this application. Second, methods providing explanations to a black-box self-driving system in a post-hoc fashion are comprehensively organized and detailed. Third, approaches from the literature that aim at building more interpretable self-driving systems by design are presented and discussed in detail. Finally, remaining open-challenges and potential future research directions are identified and examined.
    Decorrelative Network Architecture for Robust Electrocardiogram Classification. (arXiv:2207.09031v1 [cs.LG])
    Artificial intelligence has made great progress in medical data analysis, but the lack of robustness and interpretability has kept these methods from being widely deployed. In particular, data-driven models are vulnerable to adversarial attacks, which are small, targeted perturbations that dramatically degrade model performance. As a recent example, while deep learning has shown impressive performance in electrocardiogram (ECG) classification, Han et al. [2020] crafted realistic perturbations that fooled the network 74% of the time. Current adversarial defense paradigms are computationally intensive and impractical for many high dimensional problems. Previous research indicates that a network's vulnerability is related to the features learned during training. We propose a novel approach based on ensemble decorrelation and Fourier partitioning for training parallel network arms into a decorrelated architecture to learn complementary features, significantly reducing the chance of a perturbation fooling all arms of the deep learning model. We test our approach in ECG classification, demonstrating a much-improved 77.2% chance of at least one correct network arm on the strongest adversarial attack tested, in contrast to a 21.7% chance from a comparable ensemble. Our approach does not require expensive optimization with adversarial samples, and thus can be scaled to large problems. These methods can easily be applied to other tasks for improved network robustness.
    Active-Learning-as-a-Service: An Efficient MLOps System for Data-Centric AI. (arXiv:2207.09109v1 [cs.LG])
    The success of today's AI applications requires not only model training (Model-centric) but also data engineering (Data-centric). In data-centric AI, active learning (AL) plays a vital role, but current AL tools cannot perform AL tasks efficiently. To this end, this paper presents an efficient MLOps system for AL, named ALaaS (Active-Learning-as-a-Service). Specifically, ALaaS adopts a server-client architecture to support an AL pipeline and implements stage-level parallelism for high efficiency. Meanwhile, caching and batching techniques are employed to further accelerate the AL process. In addition to efficiency, ALaaS ensures accessibility with the help of the design philosophy of configuration-as-a-service. It also abstracts an AL process into several components and provides rich APIs for advanced users to extend the system to new scenarios. Extensive experiments show that ALaaS outperforms all other baselines in terms of latency and throughput. Further ablation studies demonstrate the effectiveness of our design as well as ALaaS's ease of use. Our code is available at https://github.com/MLSysOps/alaas.
    Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices. (arXiv:2207.08988v1 [cs.LG])
    Federated Learning (FL) is a technique to train models using data distributed across devices. Differential Privacy (DP) provides a formal privacy guarantee for sensitive data. Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP. However, the DP-noise introduced to the model increases as the model size grows, which often prevents convergence. We propose Partial Embedding Updates (PEU), a novel technique to decrease noise by decreasing payload size. Furthermore, we adopt Low Rank Adaptation (LoRA) and Noise Contrastive Estimation (NCE) to reduce the memory demands of large models on compute-constrained devices. This combination of techniques makes it possible to train large-vocabulary language models while preserving accuracy and privacy.
    Benchmarking Machine Learning Robustness in Covid-19 Genome Sequence Classification. (arXiv:2207.08898v1 [q-bio.GN])
    The rapid spread of the COVID-19 pandemic has resulted in an unprecedented amount of sequence data of the SARS-CoV-2 genome -- millions of sequences and counting. This amount of data, while being orders of magnitude beyond the capacity of traditional approaches to understanding the diversity, dynamics, and evolution of viruses, is nonetheless a rich resource for machine learning (ML) approaches as alternatives for extracting such important information from these data. It is hence of utmost importance to design a framework for testing and benchmarking the robustness of these ML models. This paper makes the first effort (to our knowledge) to benchmark the robustness of ML models by simulating biological sequences with errors. In this paper, we introduce several ways to perturb SARS-CoV-2 genome sequences to mimic the error profiles of common sequencing platforms such as Illumina and PacBio. We show from experiments on a wide array of ML models that some simulation-based approaches are more robust (and accurate) than others for specific embedding methods to certain adversarial attacks to the input sequences. Our benchmarking framework may assist researchers in properly assessing different ML models and help them understand the behavior of the SARS-CoV-2 virus or avoid possible future pandemics.
    A New Perspective on Stabilizing GANs training: Direct Adversarial Training. (arXiv:2008.09041v5 [eess.IV] UPDATED)
    Generative Adversarial Networks (GANs) are the most popular image generation models that have achieved remarkable progress on various computer vision tasks. However, training instability is still one of the open problems for all GAN-based algorithms. Quite a number of methods have been proposed to stabilize the training of GANs, the focuses of which were respectively put on the loss functions, regularization and normalization technologies, training algorithms, and model architectures. Different from the above methods, in this paper, a new perspective on stabilizing GANs training is presented. It is found that sometimes the images produced by the generator act like adversarial examples of the discriminator during the training process, which may be part of the reason causing the unstable training of GANs. With this finding, we propose the Direct Adversarial Training (DAT) method to stabilize the training process of GANs. Furthermore, we prove that the DAT method is able to minimize the Lipschitz constant of the discriminator adaptively. The advanced performance of DAT is verified on multiple loss functions, network architectures, hyper-parameters, and datasets. Specifically, DAT achieves significant improvements of 11.5% FID on CIFAR-100 unconditional generation based on SSGAN, 10.5% FID on STL-10 unconditional generation based on SSGAN, and 13.2% FID on LSUN-Bedroom unconditional generation based on SSGAN. Code will be available at https://github.com/iceli1007/DAT-GAN
    A Study of Deep CNN Model with Labeling Noise Based on Granular-ball Computing. (arXiv:2207.08810v1 [cs.LG])
    In supervised learning, the presence of noise can have a significant impact on decision making. Many classifiers, including logistic regression, SVM, and AdaBoost, do not take label noise into account in the derivation of their loss functions, which leads to a decrease in model accuracy. This is especially true of the AdaBoost iterative algorithm, whose core idea is to continuously increase the weight of misclassified samples: in the presence of label noise, the weight of many noisy samples will be increased. In addition, the learning process of BP neural networks and decision trees will also be affected by label noise. Therefore, solving the label noise problem is an important element of maintaining the robustness of the network model, which is of great practical significance. Granular ball computing is an important modeling method developed in the field of granular computing in recent years, which is an efficient, robust and scalable learning method. In this paper, we pioneer a granular ball neural network algorithm model, which adopts the idea of multi-granularity to filter label noise samples during model training, solving the current problem of model instability caused by label noise in the field of deep learning, greatly reducing the proportion of label noise in training samples and improving the robustness of neural network models.
    The Caltech Fish Counting Dataset: A Benchmark for Multiple-Object Tracking and Counting. (arXiv:2207.09295v1 [cs.CV])
    We present the Caltech Fish Counting Dataset (CFC), a large-scale dataset for detecting, tracking, and counting fish in sonar videos. We identify sonar videos as a rich source of data for advancing low signal-to-noise computer vision applications and tackling domain generalization in multiple-object tracking (MOT) and counting. In comparison to existing MOT and counting datasets, which are largely restricted to videos of people and vehicles in cities, CFC is sourced from a natural-world domain where targets are not easily resolvable and appearance features cannot be easily leveraged for target re-identification. With over half a million annotations in over 1,500 videos sourced from seven different sonar cameras, CFC allows researchers to train MOT and counting algorithms and evaluate generalization performance at unseen test locations. We perform extensive baseline experiments and identify key challenges and opportunities for advancing the state of the art in generalization in MOT and counting.
    Uncertainty in Contrastive Learning: On the Predictability of Downstream Performance. (arXiv:2207.09336v1 [cs.LG])
    The superior performance of some of today's state-of-the-art deep learning models is to some extent owed to extensive (self-)supervised contrastive pretraining on large-scale datasets. In contrastive learning, the network is presented with pairs of positive (similar) and negative (dissimilar) datapoints and is trained to find an embedding vector for each datapoint, i.e., a representation, which can be further fine-tuned for various downstream tasks. In order to safely deploy these models in critical decision-making systems, it is crucial to equip them with a measure of their uncertainty or reliability. However, due to the pairwise nature of training a contrastive model, and the lack of absolute labels on the output (an abstract embedding vector), adapting conventional uncertainty estimation techniques to such models is non-trivial. In this work, we study whether the uncertainty of such a representation can be quantified for a single datapoint in a meaningful way. In other words, we explore if the downstream performance on a given datapoint is predictable, directly from its pre-trained embedding. We show that this goal can be achieved by directly estimating the distribution of the training data in the embedding space and accounting for the local consistency of the representations. Our experiments show that this notion of uncertainty for an embedding vector often strongly correlates with its downstream accuracy.
    NeuForm: Adaptive Overfitting for Neural Shape Editing. (arXiv:2207.08890v1 [cs.CV])
    Neural representations are popular for representing shapes, as they can be learned from sensor data and used for data cleanup, model completion, shape editing, and shape synthesis. Current neural representations can be categorized as either overfitting to a single object instance, or representing a collection of objects. However, neither allows accurate editing of neural scene representations: on the one hand, methods that overfit objects achieve highly accurate reconstructions, but do not generalize to unseen object configurations and thus cannot support editing; on the other hand, methods that represent a family of objects with variations do generalize but produce only approximate reconstructions. We propose NEUFORM to combine the advantages of both overfitted and generalizable representations by adaptively using the one most appropriate for each shape region: the overfitted representation where reliable data is available, and the generalizable representation everywhere else. We achieve this with a carefully designed architecture and an approach that blends the network weights of the two representations, avoiding seams and other artifacts. We demonstrate edits that successfully reconfigure parts of human-designed shapes, such as chairs, tables, and lamps, while preserving semantic integrity and the accuracy of an overfitted shape representation. We compare with two state-of-the-art competitors and demonstrate clear improvements in terms of plausibility and fidelity of the resultant edits.  ( 3 min )
    Gauge-equivariant flow models for sampling in lattice field theories with pseudofermions. (arXiv:2207.08945v1 [hep-lat])
    This work presents gauge-equivariant architectures for flow-based sampling in fermionic lattice field theories using pseudofermions as stochastic estimators for the fermionic determinant. This is the default approach in state-of-the-art lattice field theory calculations, making this development critical to the practical application of flow models to theories such as QCD. Methods by which flow-based sampling approaches can be improved via standard techniques such as even/odd preconditioning and the Hasenbusch factorization are also outlined. Numerical demonstrations in two-dimensional U(1) and SU(3) gauge theories with $N_f=2$ flavors of fermions are provided.  ( 2 min )
    MRCLens: an MRC Dataset Bias Detection Toolkit. (arXiv:2207.08943v1 [cs.CL])
    Many recent neural models have shown remarkable empirical results in Machine Reading Comprehension, but evidence suggests sometimes the models take advantage of dataset biases to predict and fail to generalize on out-of-sample data. While many other approaches have been proposed to address this issue from the computation perspective such as new architectures or training procedures, we believe a method that allows researchers to discover biases, and adjust the data or the models in an earlier stage will be beneficial. Thus, we introduce MRCLens, a toolkit that detects whether biases exist before users train the full model. For the convenience of introducing the toolkit, we also provide a categorization of common biases in MRC.  ( 2 min )
    Capabilities, Limitations and Challenges of Style Transfer with CycleGANs: A Study on Automatic Ring Design Generation. (arXiv:2207.08989v1 [cs.CV])
    Rendering programs have changed the design process completely as they make it possible to see how the products will look before they are fabricated. However, the rendering process is complicated and takes a significant amount of time, not only in the rendering itself but in the setting of the scene as well. Materials, lights and cameras need to be set in order to get the best quality results. Nevertheless, the optimal output may not be obtained in the first render. This all makes rendering a tedious process. Since Goodfellow et al. introduced Generative Adversarial Networks (GANs) in 2014 [1], they have been used to generate computer-assigned synthetic data, from non-existing human faces to medical data analysis or image style transfer. GANs have been used to transfer image textures from one domain to another. However, paired data from both domains was needed. When Zhu et al. introduced the CycleGAN model, the elimination of this expensive constraint permitted transforming one image from one domain into another, without the need for paired data. This work validates the applicability of CycleGANs on style transfer from an initial sketch to a final render in 2D that represents a 3D design, a step that is paramount in every product design process. We investigate the possibility of including CycleGANs as part of the design pipeline, more precisely, applied to the rendering of ring designs. Our contribution entails a crucial part of the process as it allows the customer to see the final product before buying. This work sets a basis for future research, showing the possibilities of GANs in design and establishing a starting point for novel applications to approach crafts design.  ( 3 min )
    The m-connecting imset and factorization for ADMG models. (arXiv:2207.08963v1 [stat.ML])
    Directed acyclic graph (DAG) models have become widely studied and applied in statistics and machine learning -- indeed, their simplicity facilitates efficient procedures for learning and inference. Unfortunately, these models are not closed under marginalization, making them poorly equipped to handle systems with latent confounding. Acyclic directed mixed graph (ADMG) models characterize margins of DAG models, making them far better suited to handle such systems. However, ADMG models have not seen wide-spread use due to their complexity and a shortage of statistical tools for their analysis. In this paper, we introduce the m-connecting imset which provides an alternative representation for the independence models induced by ADMGs. Furthermore, we define the m-connecting factorization criterion for ADMG models, characterized by a single equation, and prove its equivalence to the global Markov property. The m-connecting imset and factorization criterion provide two new statistical tools for learning and inference with ADMG models. We demonstrate the usefulness of these tools by formulating and evaluating a consistent scoring criterion with a closed form solution.  ( 2 min )
    RepBNN: towards a precise Binary Neural Network with Enhanced Feature Map via Repeating. (arXiv:2207.09049v1 [cs.CV])
    Binary neural network (BNN) is an extreme quantization version of convolutional neural networks (CNNs) with all features and weights mapped to just 1-bit. Although BNN saves a lot of memory and computation demand to make CNN applicable on edge or mobile devices, BNN suffers a drop in network performance due to the reduced representation capability after binarization. In this paper, we propose a new replaceable and easy-to-use convolution module RepConv, which enhances feature maps through replicating input or output along channel dimension by $\beta$ times without extra cost on the number of parameters and convolutional computation. We also define a set of RepTran rules to use RepConv throughout BNN modules like binary convolution, fully connected layer and batch normalization. Experiments demonstrate that after the RepTran transformation, a set of highly cited BNNs have achieved universally better performance than the original BNN versions. For example, the Top-1 accuracy of Rep-ReCU-ResNet-20, i.e., a RepBconv enhanced ReCU-ResNet-20, reaches 88.97% on CIFAR-10, which is 1.47% higher than that of the original network. And Rep-AdamBNN-ReActNet-A achieves 71.342% Top-1 accuracy on ImageNet, a fresh state-of-the-art result of BNNs. Code and models are available at: https://github.com/imfinethanks/Rep_AdamBNN.  ( 3 min )
    Online Learning with Off-Policy Feedback. (arXiv:2207.08956v1 [cs.LG])
    We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead sees the ones obtained by another unknown policy run in parallel (behavior policy). Instead of a standard exploration-exploitation dilemma, the learner has to face another challenge in this setting: due to limited observations outside of their control, the learner may not be able to estimate the value of each policy equally well. To address this issue, we propose a set of algorithms that guarantee regret bounds that scale with a natural notion of mismatch between any comparator policy and the behavior policy, achieving improved performance against comparators that are well-covered by the observations. We also provide an extension to the setting of adversarial linear contextual bandits, and verify the theoretical guarantees via a set of experiments. Our key algorithmic idea is adapting the notion of pessimistic reward estimators that has been recently popular in the context of off-policy reinforcement learning.  ( 2 min )
    Superficial White Matter Analysis: An Efficient Point-cloud-based Deep Learning Framework with Supervised Contrastive Learning for Consistent Tractography Parcellation across Populations and dMRI Acquisitions. (arXiv:2207.08975v1 [eess.IV])
    Diffusion MRI tractography is an advanced imaging technique that enables in vivo mapping of the brain's white matter connections. White matter parcellation classifies tractography streamlines into clusters or anatomically meaningful tracts. It enables quantification and visualization of whole-brain tractography. Currently, most parcellation methods focus on the deep white matter (DWM), whereas fewer methods address the superficial white matter (SWM) due to its complexity. We propose a novel two-stage deep-learning-based framework, Superficial White Matter Analysis (SupWMA), that performs an efficient and consistent parcellation of 198 SWM clusters from whole-brain tractography. A point-cloud-based network is adapted to our SWM parcellation task, and supervised contrastive learning enables more discriminative representations between plausible streamlines and outliers for SWM. We train our model on a large-scale tractography dataset including streamline samples from labeled SWM clusters and anatomically implausible streamline samples, and we perform testing on six independently acquired datasets of different ages and health conditions (including neonates and patients with space-occupying brain tumors). Compared to several state-of-the-art methods, SupWMA obtains highly consistent and accurate SWM parcellation results on all datasets, showing good generalization across the lifespan in health and disease. In addition, the computational speed of SupWMA is much faster than other methods.
    Assaying Out-Of-Distribution Generalization in Transfer Learning. (arXiv:2207.09239v1 [cs.LG])
    Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and providing recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting. Our findings confirm that in- and out-of-distribution accuracies tend to increase jointly, but show that their relation is largely dataset-dependent, and in general more nuanced and more complex than posited by previous, smaller scale studies.
    Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks. (arXiv:2207.09071v1 [cs.LG])
    Meta reinforcement learning (meta-RL) aims to learn a policy solving a set of training tasks simultaneously and quickly adapting to new tasks. It requires massive amounts of data drawn from training tasks to infer the common structure shared among tasks. Without heavy reward engineering, the sparse rewards in long-horizon tasks exacerbate the problem of sample efficiency in meta-RL. Another challenge in meta-RL is the discrepancy of difficulty level among tasks, which might cause one easy task to dominate learning of the shared policy and thus preclude policy adaptation to new tasks. This work introduces a novel objective function to learn an action translator among training tasks. We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy and our objective function (approximately) upper bounds the value difference. We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training. Our approach empirically improves the sample efficiency and performance of meta-RL algorithms on sparse-reward tasks.
    Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift. (arXiv:2207.08977v1 [cs.LG])
    We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM. In this paper, we find that ID-calibrated ensembles -- where we simply ensemble the standard and robust models after calibrating on only ID data -- outperform prior state-of-the-art (based on self-training) on both ID and OOD accuracy. On eleven natural distribution shift datasets, ID-calibrated ensembles obtain the best of both worlds: strong ID accuracy and OOD accuracy. We analyze this method in stylized settings, and identify two important conditions for ensembles to perform well both ID and OOD: (1) we need to calibrate the standard and robust models (on ID data, because OOD data is unavailable), (2) OOD has no anticorrelated spurious features.  ( 2 min )
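    The ensembling step described in this abstract can be illustrated with a minimal sketch. The abstract does not say which calibration method or combination rule is used, so temperature scaling (fitted by grid search on ID validation logits) and plain averaging of calibrated probabilities are assumptions here, and all function names are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of one logit vector at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def mean_nll(all_logits, labels, temperature):
    """Average negative log-likelihood of the true labels."""
    losses = [-math.log(softmax(l, temperature)[y] + 1e-12)
              for l, y in zip(all_logits, labels)]
    return sum(losses) / len(losses)

def fit_temperature(id_logits, id_labels, grid=None):
    """Assumed calibration: pick the temperature minimizing NLL on ID data only."""
    if grid is None:
        grid = [0.1 * 1.1 ** k for k in range(50)]  # roughly 0.1 to 10.7
    return min(grid, key=lambda t: mean_nll(id_logits, id_labels, t))

def calibrated_ensemble(std_logits, rob_logits, t_std, t_rob):
    """Assumed combination rule: average the two calibrated distributions."""
    p_std = softmax(std_logits, t_std)
    p_rob = softmax(rob_logits, t_rob)
    return [0.5 * (a + b) for a, b in zip(p_std, p_rob)]
```

    In use, each model gets its own temperature from `fit_temperature` on held-out ID logits, and `calibrated_ensemble` is then applied per example at test time, ID or OOD.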
    Learning multi-robot coordination from demonstrations. (arXiv:2207.08892v1 [cs.RO])
    This paper develops a Distributed Differentiable Dynamic Game (DDDG) framework, which enables learning multi-robot coordination from demonstrations. We represent multi-robot coordination as a dynamic game, where the behavior of a robot is dictated by its own dynamics and objective that also depends on others' behavior. The coordination thus can be adapted by tuning the objective and dynamics of each robot. The proposed DDDG enables each robot to automatically tune its individual dynamics and objectives in a distributed manner by minimizing the mismatch between its trajectory and demonstrations. This process requires a new distributed design of the forward-pass, where all robots collaboratively seek Nash equilibrium behavior, and a backward-pass, where gradients are propagated via the communication graph. We test the DDDG in simulation with a team of quadrotors given different task configurations. The results demonstrate the capability of DDDG for learning multi-robot coordination from demonstrations.
    SCARA: Scalable Graph Neural Networks with Feature-Oriented Optimization. (arXiv:2207.09179v1 [cs.LG])
    Recent advances in data processing have stimulated the demand for learning graphs of very large scales. Graph Neural Networks (GNNs), being an emerging and powerful approach in solving graph learning tasks, are known to be difficult to scale up. Most scalable models apply node-based techniques in simplifying the expensive graph message-passing propagation procedure of GNN. However, we find such acceleration insufficient when applied to million- or even billion-scale graphs. In this work, we propose SCARA, a scalable GNN with feature-oriented optimization for graph computation. SCARA efficiently computes graph embedding from node features, and further selects and reuses feature computation results to reduce overhead. Theoretical analysis indicates that our model achieves sub-linear time complexity with a guaranteed precision in the propagation process as well as GNN training and inference. We conduct extensive experiments on various datasets to evaluate the efficacy and efficiency of SCARA. Performance comparison with baselines shows that SCARA can reach up to 100x graph propagation acceleration over current state-of-the-art methods with fast convergence and comparable accuracy. Most notably, it can efficiently process precomputation on the largest available billion-scale GNN dataset Papers100M (111M nodes, 1.6B edges) in 100 seconds.  ( 2 min )
    Quantum Feature Extraction for THz Multi-Layer Imaging. (arXiv:2207.09285v1 [quant-ph])
    A learning-based THz multi-layer imaging has been recently used for contactless three-dimensional (3D) positioning and encoding. We show a proof-of-concept demonstration of an emerging quantum machine learning (QML) framework to deal with depth variation, shadow effect, and double-sided content recognition, through an experimental validation.  ( 2 min )
    Utterance Weighted Multi-Dilation Temporal Convolutional Networks for Monaural Speech Dereverberation. (arXiv:2205.08455v2 [cs.SD] UPDATED)
    Speech dereverberation is an important stage in many speech technology applications. Recent work in this area has been dominated by deep neural network models. Temporal convolutional networks (TCNs) are deep learning models that have been proposed for sequence modelling in the task of dereverberating speech. In this work a weighted multi-dilation depthwise-separable convolution is proposed to replace standard depthwise-separable convolutions in TCN models. This proposed convolution enables the TCN to dynamically focus on more or less local information in its receptive field at each convolutional block in the network. It is shown that this weighted multi-dilation temporal convolutional network (WD-TCN) consistently outperforms the TCN across various model configurations and using the WD-TCN model is a more parameter efficient method to improve the performance of the model than increasing the number of convolutional blocks. The best performance improvement over the baseline TCN is 0.55 dB scale-invariant signal-to-distortion ratio (SISDR) and the best performing WD-TCN model attains 12.26 dB SISDR on the WHAMR dataset.
    Quantum Algorithms and Lower Bounds for Linear Regression with Norm Constraints. (arXiv:2110.13086v2 [quant-ph] UPDATED)
    Lasso and Ridge are important minimization problems in machine learning and statistics. They are versions of linear regression with squared loss where the vector $\theta\in\mathbb{R}^d$ of coefficients is constrained in either $\ell_1$-norm (for Lasso) or in $\ell_2$-norm (for Ridge). We study the complexity of quantum algorithms for finding $\varepsilon$-minimizers for these minimization problems. We show that for Lasso we can get a quadratic quantum speedup in terms of $d$ by speeding up the cost-per-iteration of the Frank-Wolfe algorithm, while for Ridge the best quantum algorithms are linear in $d$, as are the best classical algorithms. As a byproduct of our quantum lower bound for Lasso, we also prove the first classical lower bound for Lasso that is tight up to polylog-factors.  ( 2 min )
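    For orientation, the two constrained problems named in this abstract can be written out as follows. The abstract does not state the sample notation or the constraint radius, so the data $(x_i, y_i)_{i=1}^N$ and the radius $B$ below are assumed notation, and the last line is the standard reading of an $\varepsilon$-minimizer.

```latex
% Squared loss over N samples (assumed notation, not taken from the abstract)
L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \bigl( \langle \theta, x_i \rangle - y_i \bigr)^2,
\qquad \theta \in \mathbb{R}^d

% Lasso: coefficient vector constrained in the \ell_1-norm
\min_{\theta \,:\, \|\theta\|_1 \le B} L(\theta)

% Ridge: coefficient vector constrained in the \ell_2-norm
\min_{\theta \,:\, \|\theta\|_2 \le B} L(\theta)

% \hat{\theta} is an \varepsilon-minimizer over the feasible set \Theta if
L(\hat{\theta}) \le \min_{\theta \in \Theta} L(\theta) + \varepsilon
```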
    Unified 2D and 3D Pre-Training of Molecular Representations. (arXiv:2207.08806v1 [cs.LG])
    Molecular representation learning has attracted much attention recently. A molecule can be viewed as a 2D graph with nodes/atoms connected by edges/bonds, and can also be represented by a 3D conformation with 3-dimensional coordinates of all atoms. We note that most previous work handles 2D and 3D information separately, while jointly leveraging these two sources may foster a more informative representation. In this work, we explore this appealing idea and propose a new representation learning method based on a unified 2D and 3D pre-training. Atom coordinates and interatomic distances are encoded and then fused with atomic representations through graph neural networks. The model is pre-trained on three tasks: reconstruction of masked atoms and coordinates, 3D conformation generation conditioned on 2D graph, and 2D graph generation conditioned on 3D conformation. We evaluate our method on 11 downstream molecular property prediction tasks: 7 with 2D information only and 4 with both 2D and 3D information. Our method achieves state-of-the-art results on 10 tasks, and the average improvement on 2D-only tasks is 8.3%. Our method also achieves significant improvement on two 3D conformation generation tasks.  ( 2 min )
    Implicit Regularization with Polynomial Growth in Deep Tensor Factorization. (arXiv:2207.08942v1 [cs.LG])
    We study the implicit regularization effects of deep learning in tensor factorization. While implicit regularization in deep matrix and 'shallow' tensor factorization via linear and certain types of non-linear neural networks promotes low-rank solutions with at most quadratic growth, we show that its effect in deep tensor factorization grows polynomially with the depth of the network. This provides a remarkably faithful description of the observed experimental behaviour. Using numerical experiments, we demonstrate the benefits of this implicit regularization in yielding a more accurate estimation and better convergence properties.  ( 2 min )
    On the Study of Sample Complexity for Polynomial Neural Networks. (arXiv:2207.08896v1 [cs.LG])
    As a general type of machine learning approach, artificial neural networks have established state-of-the-art benchmarks in many pattern recognition and data analysis tasks. Among various kinds of neural network architectures, polynomial neural networks (PNNs) have been recently shown to be analyzable by spectrum analysis via neural tangent kernel, and particularly effective at image generation and face recognition. However, acquiring theoretical insight into the computation and sample complexity of PNNs remains an open problem. In this paper, we extend the analysis in previous literature to PNNs and obtain novel results on the sample complexity of PNNs, which provide some insights in explaining the generalization ability of PNNs.  ( 2 min )
    Learning Sparsity-Promoting Regularizers using Bilevel Optimization. (arXiv:2207.08939v1 [cs.LG])
    We present a method for supervised learning of sparsity-promoting regularizers for denoising signals and images. Sparsity-promoting regularization is a key ingredient in solving modern signal reconstruction problems; however, the operators underlying these regularizers are usually either designed by hand or learned from data in an unsupervised way. The recent success of supervised learning (mainly convolutional neural networks) in solving image reconstruction problems suggests that it could be a fruitful approach to designing regularizers. Towards this end, we propose to denoise signals using a variational formulation with a parametric, sparsity-promoting regularizer, where the parameters of the regularizer are learned to minimize the mean squared error of reconstructions on a training set of ground truth image and measurement pairs. Training involves solving a challenging bilevel optimization problem; we derive an expression for the gradient of the training loss using the closed-form solution of the denoising problem and provide an accompanying gradient descent algorithm to minimize it. Our experiments with structured 1D signals and natural images show that the proposed method can learn an operator that outperforms well-known regularizers (total variation, DCT-sparsity, and unsupervised dictionary learning) and collaborative filtering for denoising. While the approach we present is specific to denoising, we believe that it could be adapted to the larger class of inverse problems with linear measurement models, giving it applicability in a wide range of signal reconstruction settings.  ( 3 min )
    I2I: Image to Icosahedral Projection for $\mathrm{SO}(3)$ Object Reasoning from Single-View Images. (arXiv:2207.08925v1 [cs.CV])
    Reasoning about 3D objects based on 2D images is challenging due to large variations in appearance caused by viewing the object from different orientations. Ideally, our model would be invariant or equivariant to changes in object pose. Unfortunately, this is typically not possible with 2D image input because we do not have an a priori model of how the image would change under out-of-plane object rotations. The only $\mathrm{SO}(3)$-equivariant models that currently exist require point cloud input rather than 2D images. In this paper, we propose a novel model architecture based on icosahedral group convolution that reasons in $\mathrm{SO}(3)$ by projecting the input image onto an icosahedron. As a result of this projection, the model is approximately equivariant to rotation in $\mathrm{SO}(3)$. We apply this model to an object pose estimation task and find that it outperforms reasonable baselines.  ( 2 min )
    Romanus: Robust Task Offloading in Modular Multi-Sensor Autonomous Driving Systems. (arXiv:2207.08865v1 [cs.DC])
    Due to the high performance and safety requirements of self-driving applications, the complexity of modern autonomous driving systems (ADS) has been growing, instigating the need for more sophisticated hardware which could add to the energy footprint of the ADS platform. Addressing this, edge computing is poised to encompass self-driving applications, enabling the compute-intensive autonomy-related tasks to be offloaded for processing at compute-capable edge servers. Nonetheless, the intricate hardware architecture of ADS platforms, in addition to the stringent robustness demands, set forth complications for task offloading which are unique to autonomous driving. Hence, we present $ROMANUS$, a methodology for robust and efficient task offloading for modular ADS platforms with multi-sensor processing pipelines. Our methodology entails two phases: (i) the introduction of efficient offloading points along the execution path of the involved deep learning models, and (ii) the implementation of a runtime solution based on Deep Reinforcement Learning to adapt the operating mode according to variations in the perceived road scene complexity, network connectivity, and server load. Experiments on the object detection use case demonstrated that our approach is 14.99% more energy-efficient than pure local execution while achieving a 77.06% reduction in risky behavior from a robust-agnostic offloading baseline.  ( 3 min )
    SeLoC-ML: Semantic Low-Code Engineering for Machine Learning Applications in Industrial IoT. (arXiv:2207.08818v1 [cs.SE])
    Internet of Things (IoT) is transforming the industry by bridging the gap between Information Technology (IT) and Operational Technology (OT). Machines are being integrated with connected sensors and managed by intelligent analytics applications, accelerating digital transformation and business operations. Bringing Machine Learning (ML) to industrial devices is an advancement aiming to promote the convergence of IT and OT. However, developing an ML application in industrial IoT (IIoT) presents various challenges, including hardware heterogeneity, non-standardized representations of ML models, device and ML model compatibility issues, and slow application development. Successful deployment in this area requires a deep understanding of hardware, algorithms, software tools, and applications. Therefore, this paper presents a framework called Semantic Low-Code Engineering for ML Applications (SeLoC-ML), built on a low-code platform to support the rapid development of ML applications in IIoT by leveraging Semantic Web technologies. SeLoC-ML enables non-experts to easily model, discover, reuse, and matchmake ML models and devices at scale. The project code can be automatically generated for deployment on hardware based on the matching results. Developers can benefit from semantic application templates, called recipes, to fast prototype end-user applications. The evaluations confirm an engineering effort reduction by a factor of at least three compared to traditional approaches on an industrial ML classification case study, showing the efficiency and usefulness of SeLoC-ML. We share the code and welcome any contributions.  ( 3 min )
    Consistent Polyhedral Surrogates for Top-$k$ Classification and Variants. (arXiv:2207.08873v1 [cs.LG])
    Top-$k$ classification is a generalization of multiclass classification used widely in information retrieval, image classification, and other extreme classification settings. Several hinge-like (piecewise-linear) surrogates have been proposed for the problem, yet all are either non-convex or inconsistent. For the proposed hinge-like surrogates that are convex (i.e., polyhedral), we apply the recent embedding framework of Finocchiaro et al. (2019; 2022) to determine the prediction problem for which the surrogate is consistent. These problems can all be interpreted as variants of top-$k$ classification, which may be better aligned with some applications. We leverage this analysis to derive constraints on the conditional label distributions under which these proposed surrogates become consistent for top-$k$. It has been further suggested that every convex hinge-like surrogate must be inconsistent for top-$k$. Yet, we use the same embedding framework to give the first consistent polyhedral surrogate for this problem.  ( 2 min )
    Is Integer Arithmetic Enough for Deep Learning Training? (arXiv:2207.08822v1 [cs.LG])
    The ever-increasing computational complexity of deep learning models makes their training and deployment difficult on various cloud and edge platforms. Replacing floating-point arithmetic with low-bit integer arithmetic is a promising approach to save energy, memory footprint, and latency of deep learning models. As such, quantization has attracted the attention of researchers in recent years. However, using integer numbers to form a fully functional integer training pipeline including forward pass, back-propagation, and stochastic gradient descent has not been studied in detail. Our empirical and mathematical results reveal that integer arithmetic is enough to train deep learning models. Unlike recent proposals, instead of quantization, we directly switch the number representation of computations. Our novel training method forms a fully integer training pipeline that does not change the trajectory of the loss and accuracy compared to floating-point, nor does it need any special hyper-parameter tuning, distribution adjustment, or gradient clipping. Our experimental results show that our proposed method is effective in a wide variety of tasks such as classification (including vision transformers), object detection, and semantic segmentation.  ( 2 min )
    Using attention methods to predict judicial outcomes. (arXiv:2207.08823v1 [cs.LG])
    Legal Judgment Prediction is one of the most acclaimed fields for the combined area of NLP, AI, and Law. By legal prediction we mean an intelligent system capable of predicting specific judicial characteristics of a given case, such as its judicial outcome or judicial class. In this research, we have used AI classifiers to predict judicial outcomes in the Brazilian legal system. For this purpose, we developed a text crawler to extract data from the official Brazilian electronic legal systems. These texts formed a dataset of second-degree murder and active corruption cases. We applied different classifiers, such as Support Vector Machines and Neural Networks, to predict judicial outcomes by analyzing textual features from the dataset. Our research showed that Regression Trees, Gated Recurrent Units and Hierarchical Attention Networks presented higher metrics for different subsets. As a final goal, we explored the weights of one of the algorithms, the Hierarchical Attention Networks, to find a sample of the most important words used to absolve or convict defendants.  ( 2 min )
    Prior Knowledge Guided Unsupervised Domain Adaptation. (arXiv:2207.08877v1 [cs.LG])
    Waiving labels in the target domain makes Unsupervised Domain Adaptation (UDA) an attractive technique in many real-world applications, though it also brings great challenges as model adaptation becomes harder without labeled target data. In this paper, we address this issue by seeking compensation from target domain prior knowledge, which is often (partially) available in practice, e.g., from human expertise. This leads to a novel yet practical setting where, in addition to the training data, some prior knowledge about the target class distribution is available. We term this setting Knowledge-guided Unsupervised Domain Adaptation (KUDA). In particular, we consider two specific types of prior knowledge about the class distribution in the target domain: Unary Bound, which describes the lower and upper bounds of individual class probabilities, and Binary Relationship, which describes the relations between two class probabilities. We propose a general rectification module that uses such prior knowledge to refine model-generated pseudo labels. The module is formulated as a Zero-One Programming problem derived from the prior knowledge and a smooth regularizer. It can be easily plugged into self-training based UDA methods, and we combine it with two state-of-the-art methods, SHOT and DINE. Empirical results on four benchmarks confirm that the rectification module clearly improves the quality of pseudo labels, which in turn benefits the self-training stage. With the guidance from prior knowledge, the performances of both methods are substantially boosted. We expect our work to inspire further investigations in integrating prior knowledge in UDA. Code is available at https://github.com/tsun/KUDA.  ( 3 min )
    Why do tree-based models still outperform deep learning on tabular data? (arXiv:2207.08815v1 [cs.LG])
    While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear. We contribute extensive benchmarks of standard and novel deep learning methods as well as tree-based models such as XGBoost and Random Forests, across a large number of datasets and hyperparameter combinations. We define a standard set of 45 datasets from varied domains with clear characteristics of tabular data and a benchmarking methodology accounting for both fitting models and finding good hyperparameters. Results show that tree-based models remain state-of-the-art on medium-sized data ($\sim$10K samples) even without accounting for their superior speed. To understand this gap, we conduct an empirical investigation into the differing inductive biases of tree-based models and Neural Networks (NNs). This leads to a series of challenges which should guide researchers aiming to build tabular-specific NNs: 1. be robust to uninformative features, 2. preserve the orientation of the data, and 3. be able to easily learn irregular functions. To stimulate research on tabular architectures, we contribute a standard benchmark and raw data for baselines: every point of a 20 000 compute hours hyperparameter search for each learner.  ( 2 min )
    Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime. (arXiv:2207.08820v1 [cs.LG])
    Deep Learning has been one of the most disruptive technological advancements in recent times. The high performance of deep learning models comes at the expense of high computational, storage and power requirements. Sensing the immediate need for accelerating and compressing these models to improve on-device performance, we introduce Deeplite Neutrino for production-ready optimization of the models and Deeplite Runtime for deployment of ultra-low bit quantized models on Arm-based platforms. We implement low-level quantization kernels for Armv7 and Armv8 architectures enabling deployment on the vast array of 32-bit and 64-bit Arm-based devices. With efficient implementations using vectorization, parallelization, and tiling, we realize speedups of up to 2x and 2.2x compared to TensorFlow Lite with XNNPACK backend on classification and detection models, respectively. We also achieve significant speedups of up to 5x and 3.2x compared to ONNX Runtime for classification and detection models, respectively.  ( 2 min )
    Contrastive Environmental Sound Representation Learning. (arXiv:2207.08825v1 [cs.SD])
    Machine hearing of environmental sound is one of the important issues in the audio recognition domain. It gives the machine the ability to discriminate between the different input sounds that guide its decision making. In this work we exploit the self-supervised contrastive technique and a shallow 1D CNN to extract the distinctive audio features (audio representations) without using any explicit annotations. We generate representations of a given audio using both its raw audio waveform and spectrogram and evaluate if the proposed learner is agnostic to the type of audio input. We further use canonical correlation analysis (CCA) to fuse representations from the two types of input of a given audio and demonstrate that the fused global feature results in a robust representation of the audio signal as compared to the individual representations. The evaluation of the proposed technique is done on both ESC-50 and UrbanSound8K. The results show that the proposed technique is able to extract most features of the environmental audio and gives an improvement of 12.8% and 0.9% on the ESC-50 and UrbanSound8K datasets respectively.  ( 2 min )
    The Multiple Subnetwork Hypothesis: Enabling Multidomain Learning by Isolating Task-Specific Subnetworks in Feedforward Neural Networks. (arXiv:2207.08821v1 [cs.LG])
    Neural networks have seen an explosion of usage and research in the past decade, particularly within the domains of computer vision and natural language processing. However, only recently have advancements in neural networks yielded performance improvements beyond narrow applications and translated to expanded multitask models capable of generalizing across multiple data types and modalities. Simultaneously, it has been shown that neural networks are overparameterized to a high degree, and pruning techniques have proved capable of significantly reducing the number of active weights within the network while largely preserving performance. In this work, we identify a methodology and network representational structure which allows a pruned network to employ previously unused weights to learn subsequent tasks. We employ these methodologies on well-known benchmarking datasets for testing purposes and show that networks trained using our approaches are able to learn multiple tasks, which may be related or unrelated, in parallel or in sequence without sacrificing performance on any task or exhibiting catastrophic forgetting.  ( 2 min )
    Research Trends and Applications of Data Augmentation Algorithms. (arXiv:2207.08817v1 [cs.LG])
    In the Machine Learning research community, there is a consensus regarding the relationship between model complexity and the required amount of data and computation power. In real world applications, these computational requirements are not always available, motivating research on regularization methods. In addition, current and past research has shown that simpler classification algorithms can reach state-of-the-art performance on computer vision tasks given a robust method to artificially augment the training dataset. Because of this, data augmentation techniques became a popular research topic in recent years. However, existing data augmentation methods are generally less transferable than other regularization methods. In this paper we identify the main areas of application of data augmentation algorithms, the types of algorithms used, significant research trends, their progression over time and research gaps in the data augmentation literature. To do this, the related literature was collected through the Scopus database. Its analysis was done following network science, text mining and exploratory analysis approaches. We expect readers to understand the potential of data augmentation, as well as identify future research directions and open questions within data augmentation research.  ( 2 min )
    Fusion of Physiological and Behavioural Signals on SPD Manifolds with Application to Stress and Pain Detection. (arXiv:2207.08811v1 [cs.LG])
    Existing multimodal stress/pain recognition approaches generally extract features from different modalities independently and thus ignore cross-modality correlations. This paper proposes a novel geometric framework for multimodal stress/pain detection utilizing Symmetric Positive Definite (SPD) matrices as a representation that incorporates the correlation relationship of physiological and behavioural signals from covariance and cross-covariance. Considering the non-linearity of the Riemannian manifold of SPD matrices, well-known machine learning techniques are not suited to classify these matrices. Therefore, a tangent space mapping method is adopted to map the derived SPD matrix sequences to vector sequences in the tangent space, where an LSTM-based network can be applied for classification. The proposed framework has been evaluated on two public multimodal datasets, achieving state-of-the-art results on both the stress and pain detection tasks.  ( 2 min )
    Discovering Behavioral Predispositions in Data to Improve Human Activity Recognition. (arXiv:2207.08816v1 [cs.LG])
    The automatic, sensor-based assessment of challenging behavior of persons with dementia is an important task to support the selection of interventions. However, predicting behaviors like apathy and agitation is challenging due to the large inter- and intra-patient variability. The goal of this paper is to improve the recognition performance by making use of the observation that patients tend to show specific behaviors at certain times of the day or week. We propose to identify such segments of similar behavior via clustering the distributions of annotations of the time segments. All time segments within a cluster then consist of similar behaviors and thus indicate a behavioral predisposition (BPD). We utilize BPDs by training a classifier for each BPD. Empirically, we demonstrate that when the BPD per time segment is known, activity recognition performance can be substantially improved.  ( 2 min )
    3D Equivariant Molecular Graph Pretraining. (arXiv:2207.08824v1 [q-bio.QM])
    Pretraining molecular representation models without labels is fundamental to various applications. Conventional methods mainly process 2D molecular graphs and focus solely on 2D tasks, making their pretrained models incapable of characterizing 3D geometry and thus defective for downstream 3D tasks. In this work, we tackle 3D molecular pretraining in a complete and novel sense. In particular, we first propose to adopt an equivariant energy-based model as the backbone for pretraining, which enjoys the merit of fulfilling the symmetry of 3D space. Then we develop a node-level pretraining loss for force prediction, where we further exploit the Riemann-Gaussian distribution to ensure the loss to be E(3)-invariant, enabling more robustness. Moreover, a graph-level noise scale prediction task is also leveraged to further promote the eventual performance. We evaluate our model pretrained from a large-scale 3D dataset GEOM-QM9 on two challenging 3D benchmarks: MD17 and QM9. The experimental results support the better efficacy of our method against current state-of-the-art pretraining approaches, and verify the validity of our design for each proposed component.  ( 2 min )
    Audio Input Generates Continuous Frames to Synthesize Facial Video Using Generative Adversarial Networks. (arXiv:2207.08813v1 [cs.SD])
    This paper presents a simple method for audio-based speech video generation: given a piece of audio, we can generate a video of the target face speaking this audio. We propose Generative Adversarial Networks (GAN) with cut speech audio input as the condition and use a Convolutional Gated Recurrent Unit (GRU) in the generator and discriminator. Our model is trained by exploiting the short audio and the frames in this duration. For training, we cut the audio and extract the face in the corresponding frames. We design a simple encoder and compare the frames generated by the GAN with and without GRU. We use GRU for temporally coherent frames, and the results show that short audio can produce relatively realistic output.  ( 2 min )
  • Open

    Federated Learning Aggregation: New Robust Algorithms with Guarantees. (arXiv:2205.10864v2 [stat.ML] UPDATED)
    Federated Learning has been recently proposed for distributed model training at the edge. The principle of this approach is to aggregate models learned on distributed clients to obtain a new more general "average" model (FedAvg). The resulting model is then redistributed to clients for further training. To date, the most popular federated learning algorithm uses coordinate-wise averaging of the model parameters for aggregation. In this paper, we carry out a complete general mathematical convergence analysis to evaluate aggregation strategies in a federated learning framework. From this, we derive novel aggregation algorithms which are able to modify their model architecture by differentiating client contributions according to the value of their losses. Moreover, we go beyond the assumptions introduced in theory by evaluating the performance of these strategies and comparing it with that of FedAvg in classification tasks in both the IID and the Non-IID framework without additional hypotheses.
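    The coordinate-wise averaging mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the paper's code: the function name `fedavg`, the flat-list parameter layout, and the weighting of each client by its local dataset size (the common FedAvg formulation) are assumptions made for the example; the abstract itself only states that parameters are averaged coordinate by coordinate.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Coordinate-wise weighted average of flat client parameter vectors.

    Assumption for this sketch: each client is weighted by its number of
    local training samples, and every client sends a vector of equal length.
    """
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one size per client and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_params[0])
    averaged = [0.0] * dim
    for params, size in zip(client_params, client_sizes):
        if len(params) != dim:
            raise ValueError("all clients must send vectors of equal length")
        weight = size / total  # relative contribution of this client
        for j, value in enumerate(params):
            averaged[j] += weight * value
    return averaged


# Hypothetical usage: two clients holding 1 and 3 samples respectively.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

    The loss-dependent aggregation rules derived in the paper would replace the size-based `weight` with a different client weighting; that part is not reproduced here.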
    Unsupervised Ground Metric Learning using Wasserstein Singular Vectors. (arXiv:2102.06278v3 [stat.ML] UPDATED)
    Defining meaningful distances between samples in a dataset is a fundamental problem in machine learning. Optimal Transport (OT) lifts a distance between features (the "ground metric") to a geometrically meaningful distance between samples. However, there is usually no straightforward choice of ground metric. Supervised ground metric learning approaches exist but require labeled data. In the absence of labels, only ad-hoc ground metrics remain. Unsupervised ground metric learning is thus a fundamental problem to enable data-driven applications of OT. In this paper, we propose for the first time a canonical answer by simultaneously computing an OT distance between samples and between features of a dataset. These distance matrices emerge naturally as positive singular vectors of the function mapping ground metrics to OT distances. We provide criteria to ensure the existence and uniqueness of these singular vectors. We then introduce scalable computational methods to approximate them in high-dimensional settings, using stochastic approximation and entropic regularization. Finally, we showcase Wasserstein Singular Vectors on a single-cell RNA-sequencing dataset.
    Deep learning generates custom-made logistic regression models for explaining how breast cancer subtypes are classified. (arXiv:2001.06988v2 [cs.LG] UPDATED)
    Differentiating the intrinsic subtypes of breast cancer is crucial for deciding the best treatment strategy. Deep learning can predict the subtypes from genetic information more accurately than conventional statistical methods, but to date, deep learning has not been directly utilized to examine which genes are associated with which subtypes. To clarify the mechanisms embedded in the intrinsic subtypes, we developed an explainable deep learning model called a point-wise linear (PWL) model that generates a custom-made logistic regression for each patient. Logistic regression, which is familiar to both physicians and medical informatics researchers, allows us to analyze the importance of the feature variables, and the PWL model harnesses these practical abilities of logistic regression. In this study, we show that analyzing breast cancer subtypes is clinically beneficial for patients and one of the best ways to validate the capability of the PWL model. First, we trained the PWL model with RNA-seq data to predict PAM50 intrinsic subtypes and applied it to the 41/50 genes of PAM50 through the subtype prediction task. Second, we developed a deep enrichment analysis method to reveal the relationships between the PAM50 subtypes and the copy numbers of breast cancer. Our findings showed that the PWL model utilized genes relevant to the cell cycle-related pathways. These preliminary successes in breast cancer subtype analysis demonstrate the potential of our analysis strategy to clarify the mechanisms underlying breast cancer and improve overall clinical outcomes.
    Treatment Effect Risk: Bounds and Inference. (arXiv:2201.05893v2 [stat.ME] UPDATED)
    Since the average treatment effect (ATE) measures the change in social welfare, even if positive, there is a risk of negative effect on, say, some 10% of the population. Assessing such risk is difficult, however, because any one individual treatment effect (ITE) is never observed, so the 10% worst-affected cannot be identified, while distributional treatment effects only compare the first deciles within each treatment group, which does not correspond to any 10%-subpopulation. In this paper we consider how to nonetheless assess this important risk measure, formalized as the conditional value at risk (CVaR) of the ITE-distribution. We leverage the availability of pre-treatment covariates and characterize the tightest-possible upper and lower bounds on ITE-CVaR given by the covariate-conditional average treatment effect (CATE) function. We then proceed to study how to estimate these bounds efficiently from data and construct confidence intervals. This is challenging even in randomized experiments as it requires understanding the distribution of the unknown CATE function, which can be very complex if we use rich covariates so as to best control for heterogeneity. We develop a debiasing method that overcomes this and prove it enjoys favorable statistical properties even when CATE and other nuisances are estimated by black-box machine learning or even inconsistently. Studying a hypothetical change to French job-search counseling services, our bounds and inference demonstrate a small social benefit entails a negative impact on a substantial subpopulation.
    GNNRank: Learning Global Rankings from Pairwise Comparisons via Directed Graph Neural Networks. (arXiv:2202.00211v3 [cs.LG] UPDATED)
    Recovering global rankings from pairwise comparisons has wide applications from time synchronization to sports team ranking. Pairwise comparisons corresponding to matches in a competition can be construed as edges in a directed graph (digraph), whose nodes represent e.g. competitors with an unknown rank. In this paper, we introduce neural networks into the ranking recovery problem by proposing the so-called GNNRank, a trainable GNN-based framework with digraph embedding. Moreover, new objectives are devised to encode ranking upsets/violations. The framework involves a ranking score estimation approach, and adds an inductive bias by unfolding the Fiedler vector computation of the graph constructed from a learnable similarity matrix. Experimental results on extensive data sets show that our methods attain competitive and often superior performance against baselines, as well as showing promising transfer ability. Code and preprocessed data are available at https://github.com/SherylHYX/GNNRank.
    Uncertainty in Contrastive Learning: On the Predictability of Downstream Performance. (arXiv:2207.09336v1 [cs.LG])
    The superior performance of some of today's state-of-the-art deep learning models is to some extent owed to extensive (self-)supervised contrastive pretraining on large-scale datasets. In contrastive learning, the network is presented with pairs of positive (similar) and negative (dissimilar) datapoints and is trained to find an embedding vector for each datapoint, i.e., a representation, which can be further fine-tuned for various downstream tasks. In order to safely deploy these models in critical decision-making systems, it is crucial to equip them with a measure of their uncertainty or reliability. However, due to the pairwise nature of training a contrastive model, and the lack of absolute labels on the output (an abstract embedding vector), adapting conventional uncertainty estimation techniques to such models is non-trivial. In this work, we study whether the uncertainty of such a representation can be quantified for a single datapoint in a meaningful way. In other words, we explore if the downstream performance on a given datapoint is predictable, directly from its pre-trained embedding. We show that this goal can be achieved by directly estimating the distribution of the training data in the embedding space and accounting for the local consistency of the representations. Our experiments show that this notion of uncertainty for an embedding vector often strongly correlates with its downstream accuracy.
    Do Not Sleep on Linear Models: Simple and Interpretable Techniques Outperform Deep Learning for Sleep Scoring. (arXiv:2207.07753v2 [stat.ML] UPDATED)
    Over the last few years, research in automatic sleep scoring has mainly focused on developing increasingly complex deep learning architectures. However, recently these approaches achieved only marginal improvements, often at the expense of requiring more data and more expensive training procedures. Despite all these efforts and their satisfactory performance, automatic sleep staging solutions are not widely adopted in a clinical context yet. We argue that most deep learning solutions for sleep scoring are limited in their real-world applicability as they are hard to train, deploy, and reproduce. Moreover, these solutions lack interpretability and transparency, which are often key to increasing adoption rates. In this work, we revisit the problem of sleep stage classification using classical machine learning. Results show that state-of-the-art performance can be achieved with a conventional machine learning pipeline consisting of preprocessing, feature extraction, and a simple machine learning model. In particular, we analyze the performance of a linear model and a non-linear (gradient boosting) model. Our approach surpasses the state of the art (among methods that use the same data) on two public datasets: Sleep-EDF SC-20 (MF1 0.810) and Sleep-EDF ST (MF1 0.795), while achieving competitive results on Sleep-EDF SC-78 (MF1 0.775) and MASS SS3 (MF1 0.817). We show that, for the sleep stage scoring task, the expressiveness of an engineered feature vector is on par with the internally learned representations of deep learning models. This observation opens the door to clinical adoption, as a representative feature vector makes it possible to leverage both the interpretability and the successful track record of traditional machine learning models.
    The Implicit Bias of Gradient Descent on Separable Data. (arXiv:1710.10345v5 [stat.ML] UPDATED)
    We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods.
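    The claim above can be illustrated with a minimal sketch, assuming a hypothetical four-point separable dataset and a fixed step size of 0.1 (neither comes from the paper). For this toy data the hard-margin SVM direction works out by hand to (0.6, 0.8) with margin 2.2. The loop tracks the norm of the iterate, its cosine to that direction, and its normalized margin.

```python
import numpy as np

# Hypothetical separable toy data with labels in {-1, +1} and no bias term.
# Worked out by hand: the hard-margin SVM direction is (0.6, 0.8), margin 2.2.
X = np.array([[3.0, 0.5], [-1.0, -2.0], [4.0, 4.0], [-2.0, -5.0]])
y = np.array([1.0, -1.0, 1.0, -1.0])
w_svm = np.array([0.6, 0.8])

def logistic_grad(w):
    # Gradient of sum_i log(1 + exp(-y_i * <x_i, w>)) with respect to w.
    margins = y * (X @ w)
    return -(X * (y / (1.0 + np.exp(margins)))[:, None]).sum(axis=0)

w = np.zeros(2)
history = []  # (step, ||w||, cosine to SVM direction, normalized margin)
for step in range(1, 20001):
    w -= 0.1 * logistic_grad(w)  # plain gradient descent, no regularization
    if step in (100, 2000, 20000):
        norm = np.linalg.norm(w)
        cosine = float(w @ w_svm) / norm
        margin = float((y * (X @ w)).min()) / norm
        history.append((step, norm, cosine, margin))

for row in history:
    print(row)
```

    The norm keeps growing because the loss has no finite minimizer, while the direction only drifts slowly toward the max-margin one, which is consistent with the logarithmic rate described in the abstract.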
    A Classification of $G$-invariant Shallow Neural Networks. (arXiv:2205.09219v3 [cs.LG] UPDATED)
    When trying to fit a deep neural network (DNN) to a $G$-invariant target function with respect to a group $G$, it only makes sense to constrain the DNN to be $G$-invariant as well. However, there can be many different ways to do this, thus raising the problem of "$G$-invariant neural architecture design": What is the optimal $G$-invariant architecture for a given problem? Before we can consider the optimization problem itself, we must understand the search space, the architectures in it, and how they relate to one another. In this paper, we take a first step towards this goal; we prove a theorem that gives a classification of all $G$-invariant single-hidden-layer or "shallow" neural network ($G$-SNN) architectures with ReLU activation for any finite orthogonal group $G$. The proof is based on a correspondence of every $G$-SNN to a signed permutation representation of $G$ acting on the hidden neurons. The classification is equivalently given in terms of the first cohomology classes of $G$, thus admitting a topological interpretation. Based on a code implementation, we enumerate the $G$-SNN architectures for some example groups $G$ and visualize their structure. We draw the network morphisms between the enumerated architectures that can be leveraged during neural architecture search (NAS). Finally, we prove that architectures corresponding to inequivalent cohomology classes in a given cohomology ring coincide in function space only when their weight matrices are zero, and we discuss the implications of this in the context of NAS.
    The m-connecting imset and factorization for ADMG models. (arXiv:2207.08963v1 [stat.ML])
    Directed acyclic graph (DAG) models have become widely studied and applied in statistics and machine learning -- indeed, their simplicity facilitates efficient procedures for learning and inference. Unfortunately, these models are not closed under marginalization, making them poorly equipped to handle systems with latent confounding. Acyclic directed mixed graph (ADMG) models characterize margins of DAG models, making them far better suited to handle such systems. However, ADMG models have not seen wide-spread use due to their complexity and a shortage of statistical tools for their analysis. In this paper, we introduce the m-connecting imset which provides an alternative representation for the independence models induced by ADMGs. Furthermore, we define the m-connecting factorization criterion for ADMG models, characterized by a single equation, and prove its equivalence to the global Markov property. The m-connecting imset and factorization criterion provide two new statistical tools for learning and inference with ADMG models. We demonstrate the usefulness of these tools by formulating and evaluating a consistent scoring criterion with a closed form solution.
    Implicit Gradient Regularization. (arXiv:2009.11162v3 [cs.LG] UPDATED)
    Gradient descent can be surprisingly good at optimizing deep neural networks without overfitting and without explicit regularization. We find that the discrete steps of gradient descent implicitly regularize models by penalizing gradient descent trajectories that have large loss gradients. We call this Implicit Gradient Regularization (IGR) and we use backward error analysis to calculate the size of this regularization. We confirm empirically that implicit gradient regularization biases gradient descent toward flat minima, where test errors are small and solutions are robust to noisy parameter perturbations. Furthermore, we demonstrate that the implicit gradient regularization term can be used as an explicit regularizer, allowing us to control this gradient regularization directly. More broadly, our work indicates that backward error analysis is a useful theoretical approach to the perennial question of how learning rate, model size, and parameter regularization interact to determine the properties of overparameterized models optimized with gradient descent.
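    A small numerical check of the backward-error-analysis idea, assuming the first-order modified loss has the form $\tilde{E}(\theta) = E(\theta) + \frac{h}{4}\|\nabla E(\theta)\|^2$ for learning rate $h$, and using a hypothetical one-dimensional quadratic $E(\theta) = a\theta^2/2$ where both gradient flows have closed forms. It compares one discrete gradient descent step against the flow on $E$ and the flow on $\tilde{E}$.

```python
import math

# Hypothetical values: curvature a, learning rate h, starting point theta0.
a, h, theta0 = 1.0, 0.1, 1.0

# One discrete gradient descent step on E(theta) = a * theta^2 / 2.
gd_step = theta0 - h * a * theta0

# Exact gradient flow on E for time h.
flow_original = theta0 * math.exp(-h * a)

# Exact gradient flow for time h on the assumed modified loss
# E + (h / 4) * (dE/dtheta)^2, whose gradient is a * (1 + h * a / 2) * theta.
flow_modified = theta0 * math.exp(-h * a * (1.0 + h * a / 2.0))

err_original = abs(gd_step - flow_original)
err_modified = abs(gd_step - flow_modified)
print(err_original, err_modified)
```

    In this toy case the flow on the modified loss tracks the discrete step more closely than the flow on the original loss, which is the sense in which the extra gradient-norm term acts as an implicit regularizer.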
    Calibrated ensembles can mitigate accuracy tradeoffs under distribution shift. (arXiv:2207.08977v1 [cs.LG])
    We often see undesirable tradeoffs in robust machine learning where out-of-distribution (OOD) accuracy is at odds with in-distribution (ID) accuracy: a robust classifier obtained via specialized techniques such as removing spurious features often has better OOD but worse ID accuracy compared to a standard classifier trained via ERM. In this paper, we find that ID-calibrated ensembles -- where we simply ensemble the standard and robust models after calibrating on only ID data -- outperform prior state-of-the-art (based on self-training) on both ID and OOD accuracy. On eleven natural distribution shift datasets, ID-calibrated ensembles obtain the best of both worlds: strong ID accuracy and OOD accuracy. We analyze this method in stylized settings, and identify two important conditions for ensembles to perform well both ID and OOD: (1) we need to calibrate the standard and robust models (on ID data, because OOD data is unavailable), and (2) the OOD data must have no anticorrelated spurious features.
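    The calibrate-then-ensemble recipe can be sketched as follows. This is an illustration under assumptions rather than the paper's implementation: temperature scaling fitted by grid search stands in for the calibration step, and averaging calibrated probabilities stands in for the ensembling step. All function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    # Mean negative log-likelihood of the labels at temperature T.
    p = softmax(logits / T)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

def fit_temperature(logits, labels, grid=np.linspace(0.05, 10.0, 200)):
    # Assumed calibration method: pick the temperature minimizing NLL
    # on held-out in-distribution (ID) data only.
    return min(grid, key=lambda T: nll(logits, labels, T))

def id_calibrated_ensemble(std_logits, rob_logits, T_std, T_rob):
    # Assumed combination rule: average the two calibrated distributions.
    return 0.5 * (softmax(std_logits / T_std) + softmax(rob_logits / T_rob))
```

    In use, `fit_temperature` would be called once per model on ID validation logits, and the resulting temperatures reused unchanged when `id_calibrated_ensemble` is applied to OOD inputs.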
    A label-efficient two-sample test. (arXiv:2111.08861v5 [cs.LG] UPDATED)
    Two-sample tests evaluate whether two samples are realizations of the same distribution (the null hypothesis) or two different distributions (the alternative hypothesis). We consider a new setting for this problem where sample features are easily measured whereas sample labels are unknown and costly to obtain. Accordingly, we devise a three-stage framework in service of performing an effective two-sample test with only a small number of sample label queries: first, a classifier is trained with samples uniformly labeled to model the posterior probabilities of the labels; second, a novel query scheme dubbed "bimodal query" is used to query labels of samples from both classes, and last, the classical Friedman-Rafsky (FR) two-sample test is performed on the queried samples. Theoretical analysis and extensive experiments performed on several datasets demonstrate that the proposed test controls the Type I error and has decreased Type II error relative to uniform querying and certainty-based querying. Source code for our algorithms and experimental results is available at https://github.com/wayne0908/Label-Efficient-Two-Sample.
    Generalization Bounds via Convex Analysis. (arXiv:2202.04985v3 [stat.ML] UPDATED)
    Since the celebrated works of Russo and Zou (2016,2019) and Xu and Raginsky (2017), it has been well known that the generalization error of supervised learning algorithms can be bounded in terms of the mutual information between their input and the output, given that the loss of any fixed hypothesis has a subgaussian tail. In this work, we generalize this result beyond the standard choice of Shannon's mutual information to measure the dependence between the input and the output. Our main result shows that it is indeed possible to replace the mutual information by any strongly convex function of the joint input-output distribution, with the subgaussianity condition on the losses replaced by a bound on an appropriately chosen norm capturing the geometry of the dependence measure. This allows us to derive a range of generalization bounds that are either entirely new or strengthen previously known ones. Examples include bounds stated in terms of $p$-norm divergences and the Wasserstein-2 distance, which are respectively applicable for heavy-tailed loss distributions and highly smooth loss functions. Our analysis is entirely based on elementary tools from convex analysis by tracking the growth of a potential function associated with the dependence measure and the loss function.
    A Unifying Causal Framework for Analyzing Dataset Shift-stable Learning Algorithms. (arXiv:1905.11374v5 [stat.ML] UPDATED)
    Recent interest in the external validity of prediction models (i.e., the problem of different train and test distributions, known as dataset shift) has produced many methods for finding predictive distributions that are invariant to dataset shifts and can be used for prediction in new, unseen environments. However, these methods consider different types of shifts and have been developed under disparate frameworks, making it difficult to theoretically analyze how solutions differ with respect to stability and accuracy. Taking a causal graphical view, we use a flexible graphical representation to express various types of dataset shifts. Given a known graph of the data generating process, we show that all invariant distributions correspond to a causal hierarchy of graphical operators which disable the edges in the graph that are responsible for the shifts. The hierarchy provides a common theoretical underpinning for understanding when and how stability to shifts can be achieved, and in what ways stable distributions can differ. We use it to establish conditions for minimax optimal performance across environments, and derive new algorithms that find optimal stable distributions. Using this new perspective, we empirically demonstrate that there is a tradeoff between minimax and average performance.
    Sufficient Statistic Memory AMP. (arXiv:2112.15327v3 [cs.IT] UPDATED)
    Approximate message passing (AMP) type algorithms have been widely used in the signal reconstruction of certain large random linear systems. A key feature of the AMP-type algorithms is that their dynamics can be correctly described by state evolution. However, the state evolution is not necessarily convergent. To solve the convergence problem of the state evolution of AMP-type algorithms in principle, this paper proposes a memory AMP (MAMP) under a sufficient statistic condition, named sufficient statistic MAMP (SS-MAMP). We show that the covariance matrices of SS-MAMP are L-banded and convergent. Given an arbitrary MAMP, we can construct an SS-MAMP by damping, which not only ensures the convergence of the state evolution, but also preserves the orthogonality, i.e., its dynamics can be correctly described by state evolution. As a byproduct, we prove that the Bayes-optimal orthogonal/vector AMP (BO-OAMP/VAMP) is an SS-MAMP. As a result, we reveal two interesting properties of BO-OAMP/VAMP for large systems: 1) the covariance matrices are L-banded and are convergent, and 2) damping and memory are not needed (i.e., do not bring performance improvement). As an example, we construct a sufficient statistic Bayes-optimal MAMP (SS-BO-MAMP) whose state evolution converges to the minimum (i.e., Bayes-optimal) mean square error (MSE) predicted by replica methods. In addition, the MSE of SS-BO-MAMP is not worse than the original BO-MAMP. Finally, simulations are provided to verify the theoretical results.
    Signed Network Embedding with Application to Simultaneous Detection of Communities and Anomalies. (arXiv:2207.09324v1 [cs.SI])
    Signed networks are frequently observed in real life with additional sign information associated with each edge, yet such information has been largely ignored in existing network models. This paper develops a unified embedding model for signed networks to disentangle the intertwined balance structure and anomaly effect, which can greatly facilitate the downstream analysis, including community detection, anomaly detection, and network inference. The proposed model captures both balance structure and anomaly effect through a low rank plus sparse matrix decomposition, which are jointly estimated via a regularized formulation. Its theoretical guarantees are established in terms of asymptotic consistency and finite-sample probability bounds for network embedding, community detection and anomaly detection. The advantage of the proposed embedding model is also demonstrated through extensive numerical experiments on both synthetic networks and an international relation network.
    The role of the geometric mean in case-control studies. (arXiv:2207.09016v1 [stat.ME])
    Historically used in settings where the outcome is rare or data collection is expensive, outcome-dependent sampling is relevant to many modern settings where data is readily available for a biased sample of the target population, such as public administrative data. Under outcome-dependent sampling, common effect measures such as the average risk difference and the average risk ratio are not identified, but the conditional odds ratio is. Aggregation of the conditional odds ratio is challenging since summary measures are generally not identified. Furthermore, the marginal odds ratio can be larger (or smaller) than all conditional odds ratios. This so-called non-collapsibility of the odds ratio is avoidable if we use an alternative aggregation to the standard arithmetic mean. We provide a new definition of collapsibility that makes this choice of aggregation method explicit, and we demonstrate that the odds ratio is collapsible under geometric aggregation. We describe how to partially identify, estimate, and do inference on the geometric odds ratio under outcome-dependent sampling. Our proposed estimator is based on the efficient influence function and therefore has doubly robust-style properties.
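    A worked numerical example of non-collapsibility, using hypothetical risks in two equally likely strata with exposure independent of the stratum. It assumes that geometric aggregation means exponentiating the average of the log conditional odds ratios, which may differ in detail from the paper's definition.

```python
import math

def odds(p):
    return p / (1.0 - p)

# Hypothetical strata, each with probability 1/2 and independent of exposure A:
# (P(Y=1 | A=1, X=x), P(Y=1 | A=0, X=x))
strata = [(0.9, 0.5), (0.5, 0.1)]

# Conditional odds ratio within each stratum.
cond_or = [odds(p1) / odds(p0) for p1, p0 in strata]

# Marginal odds ratio: average the risks over strata first, then form the ratio.
p1_marginal = sum(p1 for p1, _ in strata) / len(strata)
p0_marginal = sum(p0 for _, p0 in strata) / len(strata)
marg_or = odds(p1_marginal) / odds(p0_marginal)

# Assumed geometric aggregation: exp of the mean log conditional odds ratio.
geo_or = math.exp(sum(math.log(o) for o in cond_or) / len(cond_or))

print(cond_or, marg_or, geo_or)
```

    Both conditional odds ratios equal 9 here, yet the marginal odds ratio is 49/9, which is smaller than either of them. The geometric aggregate stays at 9.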
    Lazy Estimation of Variable Importance for Large Neural Networks. (arXiv:2207.09097v1 [stat.ML])
    As opaque predictive models increasingly impact many areas of modern life, interest in quantifying the importance of a given input variable for making a specific prediction has grown. Recently, there has been a proliferation of model-agnostic methods to measure variable importance (VI) that analyze the difference in predictive power between a full model trained on all variables and a reduced model that excludes the variable(s) of interest. A bottleneck common to these methods is the estimation of the reduced model for each variable (or subset of variables), which is an expensive process that often does not come with theoretical guarantees. In this work, we propose a fast and flexible method for approximating the reduced model with important inferential guarantees. We replace the need for fully retraining a wide neural network by a linearization initialized at the full model parameters. By adding a ridge-like penalty to make the problem convex, we prove that when the ridge penalty parameter is sufficiently large, our method estimates the variable importance measure with an error rate of $O(\frac{1}{\sqrt{n}})$ where $n$ is the number of training samples. We also show that our estimator is asymptotically normal, enabling us to provide confidence bounds for the VI estimates. We demonstrate through simulations that our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example.
    Similarity of Pre-trained and Fine-tuned Representations. (arXiv:2207.09225v1 [cs.LG])
    In transfer learning, often only the last part of the network - the so-called head - is fine-tuned. Representation similarity analysis shows that the most significant change still occurs in the head even if all weights are updatable. However, recent results from few-shot learning have shown that representation change in the early layers, which are mostly convolutional, is beneficial, especially in the case of cross-domain adaption. In our paper, we investigate whether that also holds true for transfer learning. In addition, we analyze the change of representation in transfer learning, both during pre-training and fine-tuning, and find that pre-trained structure is unlearned if not usable.
    Adversarial Bandits with Knapsacks. (arXiv:1811.11881v8 [cs.DS] UPDATED)
    We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. This is a considerably harder problem, compared to both the stochastic version and the "classic" adversarial bandits, in that regret minimization is no longer feasible. Instead, the objective is to minimize the competitive ratio: the ratio of the benchmark reward to the algorithm's reward. We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new perspective on the stochastic version of the problem. We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version and use it as a subroutine to solve the latter.
    A sharp uniform-in-time error estimate for Stochastic Gradient Langevin Dynamics. (arXiv:2207.09304v1 [math.PR])
    We establish a sharp uniform-in-time error estimate for the Stochastic Gradient Langevin Dynamics (SGLD), which is a popular sampling algorithm. Under mild assumptions, we obtain a uniform-in-time $O(\eta^2)$ bound for the KL-divergence between the SGLD iteration and the Langevin diffusion, where $\eta$ is the step size (or learning rate). Our analysis is also valid for varying step sizes. Based on this, we are able to obtain an $O(\eta)$ bound for the distance between the SGLD iteration and the invariant distribution of the Langevin diffusion, in terms of Wasserstein or total variation distances.
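    For readers unfamiliar with the iteration being analyzed, here is a minimal SGLD sketch on a hypothetical one-dimensional problem. The potential, data, batch size, and step size are illustrative assumptions, not the paper's setting. The target density is proportional to $\exp(-U(\theta))$ with $U(\theta) = \sum_i (\theta - x_i)^2 / 2$, a Gaussian centered at the data mean.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=100)  # hypothetical observations
N, batch, eta = len(data), 10, 1e-3              # eta is the step size

def stochastic_grad(theta):
    # Unbiased minibatch estimate of dU/dtheta, U = sum_i (theta - x_i)^2 / 2.
    idx = rng.integers(0, N, size=batch)
    return (N / batch) * np.sum(theta - data[idx])

theta, samples = 0.0, []
for k in range(5000):
    # SGLD update: stochastic gradient step plus Gaussian noise of variance 2 * eta.
    theta = theta - eta * stochastic_grad(theta) + np.sqrt(2.0 * eta) * rng.normal()
    if k >= 500:  # discard burn-in iterations
        samples.append(theta)

print(np.mean(samples), data.mean())
```

    The sample mean should land near the data mean. The spread of the samples is somewhat larger than the exact posterior's because of the discretization and minibatch noise, which is the kind of step-size-dependent error the abstract bounds.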
    A coherence parameter characterizing generative compressed sensing with Fourier measurements. (arXiv:2207.09340v1 [cs.IT])
    In Bora et al. (2017), a mathematical framework was developed for compressed sensing guarantees in the setting where the measurement matrix is Gaussian and the signal structure is the range of a generative neural network (GNN). The problem of compressed sensing with GNNs has since been extensively analyzed when the measurement matrix and/or network weights follow a subgaussian distribution. We move beyond the subgaussian assumption, to measurement matrices that are derived by sampling uniformly at random rows of a unitary matrix (including subsampled Fourier measurements as a special case). Specifically, we prove the first known restricted isometry guarantee for generative compressed sensing with subsampled isometries, and provide recovery bounds with nearly order-optimal sample complexity, addressing an open problem of Scarlett et al. (2022, p. 10). Recovery efficacy is characterized by the coherence, a new parameter, which measures the interplay between the range of the network and the measurement matrix. Our approach relies on subspace counting arguments and ideas central to high-dimensional probability. Furthermore, we propose a regularization strategy for training GNNs to have favourable coherence with the measurement operator. We provide compelling numerical simulations that support this regularized training strategy: our strategy yields low coherence networks that require fewer measurements for signal recovery. This, together with our theoretical results, supports coherence as a natural quantity for characterizing generative compressed sensing with subsampled isometries.
    Neural Greedy Pursuit for Feature Selection. (arXiv:2207.09390v1 [cs.LG])
    We propose a greedy algorithm to select $N$ important features among $P$ input features for a non-linear prediction problem. The features are selected one by one sequentially, in an iterative loss minimization procedure. We use neural networks as predictors in the algorithm to compute the loss and hence, we refer to our method as neural greedy pursuit (NGP). NGP is efficient in selecting $N$ features when $N \ll P$, and it provides a notion of feature importance in a descending order following the sequential selection procedure. We experimentally show that NGP provides better performance than several feature selection methods such as DeepLIFT and Drop-one-out loss. In addition, we experimentally show a phase transition behavior in which perfect selection of all $N$ features without false positives is possible when the training data size exceeds a threshold.
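    The sequential selection loop can be sketched as below. To keep the example self-contained, an ordinary least-squares fit is swapped in for the neural network predictor, so this shows greedy forward selection in general rather than NGP itself. The function name `greedy_select` is hypothetical.

```python
import numpy as np

def greedy_select(X, y, n_select):
    """Pick n_select columns of X one at a time, each time adding the
    column that gives the lowest training loss together with those
    already chosen. Least squares replaces the neural predictor here."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(n_select):
        def loss_with(j):
            cols = X[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            return np.mean((y - cols @ coef) ** 2)
        best = min(remaining, key=loss_with)
        selected.append(best)
        remaining.remove(best)
    return selected  # order of selection doubles as an importance ranking

# Hypothetical data: only features 2 and 7 carry signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
y_demo = 3.0 * X_demo[:, 2] - 2.0 * X_demo[:, 7] + 0.1 * rng.normal(size=200)
chosen = greedy_select(X_demo, y_demo, 2)
print(chosen)
```

    Each round refits one model per remaining candidate, so the cost grows with $N \cdot P$ model fits, which is why the approach is attractive mainly when $N \ll P$.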
    Robust Training of Neural Networks Using Scale Invariant Architectures. (arXiv:2202.00980v2 [cs.LG] UPDATED)
    In contrast to SGD, adaptive gradient methods like Adam allow robust training of modern deep networks, especially large language models. However, the use of adaptivity not only comes at the cost of extra memory but also raises the fundamental question: can non-adaptive methods like SGD enjoy similar benefits? In this paper, we provide an affirmative answer to this question by proposing to achieve both robust and memory-efficient training via the following general recipe: (1) modify the architecture and make it scale invariant, i.e. the scale of the parameters doesn't affect the output of the network, (2) train with SGD and weight decay, and optionally (3) clip the global gradient norm proportional to weight norm multiplied by $\sqrt{\tfrac{2\lambda}{\eta}}$, where $\eta$ is the learning rate and $\lambda$ is the weight decay. We show that this general approach is robust to rescaling of the parameters and loss by proving that its convergence only depends logarithmically on the scale of initialization and loss, whereas the standard SGD might not even converge for many initializations. Following our recipe, we design a scale invariant version of BERT, called SIBERT, which when trained simply by vanilla SGD achieves performance comparable to BERT trained by adaptive methods like Adam on downstream tasks.
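    Steps (2) and (3) of the recipe can be sketched as a single update function. The proportionality constant in the clipping threshold is taken to be 1 and weight decay is applied as an added $\lambda w$ term, both of which are assumptions. The function name `sgd_step` is hypothetical.

```python
import numpy as np

def sgd_step(w, grad, lr, wd, clip=True):
    """One SGD step with weight decay and optional global-norm clipping.

    The threshold is sqrt(2 * wd / lr) * ||w||, assuming a
    proportionality constant of 1."""
    if clip:
        max_norm = np.sqrt(2.0 * wd / lr) * np.linalg.norm(w)
        g_norm = np.linalg.norm(grad)
        if g_norm > max_norm:
            grad = grad * (max_norm / g_norm)  # rescale to the threshold
    return w - lr * (grad + wd * w)

# Hypothetical values: ||w|| = 5 and sqrt(2 * wd / lr) = 1, so the threshold is 5.
w_new = sgd_step(np.array([3.0, 4.0]), np.array([30.0, 40.0]), lr=0.1, wd=0.05)
print(w_new)
```

    Because the threshold scales with the weight norm, rescaling the parameters of a scale-invariant network rescales the allowed gradient norm by the same factor, which fits the robustness-to-rescaling argument in the abstract.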
    Heterogeneous Treatment Effect with Trained Kernels of the Nadaraya-Watson Regression. (arXiv:2207.09139v1 [cs.LG])
    A new method for estimating the conditional average treatment effect is proposed in the paper. It is called TNW-CATE (the Trainable Nadaraya-Watson regression for CATE) and is based on the assumption that the number of controls is rather large whereas the number of treatments is small. TNW-CATE uses the Nadaraya-Watson regression for predicting outcomes of patients from the control and treatment groups. The main idea behind TNW-CATE is to train kernels of the Nadaraya-Watson regression by using a weight sharing neural network of a specific form. The network is trained on controls, and it replaces standard kernels with a set of neural subnetworks with shared parameters such that every subnetwork implements the trainable kernel, but the whole network implements the Nadaraya-Watson estimator. The network memorizes how the feature vectors are located in the feature space. The proposed approach is similar to transfer learning when domains of source and target data are similar, but tasks are different. Various numerical simulation experiments illustrate TNW-CATE and compare it with the well-known T-learner, S-learner and X-learner for several types of the control and treatment outcome functions. The code of the proposed algorithms implementing TNW-CATE is available at https://github.com/Stasychbr/TNW-CATE.
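    For reference, the standard Nadaraya-Watson estimator that TNW-CATE builds on can be sketched as follows. A fixed Gaussian kernel is used here in place of the trained neural kernel, so this is the baseline estimator and not the proposed method. The function names and the bandwidth default are hypothetical.

```python
import numpy as np

def nadaraya_watson(x_query, X, y, bandwidth=1.0):
    # Kernel-weighted average of the outcomes y, with Gaussian weights
    # that decay with squared distance from the query point.
    d2 = ((X - x_query) ** 2).sum(axis=1)
    k = np.exp(-d2 / (2.0 * bandwidth ** 2))
    return float((k * y).sum() / k.sum())

def cate_estimate(x_query, X_control, y_control, X_treated, y_treated, bandwidth=1.0):
    # Difference between the predicted treated and control outcomes at x_query.
    return (nadaraya_watson(x_query, X_treated, y_treated, bandwidth)
            - nadaraya_watson(x_query, X_control, y_control, bandwidth))
```

    With few treated samples the treated-group estimate depends heavily on the kernel, which is presumably why learning the kernel from the larger control group is attractive.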
    Finite-Sample Maximum Likelihood Estimation of Location. (arXiv:2206.02348v2 [math.ST] UPDATED)
    We consider 1-dimensional location estimation, where we estimate a parameter $\lambda$ from $n$ samples $\lambda + \eta_i$, with each $\eta_i$ drawn i.i.d. from a known distribution $f$. For fixed $f$ the maximum-likelihood estimate (MLE) is well-known to be optimal in the limit as $n \to \infty$: it is asymptotically normal with variance matching the Cramér-Rao lower bound of $\frac{1}{n\mathcal{I}}$, where $\mathcal{I}$ is the Fisher information of $f$. However, this bound does not hold for finite $n$, or when $f$ varies with $n$. We show for arbitrary $f$ and $n$ that one can recover a similar theory based on the Fisher information of a smoothed version of $f$, where the smoothing radius decays with $n$.
    Probabilistic Reconciliation of Count Time Series. (arXiv:2207.09322v1 [stat.ME])
    We propose a principled method for the reconciliation of any probabilistic base forecasts. We show how probabilistic reconciliation can be obtained by merging, via Bayes' rule, the information contained in the base forecast for the bottom and the upper time series. We illustrate our method on a toy hierarchy, showing how our framework allows the probabilistic reconciliation of any base forecast. We perform experiments on the reconciliation of temporal hierarchies of count time series, obtaining major improvements compared to probabilistic reconciliation based on the Gaussian or the truncated Gaussian distribution.
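    One way to picture merging base forecasts via Bayes' rule is the toy hierarchy below, with two bottom count series and one upper series equal to their sum. This is a sketch under assumptions: the base forecasts are independent Poisson distributions with hypothetical rates, the support is truncated at 40, and the reconciled joint is taken proportional to the bottom forecasts times the upper forecast evaluated at the sum. The paper's exact construction may differ.

```python
import math
import numpy as np

def poisson_pmf(k, lam):
    # Poisson probability mass, computed in log space for stability.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

K = 40  # truncation of the count support (assumption)
p_b1 = np.array([poisson_pmf(k, 2.0) for k in range(K)])     # bottom series 1
p_b2 = np.array([poisson_pmf(k, 3.0) for k in range(K)])     # bottom series 2
p_u = np.array([poisson_pmf(k, 8.0) for k in range(2 * K)])  # upper series

# sums[i, j] = i + j, the upper value implied by bottom values (i, j).
sums = np.add.outer(np.arange(K), np.arange(K))

# Reconciled joint over the bottom series: bottom base forecasts reweighted
# by how plausible the implied sum is under the upper base forecast.
joint = np.outer(p_b1, p_b2) * p_u[sums]
joint /= joint.sum()

reconciled_upper_mean = float((joint * sums).sum())
print(reconciled_upper_mean)
```

    The base forecasts are incoherent here (bottom means add to 5, the upper mean is 8), and the reconciled upper mean falls between the two. The reconciled distribution stays on the non-negative integers, unlike a Gaussian approximation.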
    On the Study of Sample Complexity for Polynomial Neural Networks. (arXiv:2207.08896v1 [cs.LG])
    As a general type of machine learning approach, artificial neural networks have established state-of-the-art benchmarks in many pattern recognition and data analysis tasks. Among various kinds of neural network architectures, polynomial neural networks (PNNs) have been recently shown to be analyzable by spectrum analysis via neural tangent kernel, and particularly effective at image generation and face recognition. However, acquiring theoretical insight into the computation and sample complexity of PNNs remains an open problem. In this paper, we extend the analysis in previous literature to PNNs and obtain novel results on sample complexity of PNNs, which provide some insights in explaining the generalization ability of PNNs.
    Why do tree-based models still outperform deep learning on tabular data?. (arXiv:2207.08815v1 [cs.LG])
    While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear. We contribute extensive benchmarks of standard and novel deep learning methods as well as tree-based models such as XGBoost and Random Forests, across a large number of datasets and hyperparameter combinations. We define a standard set of 45 datasets from varied domains with clear characteristics of tabular data and a benchmarking methodology accounting for both fitting models and finding good hyperparameters. Results show that tree-based models remain state-of-the-art on medium-sized data ($\sim$10K samples) even without accounting for their superior speed. To understand this gap, we conduct an empirical investigation into the differing inductive biases of tree-based models and Neural Networks (NNs). This leads to a series of challenges which should guide researchers aiming to build tabular-specific NNs: 1. be robust to uninformative features, 2. preserve the orientation of the data, and 3. be able to easily learn irregular functions. To stimulate research on tabular architectures, we contribute a standard benchmark and raw data for baselines: every point of a 20 000 compute hours hyperparameter search for each learner.
    Implicit Regularization with Polynomial Growth in Deep Tensor Factorization. (arXiv:2207.08942v1 [cs.LG])
    We study the implicit regularization effects of deep learning in tensor factorization. While implicit regularization in deep matrix and 'shallow' tensor factorization via linear and certain types of non-linear neural networks promotes low-rank solutions with at most quadratic growth, we show that its effect in deep tensor factorization grows polynomially with the depth of the network. This provides a remarkably faithful description of the observed experimental behaviour. Using numerical experiments, we demonstrate the benefits of this implicit regularization in yielding a more accurate estimation and better convergence properties.
    Deeply-Learned Generalized Linear Models with Missing Data. (arXiv:2207.08911v1 [stat.ML])
    Deep Learning (DL) methods have dramatically increased in popularity in recent years, with significant growth in their application to supervised learning problems in the biomedical sciences. However, the greater prevalence and complexity of missing data in modern biomedical datasets present significant challenges for DL methods. Here, we provide a formal treatment of missing data in the context of deeply learned generalized linear models, a supervised DL architecture for regression and classification problems. We propose a new architecture, dlglm, that is one of the first to be able to flexibly account for both ignorable and non-ignorable patterns of missingness in input features and response at training time. We demonstrate through statistical simulation that our method outperforms existing approaches for supervised learning tasks in the presence of missing not at random (MNAR) missingness. We conclude with a case study of a Bank Marketing dataset from the UCI Machine Learning Repository, in which we predict whether clients subscribed to a product based on phone survey data.
    FLAIR: Federated Learning Annotated Image Repository. (arXiv:2207.08869v1 [cs.LG])
    Cross-device federated learning is an emerging machine learning (ML) paradigm where a large population of devices collectively train an ML model while the data remains on the devices. This research field has a unique set of practical challenges, and to systematically make advances, new datasets curated to be compatible with this paradigm are needed. Existing federated learning benchmarks in the image domain do not accurately capture the scale and heterogeneity of many real-world use cases. We introduce FLAIR, a challenging large-scale annotated image dataset for multi-label classification suitable for federated learning. FLAIR has 429,078 images from 51,414 Flickr users and captures many of the intricacies typically encountered in federated learning, such as heterogeneous user data and a long-tailed label distribution. We implement multiple baselines in different learning setups for different tasks on this dataset. We believe FLAIR can serve as a challenging benchmark for advancing the state-of-the-art in federated learning. Dataset access and the code for the benchmark are available at https://github.com/apple/ml-flair.
    Assaying Out-Of-Distribution Generalization in Transfer Learning. (arXiv:2207.09239v1 [cs.LG])
    Since out-of-distribution generalization is a generally ill-posed problem, various proxy targets (e.g., calibration, adversarial robustness, algorithmic corruptions, invariance across shifts) were studied across different research programs resulting in different recommendations. While sharing the same aspirational goal, these approaches have never been tested under the same experimental conditions on real data. In this paper, we take a unified view of previous work, highlighting message discrepancies that we address empirically, and providing recommendations on how to measure the robustness of a model and how to improve it. To this end, we collect 172 publicly available dataset pairs for training and out-of-distribution evaluation of accuracy, calibration error, adversarial attacks, environment invariance, and synthetic corruptions. We fine-tune over 31k networks, from nine different architectures in the many- and few-shot setting. Our findings confirm that in- and out-of-distribution accuracies tend to increase jointly, but show that their relation is largely dataset-dependent, and in general more nuanced and more complex than posited by previous, smaller scale studies.
    Near-Optimal Quantum Algorithms for Multivariate Mean Estimation. (arXiv:2111.09787v2 [quant-ph] UPDATED)
    We propose the first near-optimal quantum algorithm for estimating in Euclidean norm the mean of a vector-valued random variable with finite mean and covariance. Our result aims at extending the theory of multivariate sub-Gaussian estimators to the quantum setting. Unlike classically, where any univariate estimator can be turned into a multivariate estimator with at most a logarithmic overhead in the dimension, no similar result can be proved in the quantum setting. Indeed, Heinrich ruled out the existence of a quantum advantage for the mean estimation problem when the sample complexity is smaller than the dimension. Our main result is to show that, outside this low-precision regime, there is a quantum estimator that outperforms any classical estimator. Our approach is substantially more involved than in the univariate setting, where most quantum estimators rely only on phase estimation. We exploit a variety of additional algorithmic techniques such as amplitude amplification, the Bernstein-Vazirani algorithm, and quantum singular value transformation. Our analysis also uses concentration inequalities for multivariate truncated statistics. We develop our quantum estimators in two different input models that showed up in the literature before. The first one provides coherent access to the binary representation of the random variable and it encompasses the classical setting. In the second model, the random variable is directly encoded into the phases of quantum registers. This model arises naturally in many quantum algorithms but it is often incomparable to having classical samples. We adapt our techniques to these two settings and we show that the second model is strictly weaker for solving the mean estimation problem. Finally, we describe several applications of our algorithms, notably in measuring the expectation values of commuting observables and in the field of machine learning.
    Data-driven initialization of deep learning solvers for Hamilton-Jacobi-Bellman PDEs. (arXiv:2207.09299v1 [math.OC])
    This paper presents a deep learning approach for the approximation of the Hamilton-Jacobi-Bellman partial differential equation (HJB PDE) associated to the Nonlinear Quadratic Regulator (NLQR) problem. A state-dependent Riccati equation control law is first used to generate a gradient-augmented synthetic dataset for supervised learning. The resulting model becomes a warm start for the minimization of a loss function based on the residual of the HJB PDE. The combination of supervised learning and residual minimization avoids spurious solutions and mitigates the data inefficiency of a supervised learning-only approach. Numerical tests validate the different advantages of the proposed methodology.
    A Prospective Approach for Human-to-Human Interaction Recognition from Wi-Fi Channel Data using Attention Bidirectional Gated Recurrent Neural Network with GUI Application Implementation. (arXiv:2202.08146v3 [cs.LG] UPDATED)
    Recent advances in 5G wireless technology and socioeconomic transformation have brought a paradigm shift in sensor applications. Wi-Fi signal demonstrates a strong correlation between its temporal variation and body movements, which can be leveraged to recognize human activity. In this article, we demonstrate the cognitive ability of a device-free mutual human-to-human interaction recognition method based on the time scale Wi-Fi channel state information. The mutual activities examined are steady-state, approaching, departing, handshaking, high-five, hugging, kicking (left-leg), kicking (right-leg), pointing (left-hand), pointing (right-hand), punching (left-hand), punching (right-hand), and pushing. We explore and propose a Self-Attention furnished Bidirectional Gated Recurrent Neural Network model to classify 13 human-to-human mutual interaction types from the time-series data. Our proposed model can recognize a two-subject pair mutual interaction with a maximum benchmark accuracy of 94%. This has been expanded for ten subject pairs, which secured a benchmark accuracy of 88% with improved classification around the interaction-transition region. Also, an executable graphical user interface (GUI) is developed, using the PyQt5 Python module, to subsequently display the overall mutual human-interaction recognition procedure in real-time. Finally, we conclude with a brief discourse regarding the possible solutions to the handicaps that resulted in curtailments observed during the study. Such Wi-Fi channel perturbation pattern analysis is believed to be an efficient, economical and privacy-friendly approach to be potentially utilized in mutual human-interaction recognition for indoor activity monitoring, surveillance system, smart health monitoring systems and independent assisted living.
    Online Learning with Off-Policy Feedback. (arXiv:2207.08956v1 [cs.LG])
    We study the problem of online learning in adversarial bandit problems under a partial observability model called off-policy feedback. In this sequential decision making problem, the learner cannot directly observe its rewards, but instead sees the ones obtained by another unknown policy run in parallel (behavior policy). Instead of a standard exploration-exploitation dilemma, the learner has to face another challenge in this setting: due to limited observations outside of their control, the learner may not be able to estimate the value of each policy equally well. To address this issue, we propose a set of algorithms that guarantee regret bounds that scale with a natural notion of mismatch between any comparator policy and the behavior policy, achieving improved performance against comparators that are well-covered by the observations. We also provide an extension to the setting of adversarial linear contextual bandits, and verify the theoretical guarantees via a set of experiments. Our key algorithmic idea is adapting the notion of pessimistic reward estimators that has been recently popular in the context of off-policy reinforcement learning.

  • Open

    If I use Chibi, ShortlyAI, or Playground, do I still own the content I create? And can I use it commercially (e.g. sell it as a novel on Amazon)?
    I am specifically asking about the content you generate when pressing the "generate" button. I am not asking about content you upload to the site (though, that may also be helpful to know). I have read the terms of use of them, but I am not sure I really understand them. As far as I can tell they do not mention an answer to my title's question anywhere. I have seen AI generated novels be sold on Amazon, but that does not mean that they were allowed to do that. submitted by /u/Pashahlis [link] [comments]  ( 92 min )
    A selfie of you. Beautifully made, detailed, cute. High quality, studio lighting, product. (NightCafé)
    The AI understands the request well. A selfie of itself! I ran the test several times, and each time it was unable to respond to the request: it generates a blank image! Several reasons could be given. Do you have any ideas? A selfie of you (NightCafé) submitted by /u/StantheBrain [link] [comments]  ( 86 min )
    Weekly China AI News: AI Teaches Robots to Align with Human Value; BAAI Releases Results of Big Model Roadmap Plagiarism Investigation; Alibaba-backed Voice Tech Firm Files for IPO
    submitted by /u/trcytony [link] [comments]  ( 86 min )
    [D] Let me share with you a cool piece that I came across in the latest IEEE newsletter issue.
    Hey guys! Let me share with you a cool piece that I came across in the latest IEEE newsletter issue. It’s a guide that covers a new approach to creating tinyML models. Hope you’ll find it useful: https://iot.ieee.org/newsletter/july-2022/automated-design-of-tiny-machine-learning-models-a-practical-guide-part-1 submitted by /u/Potsieramirez [link] [comments]  ( 92 min )
    Will AI Steal Submarines’ Stealth?
    submitted by /u/jormungandrsjig [link] [comments]  ( 85 min )
    Produce Amazing Artworks with Text and Sketches! "Make-A-Scene": a fantastic blend between text and sketch-conditioned image generation.
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 86 min )
    Glass Madness
    submitted by /u/widgia [link] [comments]  ( 85 min )
    What am I looking at here? My interpretation is that the X axis is the level of activation of the output neuron and y is the level of activation of the input neuron. Is that correct?
    submitted by /u/Mobeamers [link] [comments]  ( 92 min )
    Python Hidden Hacks You Probably Don’t Know About
    submitted by /u/RubiksCodeNMZ [link] [comments]  ( 91 min )
    168 Mediums of Art on Paper made with A.I | a View from A.I
    submitted by /u/OneFinding1429 [link] [comments]  ( 86 min )
  • Open

    Negative numbers in observation_spec
    submitted by /u/Original_Ad_7443 [link] [comments]  ( 106 min )
    Advice on RL for trading
    I'm making a DQN for automatic trading. I am feeding it with 100 numbers representing the last variations of the price in percent, normalized. I "punish" (negative reward) the network when it has not bought and the price goes up, or when it has bought and the price goes down. Inversely, I "reward" the network when it has bought and the price goes up, or when it has not bought and the price goes down. Right now this is the network: Dense(64, activation="relu") Dense(64, activation="relu") Dense(64, activation="relu") Dense(3, activation="relu") It ends with 3 neurons because there are 3 possible actions: Buy, Sell, Wait. When running this, the network overfits a LOT and learns every possible trade until it wins every time, for years. When I update the network like this: Dense(64, activation="relu") Dropout(0.3) Dense(64, activation="relu") Dropout(0.3) Dense(64, activation="relu") Dropout(0.3) Dense(3, activation="relu") It is not able to learn AT ALL anymore. With a dropout lower than 0.3 it overfits completely, with a dropout higher than 0.3 it is not able to learn at all. Is there any beginner flaw in there? Any recommendation? I have spent weeks on it, trying different inputs / model sizes, and all I ever get is extreme overfit or extreme underfit. Thanks in advance. submitted by /u/superpowers_fan [link] [comments]  ( 90 min )
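    The reward rule described in the post can be written down as a small function. This is only a sketch of the rule as stated: the function name, the unit reward magnitudes, and the zero reward for a flat price are assumptions, not the poster's actual code.

```python
def trade_reward(holding: bool, price_change_pct: float) -> float:
    """Reward rule from the post: being in the market pays off when the
    price rises, staying out pays off when it falls.

    The +1 / -1 magnitudes and the 0 for a flat price are illustrative
    assumptions; the post does not give concrete values.
    """
    if price_change_pct == 0:
        return 0.0
    went_up = price_change_pct > 0
    # Rewarded when (bought and up) or (not bought and down), punished otherwise.
    return 1.0 if holding == went_up else -1.0


for holding in (True, False):
    for change in (1.5, -0.7):
        print(f"holding={holding}, change={change:+.1f}% -> {trade_reward(holding, change):+.1f}")
```

    Writing the rule out this way makes it easy to check each of the four cases against the description before wiring it into an environment.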
    Incentivize steep action distribution around zero - PPO Algorithm -
    I am currently dealing with an environment in which selecting zero as an action all the time leads to very high rewards. It can be beaten though. The distribution of actions of our benchmark method, an MPC method, shows a very high spike around zero. So it is selecting zero as an action most of the time. So is there a way to basically default to zero as an action and only leave it when the algorithm is actually pretty sure it will lead to higher rewards? The more I think about it, the more I come to the conclusion that discrete action spaces might be better suited here. But for now I want to stick with PPO. Also I think this is pretty similar to a case where the agent is initialized with imitation learning. Then we don't want to unlearn the already good policy, but only make changes when we are pretty sure it leads to an improvement. Does anybody know a good paper on this topic? Action distribution of the RL method and the MPC benchmark method submitted by /u/flxh13 [link] [comments]  ( 87 min )
    What is the best way to represent an observation space in a custom gym environment?
    I'm trying to develop a custom gym environment. Each state is represented by a 9-dimensional binary array and I have a total of 500 different states. Example: State0 = [1, 0, 1, 0, 1, 1, 0, 1, 0] State1 = [0, 1, 1, 1, 1, 1, 0, 1, 0] ... State499 = [1, 0, 0, 0, 1, 1, 0, 1, 0] Each array represents a set of features and I'm trying to train an agent to perform a specific action according to these features. What is the best way to represent this observation space using Gym Spaces? submitted by /u/Altruistic_Drink_231 [link] [comments]  ( 86 min )
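    Two natural encodings of such states are the raw 9-bit vector (which corresponds to what Gym calls a MultiBinary(9) space) and a single integer per state (a Discrete space). The sketch below uses plain Python to show the two are interchangeable; the helper names are hypothetical, and it does not call Gym itself.

```python
def bits_to_index(bits):
    """Pack a binary feature vector into one integer, most significant bit first."""
    index = 0
    for bit in bits:
        index = (index << 1) | bit
    return index


def index_to_bits(index, width=9):
    """Unpack an integer back into a fixed-width binary feature vector."""
    return [(index >> shift) & 1 for shift in range(width - 1, -1, -1)]


state0 = [1, 0, 1, 0, 1, 1, 0, 1, 0]  # State0 from the post
packed = bits_to_index(state0)
print(packed)                  # integer form of the 9-bit vector
print(index_to_bits(packed))   # round-trips to the original bits
```

    Keeping the bit vector preserves the feature structure for a function approximator, while the integer form suits tabular methods; which one is preferable depends on the agent.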
    Recommended stack for Hyperparameter tuning?
    What are some recommended tools for hyperparameter tuning in RL? I am looking both for frameworks and cloud/distributed scaling. Thanks submitted by /u/SatoshiNotMe [link] [comments]  ( 86 min )
    Can we teach a robot arm to write with reinforcement learning?
    Hello RL community, I am new to RL, but have been doing robotic manipulation/arms for five years. My team aims to teach a robot arm to write using RL. I am trying to understand the nature of problems that RL can solve. For example, I understand that the walking problem, solved by RL, is not similar to writing because walking has repeated sub-tasks. Writing has sentences of unique words that are not repeated. I still lack the intuition about tasks robots can learn using RL and would appreciate it if you share insights from your experience. - Is writing a skill that we can teach robots to do with RL? - Should it be more specific, like teaching one word, a Roman character, or a Chinese character (Kanji)? - Is teaching a robot to write one word similar to teaching it to draw a simple shape? Kindly excuse my lack of experience in the field. I am already trying to figure out my path into it. ____ Update: To facilitate the discussion, I would like to assume a specific task. It is to write a Kanji character like this. Kanji of "study" Or to write a Roman letter, like "B". https://preview.redd.it/j2kn2rktyhc91.png?width=1071&format=png&auto=webp&s=21c1509aed94e2c21bab0f632a9848f18ac08a98 The requirement is to teach the robot to reproduce this Roman or Kanji character with RL. Your recommendations are most appreciated. Thank you! submitted by /u/Biomacs [link] [comments]  ( 92 min )
    I used Note System on MNIST, training speed was increased by more than two times! You can view this project on my GitHub.
    submitted by /u/7NoteDancing [link] [comments]  ( 86 min )
    Q-learning Convergence
    I was revisiting Chapter 6.7 of Sutton, which discusses Maximization Bias. It says "Even at asymptote, Q-learning takes the left action about 5% more often than is optimal at our parameter settings " (See image below). How can this be true if Q-learning converges? I believe the condition is that the learning rate alpha decays at a suitable rate, and that all s, a pairs are visited infinitely many times. At asymptote, shouldn't Q-learning converge at a policy only taking left 5% of time just like Double Q-Learning? ​ https://preview.redd.it/79jjoldadfc91.png?width=551&format=png&auto=webp&s=a075c679a6b73f15db1396531b7c221e9b922107 submitted by /u/jhoveen1 [link] [comments]  ( 86 min )
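    The maximization bias the chapter refers to can be illustrated in isolation: taking the maximum over several noisy value estimates is biased upward even when every true value is negative. The sketch below is not Sutton's experiment; the number of actions, the samples per action, and the trial count are assumptions chosen only to make the effect visible.

```python
import random


def max_of_sample_means(rng, num_actions=10, samples_per_action=5, true_mean=-0.1):
    """Estimate each action's value from a few noisy rewards, then take the max.

    Every action has the same true value (true_mean), so an unbiased
    estimate of the best value would also be true_mean.
    """
    means = []
    for _ in range(num_actions):
        draws = [rng.gauss(true_mean, 1.0) for _ in range(samples_per_action)]
        means.append(sum(draws) / samples_per_action)
    return max(means)


rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 5000
avg_max = sum(max_of_sample_means(rng) for _ in range(trials)) / trials
print(f"true value of every action: -0.1, average max of estimates: {avg_max:.3f}")
```

    The average of the maxima comes out clearly positive, which is the kind of overestimate that makes the "left" action look attractive while estimates are still noisy.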
  • Open

    5 Growth Pillars of Smart Learning and Education
    The education sector has seen profound developments and will continue to observe the same in the future. The rising importance of education across the globe, especially in developing countries with large population numbers has attracted tremendous advancements. With the emergence of novel technologies, the means of receiving education have increased exponentially. Smart learning has become… Read More »5 Growth Pillars of Smart Learning and Education The post 5 Growth Pillars of Smart Learning and Education appeared first on Data Science Central.  ( 19 min )
    Google Ads Headlines: How To Write Headlines That Get More Clicks
    Are you struggling to get clicks on your Google Ads? You’re not alone. In fact, most people don’t know how to write headlines that get clicked. If you’re ready to learn how to write headlines that get more clicks, then this blog post is for you. You’ll learn some great headline writing tips that will… Read More »Google Ads Headlines: How To Write Headlines That Get More Clicks The post Google Ads Headlines: How To Write Headlines That Get More Clicks appeared first on Data Science Central.  ( 20 min )
    Healthcare Industry: The Impact of Business Intelligence
    The healthcare industry has long been a complicated endeavor, primarily due to health and medicine’s central concept and myriad other equally intricate factors. Interestingly, any healthcare institution must ensure seamless operations while also managing these factors and working on other goals, such as cutting down costs, fostering better efficiency, delivering a better quality of care,… Read More »Healthcare Industry: The Impact of Business Intelligence The post Healthcare Industry: The Impact of Business Intelligence appeared first on Data Science Central.  ( 18 min )
    Factors to Consider while Developing Mobile Apps
    With over a thousand apps launched daily, the market competition in mobile app development is intense. Going mobile is no longer an option for a business; it is an operating rule. Companies that haven’t yet launched an app are missing out on an essential milestone in the digital revolution. Number of mobile app downloads worldwide… Read More »Factors to Consider while Developing Mobile Apps The post Factors to Consider while Developing Mobile Apps appeared first on Data Science Central.  ( 21 min )
    How Annotations Can Transform AI Training Data
    With a variety of businesses integrating AI technology and machine learning models into their business practices, AI has become less of a novelty and more mainstream over the past few years. With ever-growing amounts of data generated worldwide, you are likely already in possession of the data you need for your machine learning models and… Read More »How Annotations Can Transform AI Training Data The post How Annotations Can Transform AI Training Data appeared first on Data Science Central.  ( 19 min )
    Understanding the Value of Bayesian Networks
    Machine learning algorithms are based on correlation; they do not specify cause-and-effect relations. Hence, causality (cause-and-effect relations) is an increasingly important theme that is missing in machine learning. A Bayesian network is a probabilistic graphical model representing a set of variables and their conditional dependencies via a directed acyclic graph (DAG). … Read More »Understanding the Value of Bayesian Networks The post Understanding the Value of Bayesian Networks appeared first on Data Science Central.  ( 18 min )
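    As a minimal illustration of the definition above, consider a hypothetical two-node network Rain -> WetGrass. The joint distribution factorizes along the DAG as P(rain, wet) = P(rain) * P(wet | rain), and Bayes' rule then gives the probability of the cause given the effect. All probabilities below are made up for the example.

```python
# Hypothetical two-node Bayesian network: Rain -> WetGrass.
# All numbers are illustrative assumptions, not taken from the article.
p_rain = 0.2
p_wet_given_rain = {True: 0.9, False: 0.1}


def joint(rain: bool, wet: bool) -> float:
    """P(rain, wet) = P(rain) * P(wet | rain), following the DAG's factorization."""
    p_r = p_rain if rain else 1 - p_rain
    p_w = p_wet_given_rain[rain] if wet else 1 - p_wet_given_rain[rain]
    return p_r * p_w


# Marginalize over the parent to get P(wet), then apply Bayes' rule.
p_wet = joint(True, True) + joint(False, True)
p_rain_given_wet = joint(True, True) / p_wet
print(f"P(wet) = {p_wet:.2f}, P(rain | wet) = {p_rain_given_wet:.3f}")
```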
    How Blockchain Is Changing the Accounting Profession
    Crudely, blockchain is to accounting and finance what the internet was to computers ages back. While even the fastest traditional bank takes an hour to clear a check, a DLT-based clearinghouse does it in 20 seconds flat! The distributed ledger technology working through a network of thousands of globally distributed computers promises deliverance to businesses… Read More »How Blockchain Is Changing the Accounting Profession The post How Blockchain Is Changing the Accounting Profession appeared first on Data Science Central.  ( 19 min )
  • Open

    [D] How to choose noise schedule in diffusion models?
    Any tips on where to look for answers? Typically, papers state their schedules as fact, and I didn't manage to find a proper general explanation. Also, training and inference schedules are different. Why? Thanks for help in advance. submitted by /u/AdelSexy [link] [comments]  ( 87 min )
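    For readers unfamiliar with the term, a noise schedule fixes how much signal remains at each diffusion step. The sketch below compares two commonly cited choices: a linear beta schedule (with the 1e-4 to 0.02 range over 1000 steps used in the DDPM paper) and a cosine schedule on the cumulative signal level (with offset s = 0.008, as in the improved-DDPM paper). The function names are my own, and this is a plain-Python illustration rather than code from either paper.

```python
import math


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances growing linearly from beta_start to beta_end."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bar_from_betas(betas):
    """Cumulative product of (1 - beta_t): the fraction of signal variance left after each step."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def cosine_alpha_bar(T, s=0.008):
    """Cosine schedule defined directly on the cumulative signal level."""
    def f(t):
        return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(1, T + 1)]


T = 1000
linear_bar = alpha_bar_from_betas(linear_betas(T))
cosine_bar = cosine_alpha_bar(T)
for step in (100, 500, 900):
    print(f"step {step}: linear={linear_bar[step]:.4f}, cosine={cosine_bar[step]:.4f}")
```

    Printing the two curves side by side shows that the cosine schedule keeps more signal in the middle steps, which is one of the considerations usually raised when schedules are compared.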
    [D] Most important unsolved problems in AI research
    Suggesting this topic for discussion, as I am trying to identify the current most important unsolved problems in AI research. Below are a few proposed items that are top of mind for me, would appreciate any input (what to add or what to remove from the list) and relevant sources. Uniting compositional-structure processing (human’s ability for symbolic operations, generalization, etc) with neural computation (e.g. https://arxiv.org/abs/2205.01128). Ability to match knowledge to context. E.g. the text generated by the LLM is a great match for a sci-fi novel, but not as advice to a patient regarding their medical condition. Catastrophic forgetting. It is a known limitation to continual learning, however, it seems like the large-scale models show an indication of robustness (http://www.c…  ( 93 min )
    [R] Evaluating SSL Pipeline
    I am training a self-supervised representation learning (SSL) model in SimSiam style. Other than evaluating a downstream task, what are the techniques commonly used for evaluating the representation learned by SSL pipelines? submitted by /u/mishtimoi [link] [comments]  ( 105 min )
    [R] BeerAdvocate Dataset (Sentence annotations)
    Hi, if you're not familiar with this dataset it has text snippets which encode information about beers to give a final rating. There are four main aspects in each review (taste, aroma, appearance, palate), and I've been trying to get the sentence-level annotations for this dataset where each of these is highlighted, but it's quite hard to find. Multiple papers claim to use them (e.g., here), but I am finding it difficult to locate them. Even some websites have removed the data here. I have found some annotations here, but I wonder am I missing something? Some papers report there are 100 annotations, others report 800 or so. It seems the original dataset didn't have these sentence-level annotations of the various beer aspects, and researchers uploaded their own labels over the years. I'm just posting in case anyone knows where I could find the full list of annotations for each beer aspect (taste, aroma, appearance, palate)? Or did I already find them in the above link? Thanks if you have time to help out, and sorry if I've missed something obvious. submitted by /u/SkeeringReal [link] [comments]  ( 88 min )
    [D] Vision transformers: Why non-overlapping patches?
    So I have been looking into various flavors of transformers for vision/image-based tasks and almost all of them use non-overlapping patches, is there a reason for that? My problem with that is if the kernel size(patch dimension) is large then the non-overlapping features might get lost, am I thinking in terms of "image" and I should think more in terms of NLP? submitted by /u/bitemenow999 [link] [comments]  ( 88 min )
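    The difference between the two patching schemes comes down to the stride: non-overlapping patches use a stride equal to the patch size, while overlapping patches use a smaller one. The sketch below, on a toy 8x8 grid with hypothetical helper names, shows how quickly the token count grows with overlap; since self-attention cost scales quadratically with the number of tokens, this is one practical consideration, though not necessarily the only one.

```python
def extract_patches(image, patch, stride):
    """Return all patch x patch sub-grids of a 2D list, stepping by stride."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, stride):
        for left in range(0, width - patch + 1, stride):
            patches.append([row[left:left + patch] for row in image[top:top + patch]])
    return patches


# Toy 8x8 "image" whose pixel values are just their flat indices.
image = [[r * 8 + c for c in range(8)] for r in range(8)]

non_overlapping = extract_patches(image, patch=4, stride=4)
overlapping = extract_patches(image, patch=4, stride=2)

for name, tokens in (("stride 4", non_overlapping), ("stride 2", overlapping)):
    n = len(tokens)
    print(f"{name}: {n} tokens, {n * n} pairwise attention scores")
```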
    [P] Enabling Creative Expression with Concept Activation Vectors - a project from Google AI
    https://ai.googleblog.com/2022/07/enabling-creative-expression-with.html A project from Google's Brain and Mural teams aims to narrow the gap between the objective, categorical ML inferences and subjective, artistic values. Mood Board Search is a tool built with CAVs (Concept Activation Vectors, from the TCAV paper, Been Kim et al) for humans to express visually subjective concepts to a machine, and then search a dataset for other images with similar qualities. We took great care to make the training interface obvious and user-friendly, using the metaphor of mood boards to anchor to the idea of 'visual aesthetic' and appeal to an artistic mindset. https://i.redd.it/0k9j0wyj6ic91.gif The results were surprising. 3 artists/curators were able to build visually compelling CAVs with only a handful of images that surfaced images with a coherent visual style, across a variety of different subject matter, resulting in feelings of being able to “break out of visually-similar echo chambers” or “see the world through another person’s eyes”. These findings point towards new ways of designing collaborative ML systems that embrace personal and collective subjectivity, with new tools letting a broader audience of people work more closely with ML models. We have open-sourced all the code for the tool on GitHub, along with the 3 artist-created concepts and a premade image bank so it's ready to use. submitted by /u/joerick [link] [comments]  ( 90 min )
  • Open

    best neural network package in R
    which package is the best neural network package in R? Thank you. submitted by /u/microsat2 [link] [comments]  ( 86 min )
  • Open

    Simplified Transfer Learning for Chest Radiography Model Development
    Posted by Akib Uddin, Product Manager and Andrew Sellergren, Software Engineer, Google Health Every year, nearly a billion chest X-ray (CXR) images are taken globally to aid in the detection and management of health conditions ranging from collapsed lungs to infectious diseases. Generally, CXRs are cheaper and more accessible than other forms of medical imaging. However, existing challenges continue to impede the optimal use of CXRs. For example, in some areas, trained radiologists that can accurately interpret CXR images are in short supply. In addition, interpretation variability between experts, workflow differences between institutions, and the presence of rare conditions familiar only to subspecialists all contribute to making high-quality CXR interpretation a challenge. Recent res…  ( 25 min )
  • Open

    Localize content into multiple languages using AWS machine learning services
    Over the last few years, online education platforms have seen an increase in adoption of and an uptick in demand for video-based learnings because it offers an effective medium to engage learners. To expand to international markets and address a culturally and linguistically diverse population, businesses are also looking at diversifying their learning offerings by […]  ( 10 min )
    Identify rooftop solar panels from satellite imagery using Amazon Rekognition Custom Labels
    Renewable resources like sunlight provide a sustainable and carbon neutral mechanism to generate power. Governments in many countries are providing incentives and subsidies to households to install solar panels as part of small-scale renewable energy schemes. This has created a huge demand for solar panels. Reaching out to potential customers at the right time, through […]  ( 10 min )
  • Open

    Keyhole contour integrals
    The big idea The Cauchy integral theorem says that the integral of a function around a closed path in the complex plane depends only on the poles of the integrand inside the path. You can change the path itself however you like as long as you don’t change which poles are inside. This observation is often used […] Keyhole contour integrals first appeared on John D. Cook.  ( 5 min )
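    The path-independence described above can be checked numerically. The sketch below integrates f(z) = 1/(z - 1), which has a single simple pole at z = 1 with residue 1, around three circles: two that enclose the pole and one that does not. The integrand, the contours, and the helper names are illustrative choices, not taken from the post.

```python
import cmath
import math


def contour_integral(f, path, n=20000):
    """Approximate the integral of f along a closed path z(t), t in [0, 1],
    using the midpoint rule on n straight segments."""
    total = 0j
    for k in range(n):
        z0 = path(k / n)
        z1 = path((k + 1) / n)
        total += f((z0 + z1) / 2) * (z1 - z0)
    return total


def circle(radius, center=0j):
    """Counterclockwise circle of the given radius around center."""
    return lambda t: center + radius * cmath.exp(2j * math.pi * t)


def f(z):
    return 1 / (z - 1)  # simple pole at z = 1, residue 1


small = contour_integral(f, circle(0.5, center=1))     # pole inside
large = contour_integral(f, circle(5.0))               # pole still inside
outside = contour_integral(f, circle(0.5, center=-2))  # pole outside

print(small, large, outside)
```

    Both contours that enclose the pole give approximately 2πi, regardless of their size or center, while the contour that misses the pole gives approximately zero.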

  • Open

    Nvidia AI Research Team Presents A Deep Reinforcement Learning (RL) Based Approach To Create Smaller And Faster Circuits
    There is a law known as Moore’s law, which states that the number of transistors on a microchip doubles every two years. As Moore’s law slows, it becomes more vital to create alternative techniques for improving chip performance at the same technological process node. NVIDIA has revealed a new method that uses artificial intelligence to build smaller, quicker, and more efficient circuits to deliver increased performance with each new generation of chips. Its work using deep reinforcement learning demonstrates that AI is capable of learning to create these circuits from the ground up. ✅ The first method to date that uses a deep reinforcement learning agent to design arithmetic circuits ✅ The results show that the best PrefixRL adder achieved a 25% lower area than the electronic design automation tool Continue reading | Check out the paper and source article. https://i.redd.it/8tr858u6rec91.gif submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Stable-Baselines3 MultiInputPolicy with ml-agents UnityToGymWrapper
    Hey, This is quite a specific set up but hopefully someone out there can maybe help me out? I've used Unity and ml-agents to create an environment. I then built the exe file (binary file) to use the UnityToGymWrapper to create a Gym instance of the environment. I'm now trying to use stable-baselines3 to train my agent using 'MultiInputPolicy', since the observations is a dict of (image1, image2, image3, vector(ie 'direct' features)). However, I get the following error: ​ raise NotImplementedError(f"{observation_space} observation space is not supported") ​ Has anyone come across this problem? Does anyone have an idea on how to fix it? Thanks! submitted by /u/leozinho2r [link] [comments]  ( 86 min )
  • Open

    Last Week in AI: Drones beat human pilots in first fair race, better call quality with AI, how artists view AI-generated art, and more!
    submitted by /u/regalalgorithm [link] [comments]  ( 86 min )
    🦢 Dream Swan
    submitted by /u/widgia [link] [comments]  ( 85 min )
    Like It Did With Chess, Sam Harris Thinks AI Will Outperform Humans in All Subject Areas (short audio clip from Lex Fridman's podcast)
    submitted by /u/mmiller9913 [link] [comments]  ( 86 min )
    After dark I see colors
    submitted by /u/pinkinkgallery [link] [comments]  ( 85 min )
    Meet Intel® Neural Compressor: An Open-Source Python Library for Model Compression that Reduces the Model Size and Increases the Speed of Deep Learning Inference for Deployment on CPUs or GPUs
    Intel has recently released Neural Compressor, an open-source Python package for model compression. This library can be applied to deep learning deployment on CPUs or GPUs to decrease the model size and speed up inference. Additionally, it offers a uniform user interface for well-known network compression techniques, including quantization, pruning, and knowledge distillation across various deep learning frameworks. The tool’s automatic accuracy-driven tweaking technique can be utilized to generate the best-quantized model. Additionally, it allows knowledge distillation so that the knowledge from the teacher model may be transferred to the student model. It implements several weight pruning methods to produce pruned models using a predetermined sparsity goal. For improved framework interoperability, the Python library also offers APIs for various deep learning frameworks, including TensorFlow, PyTorch, and MXNet. Continue reading | The Github repo for the library can be accessed here. submitted by /u/ai-lover [link] [comments]  ( 87 min )
    what recent development in AI is a game changer according to you?
    submitted by /u/immoral_writer [link] [comments]  ( 86 min )
    Masters Thesis - Artificial Intelligence based scenarios
    Hi there! I am a master's student at Loughborough University in the UK currently assessing applicant reactions to AI-based scenarios when applying to jobs! To complete my thesis, I require participants to answer a 10-minute survey surrounding this topic. I would really appreciate it if anyone could complete this, and spread this message to anyone who they feel might be interested. Importantly, this survey has been given ethical clearance, with all data being stored anonymously! Here's the link for those interested - https://lborobusiness.eu.qualtrics.com/jfe/form/SV_0OJzCGdNz0CZTYa submitted by /u/CBPoker2000 [link] [comments]  ( 86 min )
    Unraveling the Deep Learning Reproducibility Crisis
    Hello all! I wrote my first blogpost! I believe there exists a Reproducibility Crisis within a large subset of deep learning research published today, which is what I've made an effort to investigate. It explores the current standards for writing deep learning papers, the details they lack, challenges in reproduction, and what can potentially solve them. https://dagshub.com/blog/unraveling-the-deep-learning-reproducibility-crisis/ submitted by /u/codeinassembly [link] [comments]  ( 87 min )
    BLOOM can set a new culture for AI research—but challenges remain
    submitted by /u/bendee983 [link] [comments]  ( 86 min )
    Great CVPR content on Computer Vision News of July 2022
    Here is RSIP Vision's traditional BEST OF CVPR in Computer Vision News of July 2022. Many great articles about AI, Deep Learning, Computer Vision and more... HTML5 version (recommended) PDF version Dilbert on page 2. Free subscription on page 58. Enjoy! https://preview.redd.it/5rfrepnn9bc91.jpg?width=400&format=pjpg&auto=webp&s=0c1fd85ebc827bf8ec5e3c51b31f4c8bf9590edd submitted by /u/Gletta [link] [comments]  ( 86 min )
    Einstein
    submitted by /u/widgia [link] [comments]  ( 85 min )
    PXL•E - Hyper realistic images created on pixelz.ai 🧍🏻‍♀️🧍‍♂️🧍
    submitted by /u/mdfnb [link] [comments]  ( 86 min )
    Among Us meets the Turing Test - indie game
    interesting multiplayer where you have to figure out who's a real person and who's AI https://store.steampowered.com/app/1964800/Captcha_Kills/ submitted by /u/Distinct-Sleep-295 [link] [comments]  ( 87 min )
    Disco Diffusion POTS Hi Res AI Art Weekly Slideshow 7.17.22 Steampunk Va...
    submitted by /u/prfitofthesngularity [link] [comments]  ( 86 min )
    "Mirror" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    "Elf" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
  • Open

    [R] EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction
    Paper: https://arxiv.org/abs/2202.05146 Github: https://github.com/HannesStark/EquiBind Abstract: Predicting how a drug-like molecule binds to a specific protein target is a core problem in drug discovery. An extremely fast computational binding method would enable key applications such as fast virtual screening or drug engineering. Existing methods are computationally expensive as they rely on heavy candidate sampling coupled with scoring, ranking, and fine-tuning steps. We challenge this paradigm with EquiBind, an SE(3)-equivariant geometric deep learning model performing direct-shot prediction of both i) the receptor binding location (blind docking) and ii) the ligand's bound pose and orientation. EquiBind achieves significant speed-ups and better quality compared to traditional and recent baselines. Further, we show extra improvements when coupling it with existing fine-tuning techniques at the cost of increased running time. Finally, we propose a novel and fast fine-tuning model that adjusts torsion angles of a ligand's rotatable bonds based on closed-form global minima of the von Mises angular distance to a given input atomic point cloud, avoiding previous expensive differential evolution strategies for energy minimization. ​ https://preview.redd.it/rbqv738foec91.jpg?width=1252&format=pjpg&auto=webp&s=54d8175f93980d32329ea7b6bddcba36e42deb86 submitted by /u/Singularian2501 [link] [comments]  ( 88 min )
    [News] TorchStudio 0.9.8 (IDE for PyTorch) now support PyTorch 1.12, Apple Silicon, Metal Acceleration, Fedora and brings many new features
    Hi, I just released TorchStudio 0.9.8 with several improvements based on community feedback, looking forward to your comments! download: https://www.torchstudio.ai/download/ full changelog: https://github.com/TorchStudio/torchstudio/releases/tag/0.9.8 If you're new to TorchStudio, you'll find introductory tutorials and videos here: https://www.torchstudio.ai/tutorials/ https://preview.redd.it/acfibkampbc91.png?width=2784&format=png&auto=webp&s=e6dad5baebdb3a9d7df54f35dccc4a98753aa5fe https://preview.redd.it/s29p5yxnpbc91.png?width=2784&format=png&auto=webp&s=a9391ca731b299804c72c82051da9d1b04735c2c submitted by /u/divideconcept [link] [comments]  ( 89 min )
    [R] Unicorn: 🦄 : Towards Grand Unification of Object Tracking(Video Demo)
    submitted by /u/iFighting [link] [comments]  ( 90 min )
    Quick notes on the difference between Imagen and DALL-E 2 for those who haven’t had time to read the paper [D]
    It’s always interesting comparing these big models because when you look closely they are based on similar algorithms and have very similar architectures too. Google claims that what sets Imagen apart is: deep language understanding unprecedented photorealism It’s good of them to do this deduction for us, but all we have to compare Imagen and other models like DALL-E 2 and GLIDE is what they have given us since no one has access to Imagen yet. From the examples they published, it does really look like Imagen has achieved a good level of language understanding. If you looked into DALL-E 2 at all, you might have heard about the discussion on how it is not able to create images with comprehensible text on it or how it confuses the physical attributes of objects with each other. In th…  ( 91 min )
    [D][R] Thoughts on the new SOTA on interpretative DNNs - B-cos Networks: Alignment is All We Need for Interpretability?
    Link to arxiv here. This paper seems quite a breakthrough in designing interpretable DNNs by adding an "aligning" inductive bias into the computations in layers of the NN itself. The quantitative and qualitative results appear incredibly impressive, does anyone have any thoughts on this paper? ​ Nice figure here I took from the paper: ​ https://preview.redd.it/lqdufxpngbc91.png?width=1928&format=png&auto=webp&s=d1e2885fb179e823c8bf045b66f6a65f8705fda8 submitted by /u/shellyturnwarm [link] [comments]  ( 89 min )
    [News] Kornia 0.6.6: ParametrizedLine API, load_image support for Apple Windows Developer, integration demos with Hugging Face and many more.
    📚 Release notes: 👉 https://github.com/kornia/kornia/releases/tag/v0.6.6 📚 Docs and tutorials 👉 https://kornia.readthedocs.io/en/latest/ https://preview.redd.it/fu46z17xtac91.png?width=1060&format=png&auto=webp&s=e10c42173fca97e76d9e2ccdea8809f112c4392b https://preview.redd.it/xy64c27xtac91.png?width=640&format=png&auto=webp&s=789a197ab894ac0f5716e276aff360b09fdfb8eb submitted by /u/edgarriba [link] [comments]  ( 88 min )
    [D] In UMAP and PyNNDescent, the conversion of Cosine and Correlation measures to distance metric seems problematic
    The cosine function returns 1.0 - (result / np.sqrt(norm_x * norm_y)) (i.e. 1 - cosine similarity) and the correlation function returns 1.0 - (dot_product / np.sqrt(norm_x * norm_y)) (i.e. 1 - correlation coefficient). As far as I understand, this is to convert similarity measures to distance metrics. Both the cosine and correlation measures lie in the interval [-1, 1], where 0 means dissimilar or independent, and 1 and -1 mean strongly similar or correlated. The converted distance lies in the interval [0, 2], where both bounds, 0 and 2, correspond to strong similarity. This is problematic because we want a monotonic type of metric where a larger distance indicates dissimilarity. In the case of the cosine similarity measure, as far as I have seen, this is usually avoided by staying in the positive space. However, this is unavoidable for the correlation coefficient. One way of dealing with this problem is by taking the absolute value: 1-|corr|. But doing so takes away the information on positive and negative correlation from the k-NN graph. Am I missing anything here? If not, how to deal with this situation? UMAP distances.py: umap/distances.py at master · lmcinnes/umap (github.com) PyNNDescent distances.py: pynndescent/distances.py at master · lmcinnes/pynndescent (github.com) submitted by /u/odinnotdoit [link] [comments]  ( 89 min )
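    The behaviour discussed in this post can be checked numerically. The following is a minimal sketch, assuming NumPy; `cosine_distance`, `correlation_distance` and `abs_correlation_distance` are hypothetical helpers that mirror the `1 - similarity` formulas quoted above, not the actual UMAP or PyNNDescent functions. It shows which inputs land at 0, 1 and 2, and what the 1-|corr| variant does to an anti-correlated pair.

```python
import numpy as np

def cosine_distance(x, y):
    """1 - cosine similarity; assumes both vectors have non-zero norm."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 1.0 - np.dot(x, y) / np.sqrt(np.dot(x, x) * np.dot(y, y))

def correlation_distance(x, y):
    """1 - Pearson correlation: cosine distance of the mean-centred vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return cosine_distance(x - x.mean(), y - y.mean())

def abs_correlation_distance(x, y):
    """1 - |corr|: treats perfectly anti-correlated vectors as distance 0."""
    corr = 1.0 - correlation_distance(x, y)
    return 1.0 - abs(corr)

a = np.array([1.0, 2.0, 3.0])
print(cosine_distance(a, 2 * a))                 # parallel vectors
print(cosine_distance(a, [3.0, 0.0, -1.0]))      # orthogonal vectors
print(cosine_distance(a, -a))                    # anti-parallel vectors
print(correlation_distance(a, a[::-1]))          # perfectly anti-correlated
print(abs_correlation_distance(a, a[::-1]))      # sign information is lost
```

    In this sketch the plain `1 - similarity` distance is monotone in the similarity, so anti-correlated pairs end up farthest apart. Whether they should instead count as neighbours (the 1-|corr| variant) depends on the application.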
    [R] Unicorn: 🦄 : Towards Grand Unification of Object Tracking
    Video Demo for Unicorn Brief Overview We present a unified method, termed Unicorn, that can simultaneously solve four tracking problems (SOT, MOT, VOS, MOTS) with a single network using the same model parameters. For the first time, we accomplished the great unification of the tracking network architecture and learning paradigm. Unicorn performs on-par or better than its task-specific counterparts in 8 tracking datasets, including LaSOT, TrackingNet, MOT17, BDD100K, DAVIS16-17, MOTS20, and BDD100K MOTS. Our work is accepted to ECCV 2022 as an oral presentation ! Paper: https://arxiv.org/abs/2207.07078 Code: https://github.com/MasterBin-IIAU/Unicorn ​ Motivation Object tracking is one of the fundamental tasks in computer vision, which aims to build pixel-level or instance-level correspondence between frames and to output trajectories typically in the forms of boxes or masks. Over the years, according to different application scenarios, the object tracking problem has been mainly divided into four separate sub-tasks: Single Object Tracking (SOT), Multiple Object Tracking (MOT), Video Object Segmentation (VOS), and Multi-Object Tracking and Segmentation (MOTS). As a result, most tracking approaches are developed for only one of or part of the sub-tasks. Despite convenience for specific applications, this fragmented situation brings into the following drawbacks: submitted by /u/iFighting [link] [comments]  ( 88 min )
    [P] i have to build a recommender system
    What are my options? What do I have to watch out for? What tools should I be using? Please share your experience with recommender systems. thanks! submitted by /u/ethereumturk [link] [comments]  ( 87 min )
  • Open

    Galois diagram
    The previous post listed three posts I’d written before about images on the covers of math books. This post is about the image on the first edition of Dummit and Foote’s Abstract Algebra. Here’s a version of the image on the cover I recreated using LaTeX. The image on the cover appears on page 495 […] Galois diagram first appeared on John D. Cook.  ( 4 min )
    Book cover posts
    When a math book has an intriguing image on the cover, it’s fun to get to the point in the book where the meaning of the image is explained. I have some ideas for book covers I’d like to write about, but here I’d like to point out three such posts I’ve already written. Weierstrass […] Book cover posts first appeared on John D. Cook.  ( 5 min )
  • Open

    Google at ICML 2022
    Posted by Cat Armato, Program Manager, University Relations Google is a leader in machine learning (ML) research with groups innovating across virtually all aspects of the field, from theory to application. We build machine learning systems to solve deep scientific and engineering challenges in areas of language, music, visual processing, algorithm development, and more. Core to our approach is to actively engage with the broader research community by open-sourcing datasets and models, publishing our discoveries, and actively participating in leading conferences. Google is proud to be a Diamond Sponsor of the thirty-ninth International Conference on Machine Learning (ICML 2022), a premier annual conference, which is being held this week in Baltimore, Maryland. Google has a strong presenc…  ( 34 min )
  • Open

    What Do NBA Champions and CDOs have in Common?  Success Requires Being 2-way Players
    The Golden State Warriors won the 2022 National Basketball Association (NBA) title for many reasons.  Having one of the top 10 players in NBA history in Steph Curry certainly helps.  But other teams have top 10 / top 15 players, and they didn’t make the finals (or even get into the playoffs, in one case).… The post What Do NBA Champions and CDOs have in Common?  Success Requires Being 2-way Players appeared first on Data Science Central.  ( 20 min )
    Top 7 Online Form Builders for Higher Conversion Rates
    Introduction Bringing traffic to your website will not help you unless you turn them into customers or valuable leads. To execute this, you need a web form with a high conversion rate in the significant parts of your website. This allows you to grab leads, maximise registrations, publicise free trials, expand your email list, and… The post Top 7 Online Form Builders for Higher Conversion Rates appeared first on Data Science Central.  ( 20 min )
    The Impact of AI on Sports Betting and Its Software
    Unless you have been living under a rock, you have heard of artificial intelligence, a technology that allows machines to mimic human intelligence. It is a broad topic that includes tools and technologies such as knowledge representation, machine learning, natural language processing (NLP), database management, etc. Unsurprisingly, AI has found countless applications across various sectors,… The post The Impact of AI on Sports Betting and Its Software appeared first on Data Science Central.  ( 18 min )
  • Open

    Generate synchronized closed captions and audio using the Amazon Polly subtitle generator
    Amazon Polly, an AI generated text-to-speech service, enables you to automate and scale your interactive voice solutions, helping to improve productivity and reduce costs. As our customers continue to use Amazon Polly for its rich set of features and ease of use, we have observed a demand for the ability to simultaneously generate synchronized audio […]  ( 7 min )
    Accelerate your identity verification projects using AWS Amplify and Amazon Rekognition sample implementations
    Amazon Rekognition allows you to mitigate fraudulent attacks and minimize onboarding friction for legitimate customers through a streamlined identity verification process. This can result in an increase in customer trust and safety. Key capabilities of this solution include: Register a new user using a selfie Register a new user after face match against an ID […]  ( 10 min )
  • Open

    Reducing Bias and Improving Safety in DALL·E 2
    Today, we are implementing a new technique so that DALL·E generates images of people that more accurately reflect the diversity of the world’s population. This technique is applied at the system level when DALL·E is given a prompt describing a person that does not  ( 3 min )
  • Open

    Living on the Edge: New Features for NVIDIA Fleet Command Deliver All-in-One Edge AI Management, Maintenance for Enterprises
    NVIDIA Fleet Command — a cloud service for deploying, managing and scaling AI applications at the edge — now includes features that enhance the seamless management of edge AI deployments around the world. With the scale of edge AI deployments, organizations can have up to thousands of independent edge locations that must be managed by Read article > The post Living on the Edge: New Features for NVIDIA Fleet Command Deliver All-in-One Edge AI Management, Maintenance for Enterprises appeared first on NVIDIA Blog.  ( 6 min )
    CORSAIR Integrates NVIDIA Broadcast’s Audio, Video AI Features in iCUE and Elgato Software This Week ‘In the NVIDIA Studio’
    Technology company CORSAIR and streaming partner BigCheeseKIT step In the NVIDIA Studio this week. A leader in high-performance gear and systems for gamers, content creators and PC enthusiasts, CORSAIR has integrated NVIDIA Broadcast technologies into its hardware and iCUE software. Similar AI enhancements have also been added to Elgato’s audio and video software, Wave Link and Camera Hub. The post CORSAIR Integrates NVIDIA Broadcast’s Audio, Video AI Features in iCUE and Elgato Software This Week ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 7 min )
  • Open

    Confidential Containers: Verifiably secure computation in the cloud
    For many organizations, trusting their data to the cloud requires having a complete understanding of and control over the environment in which that data resides and how it’s being processed. Microsoft understands this, and we are committed to building a trustworthy cloud—one in which security, privacy, and transparency are built into its core. A key […] The post Confidential Containers: Verifiably secure computation in the cloud appeared first on Microsoft Research.  ( 9 min )
  • Open

    Do I lose information by vectorizing a matrix?
    If, instead of using a matrix (images, game boards, maps...) as an input (N x M), I use a vector (N*M x 1), writing one row after the other, will that have an impact on the final model? submitted by /u/CommunityBrave822 [link] [comments]  ( 88 min )
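    A small sketch of what row-by-row vectorization does, assuming NumPy; the 3 x 4 `board` and the helper `flat_index` are made up for illustration. The values survive the round trip as long as the original shape is known, but cells that were vertical neighbours end up M positions apart in the vector.

```python
import numpy as np

board = np.arange(12).reshape(3, 4)      # hypothetical N=3, M=4 input
flat = board.reshape(-1)                 # row-major: one row after the other
restored = flat.reshape(board.shape)     # needs the original shape (N, M)

def flat_index(i, j, n_cols):
    """Position of matrix cell (i, j) in the row-major vector."""
    return i * n_cols + j

n_cols = board.shape[1]
# Horizontal neighbours (1, 1) and (1, 2) stay adjacent in the vector.
horizontal_gap = flat_index(1, 2, n_cols) - flat_index(1, 1, n_cols)
# Vertical neighbours (1, 1) and (2, 1) are n_cols positions apart.
vertical_gap = flat_index(2, 1, n_cols) - flat_index(1, 1, n_cols)

print(np.array_equal(board, restored), horizontal_gap, vertical_gap)
```

    So no values are lost, but the 2-D neighbourhood structure is no longer explicit. A fully connected layer does not use that structure either way, whereas a convolutional layer relies on it, so the impact depends on the model being used.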
    Help with error in Jupyter Notebooks
    I am having an error with Jupyter Notebooks. My error is "NameError: name '_C' is not defined". I am writing a neural network to process the BCI Competition dataset. I've posted the code as a screenshot here. I opened Jupyter through the Anaconda command prompt. All of the packages shown as imports are installed and I have restarted the kernel. In fact, this is the second notebook to try this. When I run 'pip install torch' it states that the requirement is already satisfied and I confirmed that through the Anaconda Navigator. Torch is installed in the base root. Can anyone help me with my error? https://preview.redd.it/2y8bi5w4sac91.png?width=1085&format=png&auto=webp&s=ce9edf7b6884cd1b921e0d8a68d0adfba36a0d01 submitted by /u/RaahulPokemon [link] [comments]  ( 87 min )
    Neural Network Loss Landscapes: What do we know?
    submitted by /u/nickb [link] [comments]  ( 86 min )
  • Open

    What is synthetic Data in machine learning, and why do you need it? — Do It Easy With ScienceProg
    As the name suggests, synthetic data is the data that is artificially generated rather than being created by actual events. In marketing…  ( 9 min )
  • Open

    PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method. (arXiv:2110.06906v2 [cs.LG] UPDATED)
    Emphatic temporal difference (ETD) learning (Sutton et al., 2016) is a successful method to conduct the off-policy value function evaluation with function approximation. Although ETD has been shown to converge asymptotically to a desirable value function, it is well-known that ETD often encounters a large variance so that its sample complexity can increase exponentially fast with the number of iterations. In this work, we propose a new ETD method, called PER-ETD (i.e., PEriodically Restarted-ETD), which restarts and updates the follow-on trace only for a finite period for each iteration of the evaluation parameter. Further, PER-ETD features a design of the logarithmical increase of the restart period with the number of iterations, which guarantees the best trade-off between the variance and bias and keeps both vanishing sublinearly. We show that PER-ETD converges to the same desirable fixed point as ETD, but improves the exponential sample complexity of ETD to be polynomials. Our experiments validate the superior performance of PER-ETD and its advantage over ETD.  ( 2 min )
    Learning Sparse Fixed-Structure Gaussian Bayesian Networks. (arXiv:2107.10450v2 [cs.DS] UPDATED)
    Gaussian Bayesian networks (a.k.a. linear Gaussian structural equation models) are widely used to model causal interactions among continuous variables. In this work, we study the problem of learning a fixed-structure Gaussian Bayesian network up to a bounded error in total variation distance. We analyze the commonly used node-wise least squares regression (LeastSquares) and prove that it has a near-optimal sample complexity. We also study a couple of new algorithms for the problem: - BatchAvgLeastSquares takes the average of several batches of least squares solutions at each node, so that one can interpolate between the batch size and the number of batches. We show that BatchAvgLeastSquares also has near-optimal sample complexity. - CauchyEst takes the median of solutions to several batches of linear systems at each node. We show that the algorithm specialized to polytrees, CauchyEstTree, has near-optimal sample complexity. Experimentally, we show that for uncontaminated, realizable data, the LeastSquares algorithm performs best, but in the presence of contamination or DAG misspecification, CauchyEst/CauchyEstTree and BatchAvgLeastSquares respectively perform better.  ( 2 min )
    Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning. (arXiv:2207.07635v1 [cs.CV])
    The development of CLIP [Radford et al., 2021] has sparked a debate on whether language supervision can result in vision models with more transferable representations than traditional image-only methods. Our work studies this question through a carefully controlled comparison of two approaches in terms of their ability to learn representations that generalize to downstream classification tasks. We find that when the pre-training dataset meets certain criteria -- it is sufficiently large and contains descriptive captions with low variability -- image-only methods do not match CLIP's transfer performance, even when they are trained with more image data. However, contrary to what one might expect, there are practical settings in which these criteria are not met, wherein added supervision through captions is actually detrimental. Motivated by our findings, we devise simple prescriptions to enable CLIP to better leverage the language information present in existing pre-training datasets.  ( 2 min )
    Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning. (arXiv:2109.03445v2 [stat.ML] UPDATED)
    The stochastic approximation algorithm is a widely used probabilistic method for finding a zero of a vector-valued function, when only noisy measurements of the function are available. In the literature to date, one can make a distinction between "synchronous" updating, whereby every component of the current guess is updated at each time, and "asynchronous" updating, whereby only one component is updated. In principle, it is also possible to update, at each time instant, some but not all components of $\theta_t$, which might be termed "batch asynchronous stochastic approximation" (BASA). One can also make a distinction between using a "local" clock versus a "global" clock. In this paper, we propose a unified formulation of batch asynchronous stochastic approximation (BASA) algorithms, and develop a general methodology for proving that such algorithms converge, irrespective of whether global or local clocks are used. These convergence proofs make use of weaker hypotheses than existing results. For example: existing convergence proofs when a local clock is used require that the measurement noise is an i.i.d. sequence. Here, it is assumed that the measurement errors form a martingale difference sequence. Also, all results to date assume that the stochastic step sizes satisfy a probabilistic analog of the Robbins-Monro conditions. We replace this by a purely deterministic condition on the irreducibility of the underlying Markov processes. As specific applications to Reinforcement Learning, we introduce "batch" versions of the temporal difference algorithm $TD(0)$ for value iteration, and the $Q$-learning algorithm for finding the optimal action-value function, and also permit the use of local clocks instead of a global clock. In all cases, we establish the convergence of these algorithms, under milder conditions than in the existing literature.
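    To make the "batch asynchronous" and "local clock" ideas concrete, here is a toy sketch assuming NumPy. It is not the paper's algorithm or its assumptions: the target function $f(\theta) = \theta - \theta^*$, the Gaussian noise, the random batch selection with probability `update_prob`, and the name `batch_async_sa` are all illustrative choices. At each step only a random subset of components is updated, and each component uses a step size of 1 / (number of times that component has been updated so far), i.e. its own local clock.

```python
import numpy as np

def batch_async_sa(theta_star, n_steps=5000, update_prob=0.5,
                   noise_std=0.1, seed=0):
    """Toy batch asynchronous stochastic approximation with local clocks.

    Seeks a zero of f(theta) = theta - theta_star from noisy measurements.
    """
    rng = np.random.default_rng(seed)
    theta_star = np.asarray(theta_star, dtype=float)
    d = theta_star.size
    theta = np.zeros(d)
    counts = np.zeros(d, dtype=int)      # local clock of each component

    for _ in range(n_steps):
        # Pick the batch: some, but not necessarily all, components.
        batch = rng.random(d) < update_prob
        if not batch.any():
            continue
        # Noisy measurement of f at the current guess.
        noisy_f = (theta - theta_star) + rng.normal(0.0, noise_std, size=d)
        counts[batch] += 1
        step = 1.0 / counts[batch]       # local-clock step size
        theta[batch] -= step * noisy_f[batch]
    return theta, counts

theta, counts = batch_async_sa([1.0, -2.0, 3.0])
print(theta, counts)
```

    A global-clock variant would use a step size based on the overall iteration index instead of `counts`; the abstract states that the paper's convergence results cover both cases.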
    Dynamic Ranking and Translation Synchronization. (arXiv:2207.01455v2 [math.ST] UPDATED)
    In many applications, such as sports tournaments or recommendation systems, we have at our disposal data consisting of pairwise comparisons between a set of $n$ items (or players). The objective is to use this data to infer the latent strength of each item and/or their ranking. Existing results for this problem predominantly focus on the setting consisting of a single comparison graph $G$. However, there exist scenarios (e.g., sports tournaments) where the pairwise comparison data evolves with time. Theoretical results for this dynamic setting are relatively limited and are the focus of this paper. We study an extension of the \emph{translation synchronization} problem to the dynamic setting. In this setup, we are given a sequence of comparison graphs $(G_t)_{t\in \mathcal{T}}$, where $\mathcal{T} \subset [0,1]$ is a grid representing the time domain, and for each item $i$ and time $t\in \mathcal{T}$ there is an associated unknown strength parameter $z^*_{t,i}\in \mathbb{R}$. We aim to recover, for $t\in\mathcal{T}$, the strength vector $z^*_t=(z^*_{t,1},\dots,z^*_{t,n})$ from noisy measurements of $z^*_{t,i}-z^*_{t,j}$, where $\{i,j\}$ is an edge in $G_t$. Assuming that $z^*_t$ evolves smoothly in $t$, we propose two estimators -- one based on a smoothness-penalized least squares approach and the other based on projection onto the low frequency eigenspace of a suitable smoothness operator. For both estimators, we provide finite sample bounds for the $\ell_2$ estimation error under the assumption that $G_t$ is connected for all $t\in \mathcal{T}$, thus proving the consistency of the proposed methods in terms of the grid size $|\mathcal{T}|$. We complement our theoretical findings with experiments on synthetic and real data.
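    As a point of reference, the static, single-graph version of translation synchronization can be sketched as an ordinary least-squares problem. This is only an illustration assuming NumPy, not either of the paper's dynamic estimators; the function name `translation_sync`, the example graph and the strengths are made up. Since only differences are observed, the strengths are identifiable only up to a common shift, so the sketch centres the solution to have zero mean.

```python
import numpy as np

def translation_sync(n_items, edges, measurements):
    """Least-squares recovery of strengths z from measurements of z_i - z_j.

    edges: list of (i, j) pairs; measurements[k] approximates z_i - z_j
    for the k-th edge. Assumes the comparison graph is connected.
    """
    # Incidence-style matrix: row k has +1 at i and -1 at j.
    B = np.zeros((len(edges), n_items))
    for row, (i, j) in enumerate(edges):
        B[row, i] = 1.0
        B[row, j] = -1.0
    y = np.asarray(measurements, dtype=float)
    z, *_ = np.linalg.lstsq(B, y, rcond=None)
    return z - z.mean()                  # fix the shift ambiguity

z_true = np.array([0.5, -1.0, 2.0, -1.5])            # already zero-mean
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
y = [z_true[i] - z_true[j] for i, j in edges]        # noiseless for clarity
print(translation_sync(4, edges, y))
```

    With noisy measurements the same call returns the least-squares estimate. The dynamic setting in the abstract additionally couples the estimates across time through a smoothness assumption.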
    Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces. (arXiv:1801.06720v4 [stat.ML] UPDATED)
    In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms for the studied algorithms, considering a capacity assumption on the hypothesis space and a general source condition on the target function. Consequently, we obtain almost sure convergence results with optimal rates. Our results improve and generalize previous results, filling a theoretical gap for the non-attainable cases.  ( 2 min )
    A two-step machine learning approach to statistical post-processing of weather forecasts for power generation. (arXiv:2207.07589v1 [stat.ML])
    By the end of 2021, the renewable energy share of the global electricity capacity reached 38.3% and the new installations are dominated by wind and solar energy, showing global increases of 12.7% and 18.5%, respectively. However, both wind and photovoltaic energy sources are highly volatile, making planning difficult for grid operators, so accurate forecasts of the corresponding weather variables are essential for reliable electricity predictions. The most advanced approach in weather prediction is the ensemble method, which opens the door for probabilistic forecasting; though ensemble forecasts are often underdispersive and subject to systematic bias. Hence, they require some form of statistical post-processing, where parametric models provide full predictive distributions of the weather variables at hand. We propose a general two-step machine learning-based approach to calibrating ensemble weather forecasts, where in the first step improved point forecasts are generated, which then, together with various ensemble statistics, serve as input features of the neural network estimating the parameters of the predictive distribution. In two case studies based on 100m wind speed and global horizontal irradiance forecasts of the operational ensemble prediction system of the Hungarian Meteorological Service, the predictive performance of this novel method is compared with the forecast skill of the raw ensemble and the state-of-the-art parametric approaches. Both case studies confirm that at least up to 48h statistical post-processing substantially improves the predictive performance of the raw ensemble for all considered forecast horizons. The investigated variants of the proposed two-step method outperform their competitors in skill, and the suggested new approach is well applicable for different weather quantities and for a fair range of predictive distributions.
    Breaking Feedback Loops in Recommender Systems with Causal Inference. (arXiv:2207.01616v2 [cs.IR] UPDATED)
    Recommender systems play a key role in shaping modern web ecosystems. These systems alternate between (1) making recommendations (2) collecting user responses to these recommendations, and (3) retraining the recommendation algorithm based on this feedback. During this process the recommender system influences the user behavioral data that is subsequently used to update it, thus creating a feedback loop. Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior, raising ethical and performance concerns when deploying recommender systems. To address these issues, we propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference and can be applied to any recommendation algorithm that optimizes a training loss. Our main observation is that a recommender system does not suffer from feedback loops if it reasons about causal quantities, namely the intervention distributions of recommendations on user ratings. Moreover, we can calculate this intervention distribution from observational data by adjusting for the recommender system's predictions of user preferences. Using simulated environments, we demonstrate that CAFL improves recommendation quality when compared to prior correction methods.
    Selection of the Most Probable Best. (arXiv:2207.07533v1 [stat.ME])
    We consider an expected-value ranking and selection problem where all k solutions' simulation outputs depend on a common uncertain input model. Given that the uncertainty of the input model is captured by a probability simplex on a finite support, we define the most probable best (MPB) to be the solution whose probability of being optimal is the largest. To devise an efficient sampling algorithm to find the MPB, we first derive a lower bound to the large deviation rate of the probability of falsely selecting the MPB, then formulate an optimal computing budget allocation (OCBA) problem to find the optimal static sampling ratios for all solution-input model pairs that maximize the lower bound. We devise a series of sequential algorithms that apply interpretable and computationally efficient sampling rules and prove their sampling ratios achieve the optimality conditions for the OCBA problem as the simulation budget increases. The algorithms are benchmarked against a state-of-the-art sequential sampling algorithm designed for contextual ranking and selection problems and demonstrated to have superior empirical performances at finding the MPB.
    Flexible Model Aggregation for Quantile Regression. (arXiv:2103.00083v4 [stat.ML] UPDATED)
    Quantile regression is a fundamental problem in statistical learning motivated by the need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for this problem over many years of research in econometrics, statistics, and machine learning. Rather than proposing yet another (new) algorithm for quantile regression we adopt a meta viewpoint: we investigate methods for aggregating any number of conditional quantile models, in order to improve accuracy and robustness. We consider weighted ensembles where weights may vary over not only individual models, but also over quantile levels, and feature values. All of the models we consider in this paper can be fit using modern deep learning toolkits, and hence are widely accessible (from an implementation point of view) and scalable. To improve the accuracy of the predicted quantiles (or equivalently, prediction intervals), we develop tools for ensuring that quantiles remain monotonically ordered, and apply conformal calibration methods. These can be used without any modification of the original library of base models. We also review some basic theory surrounding quantile aggregation and related scoring rules, and contribute a few new results to this literature (for example, the fact that post sorting or post isotonic regression can only improve the weighted interval score). Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories.
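    The point about keeping quantiles monotonically ordered can be illustrated with a small sketch, assuming NumPy. It uses the pinball (quantile) loss summed over levels rather than the weighted interval score from the abstract, and the numbers and helper names (`pinball_loss`, `sort_quantiles`) are made up for illustration. Predicted quantiles that cross are fixed by sorting, and in this example the summed loss goes down.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Mean pinball loss of predictions q_pred for quantile level tau."""
    diff = np.asarray(y, dtype=float) - np.asarray(q_pred, dtype=float)
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))

def sort_quantiles(q_preds):
    """Post-sort predicted quantiles so they increase with the level."""
    return np.sort(np.asarray(q_preds, dtype=float), axis=-1)

levels = [0.1, 0.5, 0.9]
raw = np.array([2.0, 1.0, 3.0])      # crossing: the 0.1 quantile exceeds the median
fixed = sort_quantiles(raw)          # [1.0, 2.0, 3.0]
y = 1.5                              # hypothetical observed value

loss_raw = sum(pinball_loss(y, q, t) for q, t in zip(raw, levels))
loss_fixed = sum(pinball_loss(y, q, t) for q, t in zip(fixed, levels))
print(loss_raw, loss_fixed)          # approximately 0.85 and 0.45
```

    Sorting needs no access to the underlying models, which matches the abstract's remark that such fixes can be applied without modifying the base models.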
    Selective Regression Under Fairness Criteria. (arXiv:2110.15403v3 [cs.LG] UPDATED)
    Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria are met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.
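    The disparity described in this abstract can be reproduced on a tiny hand-made example. This is a sketch assuming NumPy; the data, the uncertainty scores, the threshold rule and the helper `selective_mse` are all invented for illustration and are not the paper's method. The model abstains whenever its uncertainty score exceeds a threshold; lowering the threshold reduces coverage and improves the overall error, yet the error on the minority group gets worse.

```python
import numpy as np

def selective_mse(y_true, y_pred, uncertainty, threshold, mask=None):
    """Coverage and MSE on samples whose uncertainty is <= threshold.

    mask optionally restricts the computation to one subgroup.
    Returns NaN for the MSE if no sample in the (sub)group is accepted.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    accept = np.asarray(uncertainty, dtype=float) <= threshold
    if mask is None:
        mask = np.ones_like(accept, dtype=bool)
    accept = accept & mask
    coverage = accept.sum() / mask.sum()
    mse = float(np.mean((y_true[accept] - y_pred[accept]) ** 2))
    return float(coverage), mse

# Four majority samples followed by two minority samples (hypothetical).
y_true = np.zeros(6)
y_pred = np.array([0.1, 0.2, 1.0, 2.0, 1.0, 0.1])
uncertainty = np.array([0.1, 0.2, 0.8, 0.9, 0.3, 0.7])
minority = np.array([False, False, False, False, True, True])

for threshold in (1.0, 0.5):
    overall = selective_mse(y_true, y_pred, uncertainty, threshold)
    group = selective_mse(y_true, y_pred, uncertainty, threshold, minority)
    print(threshold, overall, group)
```

    Here the overall MSE falls from about 1.01 to 0.35 when coverage drops from 1.0 to 0.5, while the minority group's MSE rises from about 0.505 to 1.0. This happens because the uncertainty score is poorly aligned with the actual error on that group.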
    Deep Hedging: Continuous Reinforcement Learning for Hedging of General Portfolios across Multiple Risk Aversions. (arXiv:2207.07467v1 [q-fin.CP])
    We present a method for finding optimal hedging policies for arbitrary initial portfolios and market states. We develop a novel actor-critic algorithm for solving general risk-averse stochastic control problems and use it to learn hedging strategies across multiple risk aversion levels simultaneously. We demonstrate the effectiveness of the approach with a numerical example in a stochastic volatility environment.  ( 2 min )
    On the Super-exponential Quantum Speedup of Equivariant Quantum Machine Learning Algorithms with SU($d$) Symmetry. (arXiv:2207.07250v1 [quant-ph])
    We introduce a framework of equivariant convolutional algorithms which is tailored for a number of machine-learning tasks on physical systems with arbitrary SU($d$) symmetries. It allows us to enhance a natural model of quantum computation--permutational quantum computing (PQC) [Quantum Inf. Comput., 10, 470-497 (2010)]--and defines a more powerful model: PQC+. While PQC was shown to be effectively classically simulatable, we exhibit a problem which can be efficiently solved on a PQC+ machine, whereas the best known classical algorithm runs in $O(n!n^2)$ time, thus providing strong evidence against PQC+ being classically simulatable. We further discuss practical quantum machine learning algorithms which can be carried out in the paradigm of PQC+.
    Differentially Private Fine-tuning of Language Models. (arXiv:2110.06500v2 [cs.LG] UPDATED)
    We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. When privately fine-tuned on DART, GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8, respectively (privacy budget of $\epsilon = 6.8$, $\delta = 10^{-5}$), whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.
    Kernel Conjugate Gradient Methods with Random Projections. (arXiv:1811.01760v2 [stat.ML] UPDATED)
    We propose and study kernel conjugate gradient methods (KCGM) with random projections for least-squares regression over a separable Hilbert space. Considering two types of random projections generated by randomized sketches and Nystr\"{o}m subsampling, we prove optimal statistical results with respect to variants of norms for the algorithms under a suitable stopping rule. Particularly, our results show that if the projection dimension is proportional to the effective dimension of the problem, KCGM with randomized sketches can generalize optimally, while achieving a computational advantage. As a corollary, we derive optimal rates for classic KCGM in the well-conditioned regimes for the case that the target function may not be in the hypothesis space.
    Sparse solutions of the kernel herding algorithm by improved gradient approximation. (arXiv:2105.07900v2 [math.NA] UPDATED)
    The kernel herding algorithm is used to construct quadrature rules in a reproducing kernel Hilbert space (RKHS). While the computational efficiency of the algorithm and stability of the output quadrature formulas are advantages of this method, the convergence speed of the integration error for a given number of nodes is slow compared to that of other quadrature methods. In this paper, we propose a modified kernel herding algorithm whose framework was introduced in a previous study and aim to obtain sparser solutions while preserving the advantages of standard kernel herding. In the proposed algorithm, the negative gradient is approximated by several vertex directions, and the current solution is updated by moving in the approximate descent direction in each iteration. We show that the convergence speed of the integration error is directly determined by the cosine of the angle between the negative gradient and approximate gradient. Based on this, we propose new gradient approximation algorithms and analyze them theoretically, including through convergence analysis. In numerical experiments, we confirm the effectiveness of the proposed algorithms in terms of sparsity of nodes and computational efficiency. Moreover, we provide a new theoretical analysis of the kernel quadrature rules with fully-corrective weights, which realizes faster convergence speeds than those of previous studies.
    Meta-Calibration: Learning of Model Calibration Using Differentiable Expected Calibration Error. (arXiv:2106.09613v2 [cs.LG] UPDATED)
    Calibration of neural networks is a topical problem that is becoming more and more important as neural networks increasingly underpin real-world applications. The problem is especially noticeable when using modern neural networks, for which there is a significant difference between the confidence of the model and the probability of correct prediction. Various strategies have been proposed to improve calibration, yet accurate calibration remains challenging. We propose a novel framework with two contributions: introducing a differentiable surrogate for expected calibration error (DECE) that allows calibration quality to be directly optimised, and a meta-learning framework that uses DECE to optimise for validation set calibration with respect to model hyper-parameters. The results show that we achieve competitive performance with state-of-the-art calibration approaches. Our framework opens up a new avenue and toolset for tackling calibration, which we believe will inspire further work in this important challenge.
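    For reference, the quantity that DECE approximates can be sketched as the standard binned expected calibration error (ECE). This is the usual non-differentiable estimator, not the paper's differentiable surrogate; the function name, the equal-width binning, and the toy inputs are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over bins.

    confidences: predicted top-class probabilities in [0, 1]
    correct:     1 if the prediction was right, else 0
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        bins[idx].append((c, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece


# Hypothetical toy predictions: one confident mistake dominates the error.
ece = expected_calibration_error([0.95, 0.85, 0.65, 0.55], [1, 1, 0, 1])
print(ece)
```

    The hard bin assignment is what makes this estimator non-differentiable, which is the obstacle a differentiable surrogate is meant to remove.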
    Optimal No-regret Learning in Repeated First-price Auctions. (arXiv:2003.09795v5 [cs.LG] UPDATED)
    We study online learning in repeated first-price auctions with censored feedback, where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, the bidder faces a challenging dilemma: if she wins the bid--the only way to achieve positive payoffs--then she is not able to observe the highest bid of the other bidders, which we assume is drawn i.i.d. from an unknown distribution. This dilemma, despite being reminiscent of the exploration-exploitation trade-off in contextual bandits, cannot directly be addressed by the existing UCB or Thompson sampling algorithms. In this paper, by exploiting the structural properties of first-price auctions, we develop the first learning algorithm that achieves an $O(\sqrt{T}\log^{2.5} T)$ regret bound, which is minimax optimal up to $\log$ factors, when the bidder's private values are stochastically generated. We do so by providing an algorithm on a general class of problems, called the partially ordered contextual bandits, which combine the graph feedback across actions, the cross learning across contexts, and a partial order over the contexts. We establish both strengths and weaknesses of this framework, by showing a curious separation that a regret nearly independent of the action/context sizes is possible under stochastic contexts, but is impossible under adversarial contexts. Despite the limitation of this general framework, we further exploit the structure of first-price auctions and develop a learning algorithm that operates sample-efficiently (and computationally efficiently) in the presence of adversarially generated private values. We establish an $O(\sqrt{T}\log^3 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for first-price auctions.
    Plex: Towards Reliability using Pretrained Large Model Extensions. (arXiv:2207.07411v1 [cs.LG])
    A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol as it improves the out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.  ( 3 min )
    Supervising Embedding Algorithms Using the Stress. (arXiv:2207.07218v1 [stat.ME])
    While classical scaling, just like principal component analysis, is parameter-free, most other methods for embedding multivariate data require the selection of one or several parameters. This tuning can be difficult due to the unsupervised nature of the situation. We propose a simple, almost obvious, approach to supervise the choice of tuning parameter(s): minimize a notion of stress. We substantiate this choice by reference to rigidity theory. We extend a result by Aspnes et al. (IEEE Mobile Computing, 2006), showing that general random geometric graphs are trilateration graphs with high probability. And we provide a stability result \`a la Anderson et al. (SIAM Discrete Mathematics, 2010). We illustrate this approach in the context of the MDS-MAP(P) algorithm of Shang and Ruml (IEEE INFOCOM, 2004). As a prototypical patch-stitching method, it requires the choice of patch size, and we use the stress to make that choice data-driven. In this context, we perform a number of experiments to illustrate the validity of using the stress as the basis for tuning parameter selection. In so doing, we uncover a bias-variance tradeoff, a phenomenon that may have been overlooked in the multidimensional scaling literature. By turning MDS-MAP(P) into a method for manifold learning, we obtain a local version of Isomap for which the minimization of the stress may also be used for parameter tuning.  ( 3 min )
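    The idea of choosing a tuning parameter by minimizing stress can be sketched as follows. The abstract does not say which notion of stress is used, so raw stress (the sum of squared differences between target dissimilarities and embedded distances) is assumed here; the function names and the candidate embeddings keyed by a hypothetical parameter value are illustrative only.

```python
import math
from itertools import combinations


def raw_stress(target, points):
    """Sum over pairs of (target dissimilarity - embedded distance)^2."""
    return sum(
        (target[i][j] - math.dist(points[i], points[j])) ** 2
        for i, j in combinations(range(len(points)), 2)
    )


def pick_by_stress(target, candidates):
    """Return the key of the candidate embedding with the lowest stress."""
    return min(candidates, key=lambda k: raw_stress(target, candidates[k]))


# Target dissimilarities of a 3-4-5 right triangle.
target = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]

# Hypothetical embeddings produced with two values of a tuning parameter.
candidates = {
    "small": [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)],  # reproduces the target
    "large": [(0.0, 0.0), (3.0, 0.0), (3.0, 0.0)],  # collapses two points
}

best = pick_by_stress(target, candidates)
print(best, raw_stress(target, candidates[best]))
```

    In practice each candidate would be the output of the embedding method run with one parameter setting, and only the dissimilarities actually observed would enter the sum.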
    Joint Application of the Target Trial Causal Framework and Machine Learning Modeling to Optimize Antibiotic Therapy: Use Case on Acute Bacterial Skin and Skin Structure Infections due to Methicillin-resistant Staphylococcus aureus. (arXiv:2207.07458v1 [stat.ML])
    Bacterial infections are responsible for high mortality worldwide. Antimicrobial resistance underlying the infection and the patient's multifaceted clinical status can hamper the correct choice of antibiotic treatment. Randomized clinical trials provide average treatment effect estimates but are not ideal for risk stratification and optimization of therapeutic choice, i.e., individualized treatment effects (ITE). Here, we leverage large-scale electronic health record data, collected from Southern US academic clinics, to emulate a clinical trial, i.e., 'target trial', and develop a machine learning model of mortality prediction and ITE estimation for patients diagnosed with acute bacterial skin and skin structure infection (ABSSSI) due to methicillin-resistant Staphylococcus aureus (MRSA). ABSSSI-MRSA is a challenging condition with reduced treatment options - vancomycin is the preferred choice, but it has non-negligible side effects. First, we use propensity score matching to emulate the trial and create a treatment randomized (vancomycin vs. other antibiotics) dataset. Next, we use this data to train various machine learning methods (including boosted/LASSO logistic regression, support vector machines, and random forest) and choose the best model in terms of area under the receiver operating characteristic curve (AUC) through bootstrap validation. Lastly, we use the models to calculate ITE and identify possible averted deaths by therapy change. The out-of-bag tests indicate that SVM and RF are the most accurate, with AUC of 81% and 78%, respectively, but BLR/LASSO is not far behind (76%). By calculating the counterfactuals using the BLR/LASSO, vancomycin increases the risk of death, but it shows a large variation (odds ratio 1.2, 95% range 0.4-3.8) and the contribution to outcome probability is modest. Instead, the RF exhibits stronger changes in ITE, suggesting more complex treatment heterogeneity.  ( 3 min )
    Making Linear MDPs Practical via Contrastive Representation Learning. (arXiv:2207.07150v1 [cs.LG])
    It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. The framework also admits confidence-adjusted index algorithms, enabling an efficient and principled approach to incorporating optimism or pessimism in the face of uncertainty. To the best of our knowledge, this provides the first practical representation learning method for linear MDPs that achieves both strong theoretical guarantees and empirical performance. Theoretically, we prove that the proposed algorithm is sample efficient in both the online and offline settings. Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.  ( 2 min )
    Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution. (arXiv:2207.07612v1 [cs.LG])
    This work characterizes the effect of depth on the optimization landscape of linear regression, showing that, despite their nonconvexity, deeper models have a more desirable optimization landscape. We consider a robust and over-parameterized setting, where a subset of measurements are grossly corrupted with noise and the true linear model is captured via an $N$-layer linear neural network. On the negative side, we show that this problem \textit{does not} have a benign landscape: given any $N\geq 1$, with constant probability, there exists a solution corresponding to the ground truth that is neither a local nor a global minimum. However, on the positive side, we prove that, for any $N$-layer model with $N\geq 2$, a simple sub-gradient method becomes oblivious to such ``problematic'' solutions; instead, it converges to a balanced solution that is not only close to the ground truth but also enjoys a flat local landscape, thereby eschewing the need for ``early stopping''. Lastly, we empirically verify that the desirable optimization landscape of deeper models extends to other robust learning tasks, including deep matrix recovery and deep ReLU networks with $\ell_1$-loss.  ( 2 min )
    Causal Graphs Underlying Generative Models: Path to Learning with Limited Data. (arXiv:2207.07174v1 [cs.LG])
    Training generative models that capture rich semantics of the data and interpreting the latent representations encoded by such models are very important problems in unsupervised learning. In this work, we provide a simple algorithm that relies on perturbation experiments on latent codes of a pre-trained generative autoencoder to uncover a causal graph that is implied by the generative model. We leverage pre-trained attribute classifiers and perform perturbation experiments to check for influence of a given latent variable on a subset of attributes. Given this, we show that one can fit an effective causal graph that models a structural equation model between latent codes taken as exogenous variables and attributes taken as observed variables. One interesting aspect is that a single latent variable controls multiple overlapping subsets of attributes, unlike the conventional approach, which tries to impose full independence. Using a pre-trained RNN-based generative autoencoder trained on a dataset of peptide sequences, we demonstrate that the causal graph learnt by our algorithm between various attributes and latent codes can be used to predict a specific property for unseen sequences. We compare prediction models trained on either all available attributes or only the ones in the Markov blanket and empirically show that in both the unsupervised and supervised regimes, typically, using the predictor that relies on Markov blanket attributes generalizes better for out-of-distribution sequences.  ( 3 min )
    Feed-Forward Source-Free Latent Domain Adaptation via Cross-Attention. (arXiv:2207.07624v1 [cs.LG])
    We study the highly practical but comparatively under-studied problem of latent-domain adaptation, where a source model should be adapted to a target dataset that contains a mixture of unlabelled domain-relevant and domain-irrelevant examples. Furthermore, motivated by the requirements for data privacy and the need for embedded and resource-constrained devices of all kinds to adapt to local data distributions, we focus on the setting of feed-forward source-free domain adaptation, where adaptation should not require access to the source dataset, and also be back propagation-free. Our solution is to meta-learn a network capable of embedding the mixed-relevance target dataset and dynamically adapting inference for target examples using cross-attention. The resulting framework leads to consistent improvement on strong ERM baselines. We also show that our framework sometimes even improves on the upper bound of domain-supervised adaptation, where only domain-relevant instances are provided for adaptation. This suggests that human annotated domain labels may not always be optimal, and raises the possibility of doing better through automated instance selection.  ( 2 min )
    Single Model Uncertainty Estimation via Stochastic Data Centering. (arXiv:2207.07235v1 [cs.LG])
    We are interested in estimating the uncertainties of deep neural networks, which play an important role in many scientific and engineering problems. In this paper, we present a striking new finding that an ensemble of neural networks with the same weight initialization, trained on datasets that are shifted by a constant bias, gives rise to slightly inconsistent trained models, where the differences in predictions are a strong indicator of epistemic uncertainties. Using the neural tangent kernel (NTK), we demonstrate that this phenomenon occurs in part because the NTK is not shift-invariant. Since this is achieved via a trivial input transformation, we show that it can therefore be approximated using just a single neural network -- using a technique that we call $\Delta-$UQ -- that estimates uncertainty around prediction by marginalizing out the effect of the biases. We show that $\Delta-$UQ's uncertainty estimates are superior to many of the current methods on a variety of benchmarks -- outlier rejection, calibration under distribution shift, and sequential design optimization of black box functions.  ( 2 min )
    Short-Term Trajectory Prediction for Full-Immersive Multiuser Virtual Reality with Redirected Walking. (arXiv:2207.07520v1 [cs.NI])
    Full-immersive multiuser Virtual Reality (VR) envisions supporting unconstrained mobility of the users in the virtual worlds, while at the same time constraining their physical movements inside VR setups through redirected walking. For enabling delivery of high data rate video content in real-time, the supporting wireless networks will leverage highly directional communication links that will "track" the users for maintaining the Line-of-Sight (LoS) connectivity. Recurrent Neural Networks (RNNs) and in particular Long Short-Term Memory (LSTM) networks have historically presented themselves as a suitable candidate for near-term movement trajectory prediction for natural human mobility, and have also recently been shown as applicable in predicting VR users' mobility under the constraints of redirected walking. In this work, we extend these initial findings. First, we show that Gated Recurrent Unit (GRU) networks, another candidate from the RNN family, generally outperform the traditionally utilized LSTMs. Second, we show that context from a virtual world can enhance the accuracy of the prediction if used as an additional input feature in comparison to the more traditional utilization of solely the historical physical movements of the VR users. Finally, we show that the prediction system trained on a static number of coexisting VR users can be scaled to a multi-user system without significant accuracy degradation.  ( 3 min )
    An Approach for Link Prediction in Directed Complex Networks based on Asymmetric Similarity-Popularity. (arXiv:2207.07399v1 [cs.SI])
    Complex networks are graphs representing real-life systems that exhibit unique characteristics not found in purely regular or completely random graphs. The study of such systems is vital but challenging due to the complexity of the underlying processes. This task has nevertheless been made easier in recent decades thanks to the availability of large amounts of networked data. Link prediction in complex networks aims to estimate the likelihood that a link between two nodes is missing from the network. Links can be missing due to imperfections in data collection or simply because they are yet to appear. Discovering new relationships between entities in networked data has attracted researchers' attention in various domains such as sociology, computer science, physics, and biology. Most existing research focuses on link prediction in undirected complex networks. However, not all real-life systems can be faithfully represented as undirected networks. This simplifying assumption is often made when using link prediction algorithms but inevitably leads to loss of information about relations among nodes and degradation in prediction performance. This paper introduces a link prediction method designed explicitly for directed networks. It is based on the similarity-popularity paradigm, which has recently proven successful in undirected networks. The presented algorithms handle the asymmetry in node relationships by modeling it as asymmetry in similarity and popularity. Given the observed network topology, the algorithms approximate the hidden similarities as shortest path distances using edge weights that capture and factor out the links' asymmetry and nodes' popularity. The proposed approach is evaluated on real-life networks, and the experimental results demonstrate its effectiveness in predicting missing links across a broad spectrum of networked data types and sizes.  ( 3 min )
    Explainable Sparse Knowledge Graph Completion via High-order Graph Reasoning Network. (arXiv:2207.07503v1 [cs.LG])
    Knowledge Graphs (KGs) are becoming increasingly essential infrastructures in many applications while suffering from incompleteness issues. The KG completion task (KGC) automatically predicts missing facts based on an incomplete KG. However, existing methods perform unsatisfactorily in real-world scenarios. On the one hand, their performance will dramatically degrade along with the increasing sparsity of KGs. On the other hand, the inference procedure for prediction is an untrustworthy black box. This paper proposes a novel explainable model for sparse KGC, compositing high-order reasoning into a graph convolutional network, namely HoGRN. It can not only improve the generalization ability to mitigate the information insufficiency issue but also provide interpretability while maintaining the model's effectiveness and efficiency. There are two main components that are seamlessly integrated for joint optimization. First, the high-order reasoning component learns high-quality relation representations by capturing endogenous correlation among relations. This can reflect logical rules to justify a broader range of missing facts. Second, the entity updating component leverages a weight-free Graph Convolutional Network (GCN) to efficiently model KG structures with interpretability. Unlike conventional methods, we conduct entity aggregation and design composition-based attention in the relational space without additional parameters. The lightweight design makes HoGRN better suited for sparse settings. For evaluation, we have conducted extensive experiments; the results of HoGRN on several sparse KGs present impressive improvements (9% MRR gain on average). Further ablation and case studies demonstrate the effectiveness of the main components. Our codes will be released upon acceptance.  ( 3 min )
    Creating an Explainable Intrusion Detection System Using Self Organizing Maps. (arXiv:2207.07465v1 [cs.CR])
    Modern Artificial Intelligence (AI) enabled Intrusion Detection Systems (IDS) are complex black boxes. This means that a security analyst will have little to no explanation or clarification on why an IDS model made a particular prediction. A potential solution to this problem is to research and develop Explainable Intrusion Detection Systems (X-IDS) based on current capabilities in Explainable Artificial Intelligence (XAI). In this paper, we create a Self Organizing Maps (SOMs) based X-IDS system that is capable of producing explanatory visualizations. We leverage SOM's explainability to create both global and local explanations. An analyst can use global explanations to get a general idea of how a particular IDS model computes predictions. Local explanations are generated for individual datapoints to explain why a certain prediction value was computed. Furthermore, our SOM based X-IDS was evaluated on both explanation generation and traditional accuracy tests using the NSL-KDD and the CIC-IDS-2017 datasets.  ( 2 min )
    HLT-MT: High-resource Language-specific Training for Multilingual Neural Machine Translation. (arXiv:2207.04906v2 [cs.CL] UPDATED)
    Multilingual neural machine translation (MNMT) trained in multiple language pairs has attracted considerable attention due to fewer model parameters and lower training costs by sharing knowledge among multiple languages. Nonetheless, multilingual training is plagued by language interference degeneration in shared parameters because of the negative interference among different translation directions, especially on high-resource languages. In this paper, we propose the multilingual translation model with the high-resource language-specific training (HLT-MT) to alleviate the negative interference, which adopts the two-stage training with the language-specific selection mechanism. Specifically, we first train the multilingual model only with the high-resource pairs and select the language-specific modules at the top of the decoder to enhance the translation quality of high-resource directions. Next, the model is further trained on all available corpora to transfer knowledge from high-resource languages (HRLs) to low-resource languages (LRLs). Experimental results show that HLT-MT outperforms various strong baselines on WMT-10 and OPUS-100 benchmarks. Furthermore, the analytic experiments validate the effectiveness of our method in mitigating the negative interference in multilingual training.
    Breaking Feedback Loops in Recommender Systems with Causal Inference. (arXiv:2207.01616v2 [cs.IR] UPDATED)
    Recommender systems play a key role in shaping modern web ecosystems. These systems alternate between (1) making recommendations, (2) collecting user responses to these recommendations, and (3) retraining the recommendation algorithm based on this feedback. During this process the recommender system influences the user behavioral data that is subsequently used to update it, thus creating a feedback loop. Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior, raising ethical and performance concerns when deploying recommender systems. To address these issues, we propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference and can be applied to any recommendation algorithm that optimizes a training loss. Our main observation is that a recommender system does not suffer from feedback loops if it reasons about causal quantities, namely the intervention distributions of recommendations on user ratings. Moreover, we can calculate this intervention distribution from observational data by adjusting for the recommender system's predictions of user preferences. Using simulated environments, we demonstrate that CAFL improves recommendation quality when compared to prior correction methods.
    Feature Learning in Infinite-Width Neural Networks. (arXiv:2011.14522v3 [cs.LG] UPDATED)
    As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT. We propose simple modifications to the standard parametrization to allow for feature learning in the limit. Using the *Tensor Programs* technique, we derive explicit formulas for such limits. On Word2Vec and few-shot learning on Omniglot via MAML, two canonical tasks that rely crucially on feature learning, we compute these limits exactly. We find that they outperform both NTK baselines and finite-width networks, with the latter approaching the infinite-width feature learning performance as width increases. More generally, we classify a natural space of neural network parametrizations that generalizes standard, NTK, and Mean Field parametrizations. We show 1) any parametrization in this space either admits feature learning or has an infinite-width training dynamics given by kernel gradient descent, but not both; 2) any such infinite-width limit can be computed using the Tensor Programs technique. Code for our experiments can be found at github.com/edwardjhu/TP4.
    Calibration of Natural Language Understanding Models with Venn--ABERS Predictors. (arXiv:2205.10586v2 [cs.CL] UPDATED)
    Transformers, currently the state-of-the-art in natural language understanding (NLU) tasks, are prone to generate uncalibrated predictions or extreme probabilities, making the process of taking different decisions based on their output relatively difficult. In this paper we propose to build several inductive Venn--ABERS predictors (IVAP), which are guaranteed to be well calibrated under minimal assumptions, based on a selection of pre-trained transformers. We test their performance over a set of diverse NLU tasks and show that they are capable of producing well-calibrated probabilistic predictions that are uniformly spread over the [0,1] interval -- all while retaining the original model's predictive accuracy.
    Estimating and Penalizing Induced Preference Shifts in Recommender Systems. (arXiv:2204.11966v2 [cs.LG] UPDATED)
    The content that a recommender system (RS) shows to users influences them. Therefore, when choosing a recommender to deploy, one is implicitly also choosing to induce specific internal states in users. Even more, systems trained via long-horizon optimization will have direct incentives to manipulate users: in this work, we focus on the incentive to shift user preferences so they are easier to satisfy. We argue that - before deployment - system designers should: estimate the shifts a recommender would induce; evaluate whether such shifts would be undesirable; and perhaps even actively optimize to avoid problematic shifts. These steps involve two challenging ingredients: estimation requires anticipating how hypothetical algorithms would influence user preferences if deployed - we do this by using historical user interaction data to train a predictive user model which implicitly contains their preference dynamics; evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted - we use the notion of "safe shifts", that define a trust region within which behavior is safe: for instance, the natural way in which users would shift without interference from the system could be deemed "safe". In simulated experiments, we show that our learned preference dynamics model is effective in estimating user preferences and how they would respond to new recommenders. Additionally, we show that recommenders that optimize for staying in the trust region can avoid manipulative behaviors while still generating engagement.
    Self-supervised learning in non-small cell lung cancer discovers novel morphological clusters linked to patient outcome and molecular phenotypes. (arXiv:2205.01931v2 [cs.CV] UPDATED)
    Histopathological images provide the definitive source of cancer diagnosis, containing information used by pathologists to identify and subclassify malignant disease, and to guide therapeutic choices. These images contain vast amounts of information, much of which is currently unavailable to human interpretation. Supervised deep learning approaches have been powerful for classification tasks, but they are inherently limited by the cost and quality of annotations. Therefore, we developed Histomorphological Phenotype Learning, an unsupervised methodology, which requires no annotations and operates via the self-discovery of discriminatory image features in small image tiles. Tiles are grouped into morphologically similar clusters which appear to represent recurrent modes of tumor growth emerging under natural selection. These clusters have distinct features which can be identified using orthogonal methods. Applied to lung cancer tissues, we show that they align closely with patient outcomes, with histopathologically recognised tumor types and growth patterns, and with transcriptomic measures of immunophenotype.
    clDice -- A Novel Topology-Preserving Loss Function for Tubular Structure Segmentation. (arXiv:2003.07311v7 [cs.CV] UPDATED)
    Accurate segmentation of tubular, network-like structures, such as vessels, neurons, or roads, is relevant to many fields of research. For such structures, the topology is their most important characteristic; particularly preserving connectedness: in the case of vascular networks, missing a connected vessel entirely alters the blood-flow dynamics. We introduce a novel similarity measure termed centerlineDice (short clDice), which is calculated on the intersection of the segmentation masks and their (morphological) skeleta. We theoretically prove that clDice guarantees topology preservation up to homotopy equivalence for binary 2D and 3D segmentation. Extending this, we propose a computationally efficient, differentiable loss function (soft-clDice) for training arbitrary neural segmentation networks. We benchmark the soft-clDice loss on five public datasets, including vessels, roads and neurons (2D and 3D). Training on soft-clDice leads to segmentation with more accurate connectivity information, higher graph similarity, and better volumetric scores.
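    The measure described above can be sketched in a few lines. The abstract only says that clDice is computed on the intersection of masks and skeleta; the harmonic-mean form below (topology precision and topology sensitivity) is assumed here, and the function name `cl_dice`, the toy masks, and the hand-written skeletons are hypothetical. A real pipeline would compute the skeletons with a morphological skeletonization routine.

```python
import numpy as np

def cl_dice(pred_mask, true_mask, pred_skel, true_skel):
    """Centerline Dice from binary masks and their precomputed skeletons.

    Assumed form: harmonic mean of topology precision (predicted skeleton
    inside the true mask) and topology sensitivity (true skeleton inside
    the predicted mask).
    """
    pred_mask, true_mask = np.asarray(pred_mask, bool), np.asarray(true_mask, bool)
    pred_skel, true_skel = np.asarray(pred_skel, bool), np.asarray(true_skel, bool)
    tprec = (pred_skel & true_mask).sum() / max(pred_skel.sum(), 1)
    tsens = (true_skel & pred_mask).sum() / max(true_skel.sum(), 1)
    if tprec + tsens == 0:
        return 0.0
    return 2 * tprec * tsens / (tprec + tsens)

# Toy ground truth: a horizontal vessel 3 pixels thick, skeleton = middle row.
true_mask = np.ones((3, 7), dtype=bool)
true_skel = np.zeros((3, 7), dtype=bool)
true_skel[1, :] = True

# Prediction A: too thin, but connected (only the middle row).
thin_mask = true_skel.copy()
thin_skel = true_skel.copy()

# Prediction B: full thickness, but broken at column 3.
broken_mask = true_mask.copy()
broken_mask[:, 3] = False
broken_skel = true_skel.copy()
broken_skel[1, 3] = False

score_thin = cl_dice(thin_mask, true_mask, thin_skel, true_skel)
score_broken = cl_dice(broken_mask, true_mask, broken_skel, true_skel)
print(score_thin, score_broken)
```

    In this toy case the thin but connected prediction scores higher than the thick but broken one, whereas plain volumetric Dice would rank them the other way round.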
    blob loss: instance imbalance aware loss functions for semantic segmentation. (arXiv:2205.08209v2 [cs.CV] UPDATED)
    Deep convolutional neural networks have proven to be remarkably effective in semantic segmentation tasks. Most popular loss functions were introduced targeting improved volumetric scores, such as the Sorensen Dice coefficient. By design, DSC can tackle class imbalance; however, it does not recognize instance imbalance within a class. As a result, a large foreground instance can dominate minor instances and still produce a satisfactory Sorensen Dice coefficient. Nevertheless, missing out on instances will lead to poor detection performance. This represents a critical issue in applications such as disease progression monitoring. For example, it is imperative to locate and surveil small-scale lesions in the follow-up of multiple sclerosis patients. We propose a novel family of loss functions, nicknamed blob loss, primarily aimed at maximizing instance-level detection metrics, such as F1 score and sensitivity. Blob loss is designed for semantic segmentation problems in which the instances are the connected components within a class. We extensively evaluate a DSC-based blob loss in five complex 3D semantic segmentation tasks featuring pronounced instance heterogeneity in terms of texture and morphology. Compared to soft Dice loss, we achieve 5 percent improvement for MS lesions, 3 percent improvement for liver tumor, and an average 2 percent improvement for Microscopy segmentation tasks considering F1 score.
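    The instance-imbalance problem described above is easy to reproduce on a toy mask. The sketch below is not the blob loss itself; it only illustrates, with hypothetical lesion sizes and a hand-built instance list, how a prediction that misses a small instance can still obtain a high Dice coefficient while instance-level sensitivity drops to one half.

```python
import numpy as np

def dice(pred, target):
    """Sorensen Dice coefficient for two binary masks."""
    pred, target = np.asarray(pred, bool), np.asarray(target, bool)
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else 2 * (pred & target).sum() / denom

# Two ground-truth instances of the same class (sizes are made up).
large = np.zeros((20, 20), dtype=bool)
large[2:12, 2:12] = True      # 100 pixels
small = np.zeros((20, 20), dtype=bool)
small[16:18, 16:18] = True    # 4 pixels
target = large | small

# The prediction finds the large instance perfectly and misses the small one.
pred = large.copy()

dice_score = dice(pred, target)  # 2 * 100 / (100 + 104)

# Instance-level sensitivity: an instance counts as detected if any
# predicted pixel overlaps it.
instances = [large, small]
detected = sum(bool((pred & inst).any()) for inst in instances)
sensitivity = detected / len(instances)
print(dice_score, sensitivity)
```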
    Flexible Model Aggregation for Quantile Regression. (arXiv:2103.00083v4 [stat.ML] UPDATED)
    Quantile regression is a fundamental problem in statistical learning motivated by the need to quantify uncertainty in predictions, or to model a diverse population without being overly reductive. For instance, epidemiological forecasts, cost estimates, and revenue predictions all benefit from being able to quantify the range of possible values accurately. As such, many models have been developed for this problem over many years of research in econometrics, statistics, and machine learning. Rather than proposing yet another (new) algorithm for quantile regression we adopt a meta viewpoint: we investigate methods for aggregating any number of conditional quantile models, in order to improve accuracy and robustness. We consider weighted ensembles where weights may vary over not only individual models, but also over quantile levels, and feature values. All of the models we consider in this paper can be fit using modern deep learning toolkits, and hence are widely accessible (from an implementation point of view) and scalable. To improve the accuracy of the predicted quantiles (or equivalently, prediction intervals), we develop tools for ensuring that quantiles remain monotonically ordered, and apply conformal calibration methods. These can be used without any modification of the original library of base models. We also review some basic theory surrounding quantile aggregation and related scoring rules, and contribute a few new results to this literature (for example, the fact that post sorting or post isotonic regression can only improve the weighted interval score). Finally, we provide an extensive suite of empirical comparisons across 34 data sets from two different benchmark repositories.
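    The post-sorting step mentioned above can be illustrated on a single observation. The abstract states its result for the weighted interval score; the sketch below uses the summed pinball (quantile) loss as a closely related stand-in, which is an assumption, and all numbers are made up.

```python
import numpy as np

def pinball_loss(y, q_pred, tau):
    """Pinball loss of predicted quantile(s) q_pred at level(s) tau."""
    diff = y - q_pred
    return np.maximum(tau * diff, (tau - 1) * diff)

taus = np.array([0.1, 0.5, 0.9])
raw = np.array([3.0, 1.0, 2.0])   # crossing quantiles: not monotone in tau
sorted_q = np.sort(raw)           # post-sorting restores monotone order
y = 2.5                           # hypothetical observed value

loss_raw = pinball_loss(y, raw, taus).sum()
loss_sorted = pinball_loss(y, sorted_q, taus).sum()
print(loss_raw, loss_sorted)
```

    Here sorting lowers the summed loss from 1.65 to 0.45, in line with the claim that post-sorting can only help.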
    PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method. (arXiv:2110.06906v2 [cs.LG] UPDATED)
    Emphatic temporal difference (ETD) learning (Sutton et al., 2016) is a successful method to conduct the off-policy value function evaluation with function approximation. Although ETD has been shown to converge asymptotically to a desirable value function, it is well-known that ETD often encounters a large variance so that its sample complexity can increase exponentially fast with the number of iterations. In this work, we propose a new ETD method, called PER-ETD (i.e., PEriodically Restarted-ETD), which restarts and updates the follow-on trace only for a finite period for each iteration of the evaluation parameter. Further, PER-ETD features a logarithmic increase of the restart period with the number of iterations, which guarantees the best trade-off between the variance and bias and keeps both vanishing sublinearly. We show that PER-ETD converges to the same desirable fixed point as ETD, but improves the exponential sample complexity of ETD to polynomial. Our experiments validate the superior performance of PER-ETD and its advantage over ETD.
    Online Continual Learning for Embedded Devices. (arXiv:2203.10681v3 [cs.LG] UPDATED)
    Real-time on-device continual learning is needed for new applications such as home robots, user personalization on smartphones, and augmented/virtual reality headsets. However, this setting poses unique challenges: embedded devices have limited memory and compute capacity and conventional machine learning models suffer from catastrophic forgetting when updated on non-stationary data streams. While several online continual learning models have been developed, their effectiveness for embedded applications has not been rigorously studied. In this paper, we first identify criteria that online continual learners must meet to effectively perform real-time, on-device learning. We then study the efficacy of several online continual learning methods when used with mobile neural networks. We measure their performance, memory usage, compute requirements, and ability to generalize to out-of-domain inputs.
    Dynamic categories, dynamic operads: From deep learning to prediction markets. (arXiv:2205.03906v2 [math.CT] UPDATED)
    Natural organized systems adapt to internal and external pressures and this seems to happen all the way down. Wanting to think clearly about this idea motivates our paper, and so the idea is elaborated extensively in the introduction, which should be broadly accessible to a philosophically-interested audience. In the remaining sections, we turn to more compressed category theory. We define the monoidal double category $\mathbf{Org}$ of dynamic organizations, we provide definitions of $\mathbf{Org}$-enriched, or "dynamic", categorical structures -- e.g. dynamic categories, operads, and monoidal categories -- and we show how they instantiate the motivating philosophical ideas. We give two examples of dynamic categorical structures: prediction markets as a dynamic operad and deep learning as a dynamic monoidal category.
    Teaching Networks to Solve Optimization Problems. (arXiv:2202.04104v2 [cs.LG] UPDATED)
    Leveraging machine learning to facilitate the optimization process is an emerging field that holds the promise to bypass the fundamental computational bottleneck caused by classic iterative solvers in critical applications requiring near-real-time optimization. The majority of existing approaches focus on learning data-driven optimizers that lead to fewer iterations in solving an optimization. In this paper, we take a different approach and propose to replace the iterative solvers altogether with a trainable parametric set function, that outputs the optimal arguments/parameters of an optimization problem in a single feed forward. We denote our method as Learning to Optimize the Optimization Process (LOOP). We show the feasibility of learning such parametric (set) functions to solve various classic optimization problems including linear/nonlinear regression, principal component analysis, transport-based coreset, and quadratic programming in supply management applications. In addition, we propose two alternative approaches for learning such parametric functions, with and without a solver in the LOOP. Finally, through various numerical experiments, we show that the trained solvers could be orders of magnitude faster than the classic iterative solvers while providing near optimal solutions.
    Theory of Acceleration of Decision Making by Correlated Time Sequences. (arXiv:2203.16004v4 [cs.LG] UPDATED)
    Photonic accelerators have been intensively studied to provide enhanced information processing capability to benefit from the unique attributes of physical processes. Recently, it has been reported that chaotically oscillating ultrafast time series from a laser, called laser chaos, provide the ability to solve multi-armed bandit (MAB) problems or decision-making problems at GHz order. Furthermore, it has been confirmed that the negatively correlated time-domain structure of laser chaos contributes to the acceleration of decision-making. However, the underlying mechanism of why decision-making is accelerated by correlated time series is unknown. In this study, we demonstrate a theoretical model to account for accelerating decision-making by correlated time sequence. We first confirm the effectiveness of the negative autocorrelation inherent in time series for solving two-armed bandit problems using Fourier transform surrogate methods. We propose a theoretical model that concerns the correlated time series subjected to the decision-making system and the internal status of the system therein in a unified manner, inspired by correlated random walks. We demonstrate that the performance derived analytically by the theory agrees well with the numerical simulations, which confirms the validity of the proposed model and leads to optimal system design. The present study paves the way for improving the effectiveness of correlated time series for decision-making, impacting artificial intelligence and other applications.
    Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes. (arXiv:2111.15000v2 [cs.CV] UPDATED)
    We present a deformable prototypical part network (Deformable ProtoPNet), an interpretable image classifier that integrates the power of deep learning and the interpretability of case-based reasoning. This model classifies input images by comparing them with prototypes learned during training, yielding explanations in the form of "this looks like that." However, while previous methods use spatially rigid prototypes, we address this shortcoming by proposing spatially flexible prototypes. Each prototype is made up of several prototypical parts that adaptively change their relative spatial positions depending on the input image. Consequently, a Deformable ProtoPNet can explicitly capture pose variations and context, improving both model accuracy and the richness of explanations provided. Compared to other case-based interpretable models using prototypes, our approach achieves state-of-the-art accuracy and gives an explanation with greater context. The code is available at https://github.com/jdonnelly36/Deformable-ProtoPNet.
    Combining Diverse Feature Priors. (arXiv:2110.08220v2 [cs.LG] UPDATED)
    To improve model generalization, model designers often restrict the features that their models use, either implicitly or explicitly. In this work, we explore the design space of leveraging such feature priors by viewing them as distinct perspectives on the data. Specifically, we find that models trained with diverse sets of feature priors have less overlapping failure modes, and can thus be combined more effectively. Moreover, we demonstrate that jointly training such models on additional (unlabeled) data allows them to correct each other's mistakes, which, in turn, leads to better generalization and resilience to spurious correlations. Code available at https://github.com/MadryLab/copriors
    Fine-Grained Population Mobility Data-Based Community-Level COVID-19 Prediction Model. (arXiv:2202.06257v3 [cs.LG] UPDATED)
    Predicting the number of infections in the anti-epidemic process is extremely beneficial to the government in developing anti-epidemic strategies, especially in fine-grained geographic units. Previous works focus on low spatial resolution prediction, e.g., county-level, and preprocess data to the same geographic level, which loses some useful information. In this paper, we propose a fine-grained population mobility data-based model (FGC-COVID) utilizing data of two geographic levels for community-level COVID-19 prediction. We use the population mobility data between Census Block Groups (CBGs), which is a finer-grained geographic level than community, to build the graph and capture the dependencies between CBGs using graph neural networks (GNNs). To mine patterns that are as fine-grained as possible for prediction, a spatial weighted aggregation module is introduced to aggregate the embeddings of CBGs to community level based on their geographic affiliation and spatial autocorrelation. Extensive experiments on 300 days of LA city COVID-19 data indicate our model outperforms existing forecasting models on community-level COVID-19 prediction.
    Algorithms to estimate Shapley value feature attributions. (arXiv:2207.07605v1 [cs.LG])
    Feature attributions based on the Shapley value are popular for explaining machine learning models; however, their estimation is complex from both a theoretical and computational standpoint. We disentangle this complexity into two factors: (1) the approach to removing feature information, and (2) the tractable estimation strategy. These two factors provide a natural lens through which we can better understand and compare 24 distinct algorithms. Based on the various feature removal approaches, we describe the multiple types of Shapley value feature attributions and methods to calculate each one. Then, based on the tractable estimation strategies, we characterize two distinct families of approaches: model-agnostic and model-specific approximations. For the model-agnostic approximations, we benchmark a wide class of estimation approaches and tie them to alternative yet equivalent characterizations of the Shapley value. For the model-specific approximations, we clarify the assumptions crucial to each method's tractability for linear, tree, and deep models. Finally, we identify gaps in the literature and promising future research directions.
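    To make the two factors concrete, the sketch below computes exact Shapley values by enumerating all coalitions, which is only tractable for a handful of features. The toy model `f`, the input `x`, and the choice of removing a feature by replacing it with a zero baseline are assumptions made for illustration; the zero baseline is just one possible feature-removal approach.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n):
    """Exact Shapley values for an n-player game, by full enumeration."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that does not contain i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi

# Hypothetical model and input to explain.
def f(x):
    return 2 * x[0] + x[1] * x[2]

x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]

def value_fn(kept):
    """Model output with features outside `kept` replaced by the baseline."""
    masked = [x[j] if j in kept else baseline[j] for j in range(len(x))]
    return f(masked)

phi = shapley_values(value_fn, len(x))
print(phi)
```

    The attributions sum to f(x) minus f(baseline), and the two interacting features share their joint contribution equally.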
    Causal Inference Through the Structural Causal Marginal Problem. (arXiv:2202.01300v3 [cs.AI] UPDATED)
    We introduce an approach to counterfactual inference based on merging information from multiple datasets. We consider a causal reformulation of the statistical marginal problem: given a collection of marginal structural causal models (SCMs) over distinct but overlapping sets of variables, determine the set of joint SCMs that are counterfactually consistent with the marginal ones. We formalise this approach for categorical SCMs using the response function formulation and show that it reduces the space of allowed marginal and joint SCMs. Our work thus highlights a new mode of falsifiability through additional variables, in contrast to the statistical one via additional data.
    Convergence of Batch Asynchronous Stochastic Approximation With Applications to Reinforcement Learning. (arXiv:2109.03445v2 [stat.ML] UPDATED)
    The stochastic approximation algorithm is a widely used probabilistic method for finding a zero of a vector-valued function, when only noisy measurements of the function are available. In the literature to date, one can make a distinction between "synchronous" updating, whereby every component of the current guess is updated at each time, and "asynchronous" updating, whereby only one component is updated. In principle, it is also possible to update, at each time instant, some but not all components of $\theta_t$, which might be termed "batch asynchronous stochastic approximation" (BASA). One can also make a distinction between using a "local" clock versus a "global" clock. In this paper, we propose a unified formulation of batch asynchronous stochastic approximation (BASA) algorithms, and develop a general methodology for proving that such algorithms converge, irrespective of whether global or local clocks are used. These convergence proofs make use of weaker hypotheses than existing results. For example: existing convergence proofs when a local clock is used require that the measurement noise is an i.i.d. sequence. Here, it is assumed that the measurement errors form a martingale difference sequence. Also, all results to date assume that the stochastic step sizes satisfy a probabilistic analog of the Robbins-Monro conditions. We replace this by a purely deterministic condition on the irreducibility of the underlying Markov processes. As specific applications to Reinforcement Learning, we introduce "batch" versions of the temporal difference algorithm $TD(0)$ for value iteration, and the $Q$-learning algorithm for finding the optimal action-value function, and also permit the use of local clocks instead of a global clock. In all cases, we establish the convergence of these algorithms, under milder conditions than in the existing literature.
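    A minimal sketch of the batch asynchronous idea with local clocks follows. It is not the paper's algorithm or proof setting: the function f(theta) = theta - target, the i.i.d. Gaussian noise (a special case of a martingale difference sequence), the random choice of active components, and all names and parameter values are assumptions chosen for illustration.

```python
import numpy as np

def batch_async_sa(target, steps=2000, update_prob=0.5, noise_std=1.0, seed=0):
    """Find the zero of f(theta) = theta - target from noisy measurements,
    updating only a random subset of components at each time step."""
    rng = np.random.default_rng(seed)
    target = np.asarray(target, dtype=float)
    d = target.size
    theta = np.zeros(d)
    local_clock = np.zeros(d, dtype=int)  # number of updates per component
    for _ in range(steps):
        active = rng.random(d) < update_prob            # the batch to update
        noisy_f = theta - target + noise_std * rng.standard_normal(d)
        local_clock[active] += 1
        step = 1.0 / local_clock[active]                # local-clock step size
        theta[active] -= step * noisy_f[active]
    return theta, local_clock

target = np.array([1.0, -2.0, 0.5])
theta, local_clock = batch_async_sa(target)
max_error = float(np.max(np.abs(theta - target)))
print(theta, local_clock, max_error)
```

    With the step size 1/n on each component's own update count n, every component ends up at the target minus the average of the noise it has seen, so the error shrinks as that component's local clock advances.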
    Selective Regression Under Fairness Criteria. (arXiv:2110.15403v3 [cs.LG] UPDATED)
    Selective regression allows abstention from prediction if the confidence to make an accurate prediction is not sufficient. In general, by allowing a reject option, one expects the performance of a regression model to increase at the cost of reducing coverage (i.e., by predicting on fewer samples). However, as we show, in some cases, the performance of a minority subgroup can decrease while we reduce the coverage, and thus selective regression can magnify disparities between different sensitive subgroups. Motivated by these disparities, we propose new fairness criteria for selective regression requiring the performance of every subgroup to improve with a decrease in coverage. We prove that if a feature representation satisfies the sufficiency criterion or is calibrated for mean and variance, then the proposed fairness criteria are met. Further, we introduce two approaches to mitigate the performance disparity across subgroups: (a) by regularizing an upper bound of conditional mutual information under a Gaussian assumption and (b) by regularizing a contrastive loss for conditional mean and conditional variance prediction. The effectiveness of these approaches is demonstrated on synthetic and real-world datasets.
    Acoustic scene classification using auditory datasets. (arXiv:2112.13450v2 [cs.SD] UPDATED)
    The approach used not only challenges some of the fundamental mathematical techniques used so far in early experiments of the same trend but also introduces new scopes and new horizons for interesting results. The physics governing spectrograms has been optimized in the project, along with exploring how it handles the intense requirements of the problem at hand. Major contributions and developments brought to light through this project involve using better mathematical techniques and problem-specific machine learning methods. Improvised data analysis and data augmentation for audio datasets, like frequency masking and random frequency-time stretching, are used in the project and hence are explained in this paper. In the methodology used, audio transform principles were also tried and explored, and the insights gained were used constructively in the later stages of the project. Using a deep learning principle is surely one of them. Also, in this paper, the potential scopes and upcoming research openings in both the short and long term are presented. Although many of the results gained are domain-specific as of now, they are surely potent enough to produce novel solutions in various domains of diverse backgrounds.
    Robust Self-Supervised Audio-Visual Speech Recognition. (arXiv:2201.01763v3 [cs.SD] UPDATED)
    Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, as the model cannot determine which speaker to transcribe. Audio-visual speech recognition (AVSR) systems improve robustness by complementing the audio stream with the visual information that is invariant to noise and helps the model focus on the desired speaker. However, previous AVSR work focused solely on the supervised learning setup; hence the progress was hindered by the amount of labeled data available. In this work, we present a self-supervised AVSR framework built upon Audio-Visual HuBERT (AV-HuBERT), a state-of-the-art audio-visual speech representation learning model. On the largest available AVSR benchmark dataset LRS3, our approach outperforms prior state-of-the-art by ~50% (28.0% vs. 14.1%) using less than 10% of labeled data (433hr vs. 30hr) in the presence of babble noise, while reducing the WER of an audio-based model by over 75% (25.8% vs. 5.8%) on average.
    Learning Sparse Fixed-Structure Gaussian Bayesian Networks. (arXiv:2107.10450v2 [cs.DS] UPDATED)
    Gaussian Bayesian networks (a.k.a. linear Gaussian structural equation models) are widely used to model causal interactions among continuous variables. In this work, we study the problem of learning a fixed-structure Gaussian Bayesian network up to a bounded error in total variation distance. We analyze the commonly used node-wise least squares regression (LeastSquares) and prove that it has a near-optimal sample complexity. We also study a couple of new algorithms for the problem: - BatchAvgLeastSquares takes the average of several batches of least squares solutions at each node, so that one can interpolate between the batch size and the number of batches. We show that BatchAvgLeastSquares also has near-optimal sample complexity. - CauchyEst takes the median of solutions to several batches of linear systems at each node. We show that the algorithm specialized to polytrees, CauchyEstTree, has near-optimal sample complexity. Experimentally, we show that for uncontaminated, realizable data, the LeastSquares algorithm performs best, but in the presence of contamination or DAG misspecification, CauchyEst/CauchyEstTree and BatchAvgLeastSquares respectively perform better.
    Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond. (arXiv:2103.10689v3 [cs.LG] UPDATED)
    Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction results of deep models. In recent years, many interpretation tools have been proposed to explain or reveal how deep models make decisions. In this paper, we review this line of research and try to make a comprehensive survey. Specifically, we first introduce and clarify two basic concepts -- interpretations and interpretability -- that people usually get confused about. To address the research efforts in interpretations, we elaborate the designs of a number of interpretation algorithms, from different perspectives, by proposing a new taxonomy. Then, to understand the interpretation results, we also survey the performance metrics for evaluating interpretation algorithms. Further, we summarize the current works in evaluating models' interpretability using "trustworthy" interpretation algorithms. Finally, we review and discuss the connections between deep models' interpretations and other factors, such as adversarial robustness and learning from interpretations, and we introduce several open-source libraries for interpretation algorithms and evaluation approaches.
    Optimal Rates for Spectral Algorithms with Least-Squares Regression over Hilbert Spaces. (arXiv:1801.06720v4 [stat.ML] UPDATED)
    In this paper, we study regression problems over a separable Hilbert space with the square loss, covering non-parametric regression over a reproducing kernel Hilbert space. We investigate a class of spectral/regularized algorithms, including ridge regression, principal component regression, and gradient methods. We prove optimal, high-probability convergence results in terms of variants of norms for the studied algorithms, considering a capacity assumption on the hypothesis space and a general source condition on the target function. Consequently, we obtain almost sure convergence results with optimal rates. Our results improve and generalize previous results, filling a theoretical gap for the non-attainable cases.
    A Scalable AutoML Approach Based on Graph Neural Networks. (arXiv:2111.00083v4 [cs.LG] UPDATED)
    AutoML systems build machine learning models automatically by performing a search over valid data transformations and learners, along with hyper-parameter optimization for each learner. Many AutoML systems use meta-learning to guide search for optimal pipelines. In this work, we present a novel meta-learning system called KGpip, which (1) builds a database of datasets and corresponding pipelines by mining thousands of scripts with program analysis, (2) uses dataset embeddings to find similar datasets in the database based on their content instead of metadata-based features, and (3) models AutoML pipeline creation as a graph generation problem, to succinctly characterize the diverse pipelines seen for a single dataset. KGpip's meta-learning is a sub-component for AutoML systems. We demonstrate this by integrating KGpip with two AutoML systems. Our comprehensive evaluation using 126 datasets, including those used by the state-of-the-art systems, shows that KGpip significantly outperforms these systems.
    Computing-In-Memory Neural Network Accelerators for Safety-Critical Systems: Can Small Device Variations Be Disastrous?. (arXiv:2207.07626v1 [cs.AR])
    Computing-in-Memory (CiM) architectures based on emerging non-volatile memory (NVM) devices have demonstrated great potential for deep neural network (DNN) acceleration thanks to their high energy efficiency. However, NVM devices suffer from various non-idealities, especially device-to-device variations due to fabrication defects and cycle-to-cycle variations due to the stochastic behavior of devices. As such, the DNN weights actually mapped to NVM devices could deviate significantly from the expected values, leading to large performance degradation. To address this issue, most existing works focus on maximizing average performance under device variations. This objective would work well for general-purpose scenarios. But for safety-critical applications, the worst-case performance must also be considered. Unfortunately, this has been rarely explored in the literature. In this work, we formulate the problem of determining the worst-case performance of CiM DNN accelerators under the impact of device variations. We further propose a method to effectively find the specific combination of device variation in the high-dimensional space that leads to the worst-case performance. We find that even with very small device variations, the accuracy of a DNN can drop drastically, causing concerns when deploying CiM accelerators in safety-critical applications. Finally, we show that surprisingly none of the existing methods used to enhance average DNN performance in CiM accelerators are very effective when extended to enhance the worst-case performance, and further research down the road is needed to address this problem.
    Optimal No-regret Learning in Repeated First-price Auctions. (arXiv:2003.09795v5 [cs.LG] UPDATED)
    We study online learning in repeated first-price auctions with censored feedback, where a bidder, only observing the winning bid at the end of each auction, learns to adaptively bid in order to maximize her cumulative payoff. To achieve this goal, the bidder faces a challenging dilemma: if she wins the bid--the only way to achieve positive payoffs--then she is not able to observe the highest bid of the other bidders, which we assume is iid drawn from an unknown distribution. This dilemma, despite being reminiscent of the exploration-exploitation trade-off in contextual bandits, cannot directly be addressed by the existing UCB or Thompson sampling algorithms. In this paper, by exploiting the structural properties of first-price auctions, we develop the first learning algorithm that achieves $O(\sqrt{T}\log^{2.5} T)$ regret bound, which is minimax optimal up to $\log$ factors, when the bidder's private values are stochastically generated. We do so by providing an algorithm on a general class of problems, called the partially ordered contextual bandits, which combine the graph feedback across actions, the cross learning across contexts, and a partial order over the contexts. We establish both strengths and weaknesses of this framework, by showing a curious separation that a regret nearly independent of the action/context sizes is possible under stochastic contexts, but is impossible under adversarial contexts. Despite the limitation of this general framework, we further exploit the structure of first-price auctions and develop a learning algorithm that operates sample-efficiently (and computationally efficiently) in the presence of adversarially generated private values. We establish an $O(\sqrt{T}\log^3 T)$ regret bound for this algorithm, hence providing a complete characterization of optimal learning guarantees for first-price auctions.
    Assessments of epistemic uncertainty using Gaussian stochastic weight averaging for fluid-flow regression. (arXiv:2109.08248v2 [physics.flu-dyn] UPDATED)
    We use Gaussian stochastic weight averaging (SWAG) to assess the model-form uncertainty associated with neural-network-based function approximation relevant to fluid flows. SWAG approximates a posterior Gaussian distribution of each weight, given training data, and a constant learning rate. Having access to this distribution, it is able to create multiple models with various combinations of sampled weights, which can be used to obtain ensemble predictions. The average of such an ensemble can be regarded as the `mean estimation', whereas its standard deviation can be used to construct `confidence intervals', which enable us to perform uncertainty quantification (UQ) with regard to the training process of neural networks. We utilize representative neural-network-based function approximation tasks for the following cases: (i) a two-dimensional circular-cylinder wake; (ii) the DayMET dataset (maximum daily temperature in North America); (iii) a three-dimensional square-cylinder wake; and (iv) urban flow, to assess the generalizability of the present idea for a wide range of complex datasets. SWAG-based UQ can be applied regardless of the network architecture, and therefore, we demonstrate the applicability of the method for two types of neural networks: (i) global field reconstruction from sparse sensors by combining convolutional neural network (CNN) and multi-layer perceptron (MLP); and (ii) far-field state estimation from sectional data with two-dimensional CNN. We find that SWAG can obtain physically-interpretable confidence-interval estimates from the perspective of model-form uncertainty. This capability supports its use for a wide range of problems in science and engineering.
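    The ensemble step described above (sample weights from a Gaussian, predict with each sample, report mean and standard deviation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a diagonal Gaussian fitted to saved weight snapshots and a toy linear model, and the names `swag_diag_moments` and `ensemble_predict` are hypothetical.

```python
import numpy as np

def swag_diag_moments(weight_snapshots):
    """Fit a diagonal Gaussian to weight snapshots (assumed collected during training)."""
    W = np.asarray(weight_snapshots, dtype=np.float64)
    mean = W.mean(axis=0)
    # Variance from first and second moments, clipped to stay positive.
    var = np.clip((W ** 2).mean(axis=0) - mean ** 2, 1e-12, None)
    return mean, var

def ensemble_predict(x, mean, var, n_samples=100, seed=0):
    """Sample weight vectors and return the ensemble mean and standard deviation."""
    rng = np.random.default_rng(seed)
    samples = mean + np.sqrt(var) * rng.standard_normal((n_samples, mean.size))
    preds = samples @ np.asarray(x, dtype=np.float64)  # toy linear model y = w . x
    return preds.mean(axis=0), preds.std(axis=0)

snapshots = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]]
mean, var = swag_diag_moments(snapshots)
mu, sigma = ensemble_predict([1.0, 1.0], mean, var)
```

    Here `mu` plays the role of the mean estimation and `sigma` can be used to build a confidence interval such as `mu ± 2 * sigma`.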
    Analysis, Characterization, Prediction and Attribution of Extreme Atmospheric Events with Machine Learning: a Review. (arXiv:2207.07580v1 [cs.LG])
    Atmospheric Extreme Events (EEs) cause severe damages to human societies and ecosystems. The frequency and intensity of EEs and other associated events are increasing in the current climate change and global warming risk. The accurate prediction, characterization, and attribution of atmospheric EEs is therefore a key research field, in which many groups are currently working by applying different methodologies and computational tools. Machine Learning (ML) methods have arisen in the last years as powerful techniques to tackle many of the problems related to atmospheric EEs. This paper reviews the ML algorithms applied to the analysis, characterization, prediction, and attribution of the most important atmospheric EEs. A summary of the most used ML techniques in this area, and a comprehensive critical review of literature related to ML in EEs, are provided. A number of examples is discussed and perspectives and outlooks on the field are drawn.
    Differentially Private Fine-tuning of Language Models. (arXiv:2110.06500v2 [cs.LG] UPDATED)
    We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $\epsilon = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. On DART, privately fine-tuned GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8 respectively (privacy budget of $\epsilon = 6.8,\delta=$ 1e-5) whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.
    FedFly: Towards Migration in Edge-based Distributed Federated Learning. (arXiv:2111.01516v2 [cs.DC] UPDATED)
    Federated learning (FL) is a privacy-preserving distributed machine learning technique that trains models while keeping all the original data generated on devices locally. Since devices may be resource constrained, offloading can be used to improve FL performance by transferring computational workload from devices to edge servers. However, due to mobility, devices participating in FL may leave the network during training and need to connect to a different edge server. This is challenging because the computations offloaded to the edge server need to be migrated. In line with this assertion, we present FedFly, which is, to the best of our knowledge, the first work to migrate a deep neural network (DNN) when devices move between edge servers during FL training. Our empirical results on the CIFAR10 dataset, with both balanced and imbalanced data distribution, support our claims that FedFly can reduce training time by up to 33% when a device moves after 50% of the training is completed, and by up to 45% when 90% of the training is completed, compared to the state-of-the-art offloading approach in FL. FedFly has negligible overhead of up to two seconds and does not compromise accuracy. Finally, we highlight a number of open research issues for further investigation. FedFly can be downloaded from https://github.com/qub-blesson/FedFly.
    An original model for multi-target learning of logical rules for knowledge graph reasoning. (arXiv:2112.06189v2 [cs.AI] UPDATED)
    Large-scale knowledge graphs provide structured representations of human knowledge. However, as it is impossible to collect all knowledge, knowledge graphs are usually incomplete. Reasoning based on existing facts paves a way to discover missing facts. In this paper, we study the problem of learning logical rules for reasoning on knowledge graphs for completing missing factual triplets. Learning logical rules equips a model with strong interpretability as well as the ability to generalize to similar tasks. We propose a model able to fully use training data which also considers multi-target scenarios. In addition, considering the deficiency in evaluating the performance of models and the quality of mined rules, we further propose two novel indicators to help with the problem. Experimental results empirically demonstrate that our model outperforms state-of-the-art methods on five benchmark datasets. The results also prove the effectiveness of the indicators.
    FAIR principles for AI models, with a practical application for accelerated high energy diffraction microscopy. (arXiv:2207.00611v2 [cs.AI] UPDATED)
    A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework combining the following elements: the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, and funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale system at the ALCF AI Testbed. We describe how this domain-agnostic computational framework may be harnessed to enable autonomous AI-driven discovery.
    Learning to Separate Voices by Spatial Regions. (arXiv:2207.04203v2 [cs.SD] UPDATED)
    We consider the problem of audio voice separation for binaural applications, such as earphones and hearing aids. While today's neural networks perform remarkably well (separating $4+$ sources with 2 microphones) they assume a known or fixed maximum number of sources, K. Moreover, today's models are trained in a supervised manner, using training data synthesized from generic sources, environments, and human head shapes. This paper intends to relax both these constraints at the expense of a slight alteration in the problem definition. We observe that, when a received mixture contains too many sources, it is still helpful to separate them by region, i.e., isolating signal mixtures from each conical sector around the user's head. This requires learning the fine-grained spatial properties of each region, including the signal distortions imposed by a person's head. We propose a two-stage self-supervised framework in which overheard voices from earphones are pre-processed to extract relatively clean personalized signals, which are then used to train a region-wise separation model. Results show promising performance, underscoring the importance of personalization over a generic supervised approach. (Audio samples are available at our project website: https://uiuc-earable-computing.github.io/binaural/.) We believe this result could help real-world applications in selective hearing, noise cancellation, and audio augmented reality.
    Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning. (arXiv:2207.07635v1 [cs.CV])
    The development of CLIP [Radford et al., 2021] has sparked a debate on whether language supervision can result in vision models with more transferable representations than traditional image-only methods. Our work studies this question through a carefully controlled comparison of two approaches in terms of their ability to learn representations that generalize to downstream classification tasks. We find that when the pre-training dataset meets certain criteria -- it is sufficiently large and contains descriptive captions with low variability -- image-only methods do not match CLIP's transfer performance, even when they are trained with more image data. However, contrary to what one might expect, there are practical settings in which these criteria are not met, wherein added supervision through captions is actually detrimental. Motivated by our findings, we devise simple prescriptions to enable CLIP to better leverage the language information present in existing pre-training datasets.
    Distributionally Robust Deep Learning using Hardness Weighted Sampling. (arXiv:2001.02658v4 [cs.LG] UPDATED)
    Limiting failures of machine learning systems is of paramount importance for safety-critical applications. In order to improve the robustness of machine learning systems, Distributionally Robust Optimization (DRO) has been proposed as a generalization of Empirical Risk Minimization (ERM). However, its use in deep learning has been severely restricted due to the relative inefficiency of the optimizers available for DRO in comparison to the wide-spread variants of Stochastic Gradient Descent (SGD) optimizers for ERM. We propose SGD with hardness weighted sampling, a principled and efficient optimization method for DRO in machine learning that is particularly suited in the context of deep learning. Similar to a hard example mining strategy in practice, the proposed algorithm is straightforward to implement and computationally as efficient as SGD-based optimizers used for deep learning, requiring minimal overhead computation. In contrast to typical ad hoc hard mining approaches, we prove the convergence of our DRO algorithm for over-parameterized deep learning networks with ReLU activation and a finite number of layers and parameters. Our experiments on fetal brain 3D MRI segmentation and brain tumor segmentation in MRI demonstrate the feasibility and the usefulness of our approach. Using our hardness weighted sampling for training a state-of-the-art deep learning pipeline leads to improved robustness to anatomical variabilities in automatic fetal brain 3D MRI segmentation using deep learning and to improved robustness to the image protocol variations in brain tumor segmentation. Our code is available at https://github.com/LucasFidon/HardnessWeightedSampler.
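    The idea of sampling harder examples more often can be sketched as below. This is an illustration only: it assumes that sampling probabilities are a softmax over the most recent per-example losses, scaled by a robustness parameter `beta`. That choice and the function names are assumptions of this sketch, not details taken from the abstract.

```python
import numpy as np

def hardness_weighted_probs(losses, beta=1.0):
    """Softmax over per-example losses: a higher loss gives a higher sampling probability."""
    z = beta * np.asarray(losses, dtype=np.float64)
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

def sample_batch(losses, batch_size, beta=1.0, seed=0):
    """Draw training indices in proportion to hardness (with replacement)."""
    rng = np.random.default_rng(seed)
    p = hardness_weighted_probs(losses, beta)
    return rng.choice(len(p), size=batch_size, replace=True, p=p)

losses = [0.1, 0.5, 2.0, 0.3]
probs = hardness_weighted_probs(losses, beta=1.0)
batch = sample_batch(losses, batch_size=8)
```

    With `beta=0` the probabilities become uniform, which recovers ordinary SGD sampling in this sketch.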
    ODFNet: Using orientation distribution functions to characterize 3D point clouds. (arXiv:2012.04708v2 [cs.CV] UPDATED)
    Learning new representations of 3D point clouds is an active research area in 3D vision, as the order-invariant point cloud structure still presents challenges to the design of neural network architectures. Recent works explored learning either global or local features or both for point clouds; however, none of the earlier methods focused on capturing contextual shape information by analysing the local orientation distribution of points. In this paper, we leverage point orientation distributions around a point in order to obtain an expressive local neighborhood representation for point clouds. We achieve this by dividing the spherical neighborhood of a given point into predefined cone volumes, and statistics inside each volume are used as point features. In this way, a local patch can be represented not only by the selected point's nearest neighbors, but also by a point density distribution defined along multiple orientations around the point. We are then able to construct an orientation distribution function (ODF) neural network that involves an ODFBlock which relies on MLP (multi-layer perceptron) layers. The new ODFNet model achieves state-of-the-art accuracy for object classification on ModelNet40 and ScanObjectNN datasets, and segmentation on ShapeNet and S3DIS datasets.
    Feed-Forward Source-Free Latent Domain Adaptation via Cross-Attention. (arXiv:2207.07624v1 [cs.LG])
    We study the highly practical but comparatively under-studied problem of latent-domain adaptation, where a source model should be adapted to a target dataset that contains a mixture of unlabelled domain-relevant and domain-irrelevant examples. Furthermore, motivated by the requirements for data privacy and the need for embedded and resource-constrained devices of all kinds to adapt to local data distributions, we focus on the setting of feed-forward source-free domain adaptation, where adaptation should not require access to the source dataset, and also be back propagation-free. Our solution is to meta-learn a network capable of embedding the mixed-relevance target dataset and dynamically adapting inference for target examples using cross-attention. The resulting framework leads to consistent improvement on strong ERM baselines. We also show that our framework sometimes even improves on the upper bound of domain-supervised adaptation, where only domain-relevant instances are provided for adaptation. This suggests that human annotated domain labels may not always be optimal, and raises the possibility of doing better through automated instance selection.
    Kernel Conjugate Gradient Methods with Random Projections. (arXiv:1811.01760v2 [stat.ML] UPDATED)
    We propose and study kernel conjugate gradient methods (KCGM) with random projections for least-squares regression over a separable Hilbert space. Considering two types of random projections generated by randomized sketches and Nystr\"{o}m subsampling, we prove optimal statistical results with respect to variants of norms for the algorithms under a suitable stopping rule. Particularly, our results show that if the projection dimension is proportional to the effective dimension of the problem, KCGM with randomized sketches can generalize optimally, while achieving a computational advantage. As a corollary, we derive optimal rates for classic KCGM in the well-conditioned regimes for the case that the target function may not be in the hypothesis space.
    QSAN: A Near-term Achievable Quantum Self-Attention Network. (arXiv:2207.07563v1 [quant-ph])
    The self-attention mechanism, an important component of machine learning, has been relatively little investigated in the field of quantum machine learning. Inspired by the Variational Quantum Algorithm (VQA) framework and the classical self-attention mechanism, a Quantum Self-Attention Network (QSAN) that can be implemented on a near-term quantum computer is proposed. Theoretically, the Quantum Self-Attention Mechanism (QSAM) is defined, which is a new interpretation of the classical self-attention mechanism after linearization and logicalization. Quantum Logical Similarity (QLS) is one of the cores of QSAM, which replaces the similarity operation of inner product with logical operation, allowing a better execution of QSAM on quantum computers. Quantum Bit Self-Attention Score Matrix (QBSASM) is another centerpiece, which is a QLS-based density matrix used to represent the output distribution. In practice, QSAN is realized based on the QSAM framework, and the concept of quantum coordinates is introduced to simplify circuit design. Finally, QSAN is tested on a quantum computer with a small sample of data, laying the foundation for Quantum Natural Language Processing (QNLP).
    Low-bit Shift Network for End-to-End Spoken Language Understanding. (arXiv:2207.07497v1 [cs.SD])
    Deep neural networks (DNN) have achieved impressive success in multiple domains. Over the years, the accuracy of these models has increased with the proliferation of deeper and more complex architectures. Thus, state-of-the-art solutions are often computationally expensive, which makes them unfit to be deployed on edge computing platforms. In order to mitigate the high computation, memory, and power requirements of inferring convolutional neural networks (CNNs), we propose the use of power-of-two quantization, which quantizes continuous parameters into low-bit power-of-two values. This reduces computational complexity by removing expensive multiplication operations and by using low-bit weights. ResNet is adopted as the building block of our solution and the proposed model is evaluated on a spoken language understanding (SLU) task. Experimental results show improved performance for shift neural network architectures, with our low-bit quantization achieving 98.76% on the test set, which is comparable to its full-precision counterpart and state-of-the-art solutions.
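    Power-of-two quantization and the resulting replacement of multiplications by bit shifts can be sketched as follows. This is a generic illustration under stated assumptions, not the scheme from the paper: one bit is reserved for the sign, exponents are clipped to a fixed range ending at 0, and `quantize_pow2` and `shift_mul` are hypothetical names.

```python
import numpy as np

def quantize_pow2(weights, bits=3):
    """Round each weight to sign * 2**e, with e clipped to 2**(bits-1) levels."""
    w = np.asarray(weights, dtype=np.float64)
    sign = np.sign(w)                         # zero weights stay exactly zero
    mag = np.abs(w)
    n_levels = 2 ** (bits - 1)                # one bit is assumed to hold the sign
    e_max = 0                                 # assumed largest exponent (|w| <= 1)
    e_min = e_max - n_levels + 1
    safe = np.where(mag > 0, mag, 1.0)        # avoid log2(0)
    exp = np.clip(np.round(np.log2(safe)), e_min, e_max)
    return sign * np.power(2.0, exp), exp.astype(int)

def shift_mul(x_int, exp, sign):
    """Multiply an integer activation by sign * 2**exp using shifts only."""
    shifted = x_int << exp if exp >= 0 else x_int >> -exp
    return sign * shifted

q, e = quantize_pow2([0.3, -0.6, 0.0, 1.7], bits=3)
y = shift_mul(64, -2, 1)   # same result as 64 * 0.25
```

    Each quantized weight is fully described by a sign and a small integer exponent, which is what makes a shift-based multiply possible on integer activations.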
    A Probabilistic Autoencoder for Type Ia Supernovae Spectral Time Series. (arXiv:2207.07645v1 [astro-ph.CO])
    We construct a physically-parameterized probabilistic autoencoder (PAE) to learn the intrinsic diversity of type Ia supernovae (SNe Ia) from a sparse set of spectral time series. The PAE is a two-stage generative model, composed of an Auto-Encoder (AE) which is interpreted probabilistically after training using a Normalizing Flow (NF). We demonstrate that the PAE learns a low-dimensional latent space that captures the nonlinear range of features that exists within the population, and can accurately model the spectral evolution of SNe Ia across the full range of wavelength and observation times directly from the data. By introducing a correlation penalty term and multi-stage training setup alongside our physically-parameterized network we show that intrinsic and extrinsic modes of variability can be separated during training, removing the need for the additional models to perform magnitude standardization. We then use our PAE in a number of downstream tasks on SNe Ia for increasingly precise cosmological analyses, including automatic detection of SN outliers, the generation of samples consistent with the data distribution, and solving the inverse problem in the presence of noisy and incomplete data to constrain cosmological distance measurements. We find that the optimal number of intrinsic model parameters appears to be three, in line with previous studies, and show that we can standardize our test sample of SNe Ia with an RMS of $0.091 \pm 0.010$ mag, which corresponds to $0.074 \pm 0.010$ mag if peculiar velocity contributions are removed. Trained models and codes are released at https://github.com/georgestein/suPAErnova.
    Quantitative Stock Investment by Routing Uncertainty-Aware Trading Experts: A Multi-Task Learning Approach. (arXiv:2207.07578v1 [q-fin.TR])
    Quantitative investment is a fundamental financial task that highly relies on accurate stock prediction and profitable investment decision making. Although recent advances in deep learning (DL) have shown stellar performance on capturing trading opportunities in the stochastic stock market, we observe that the performance of existing DL methods is sensitive to random seeds and network initialization. To design more profitable DL methods, we analyze this phenomenon and find two major limitations of existing works. First, there is a noticeable gap between accurate financial predictions and profitable investment strategies. Second, investment decisions are made based on only one individual predictor without consideration of model uncertainty, which is inconsistent with the workflow in real-world trading firms. To tackle these two limitations, we first reformulate quantitative investment as a multi-task learning problem. We then propose AlphaMix, a novel two-stage mixture-of-experts (MoE) framework for quantitative investment to mimic the efficient bottom-up trading strategy design workflow of successful trading firms. In Stage one, multiple independent trading experts are jointly optimized with an individual uncertainty-aware loss function. In Stage two, we train neural routers (corresponding to the role of a portfolio manager) to dynamically deploy these experts on an as-needed basis. AlphaMix is also a universal framework that is applicable to various backbone network architectures with consistent performance gains. Through extensive experiments on long-term real-world data spanning over five years on two of the most influential financial markets (US and China), we demonstrate that AlphaMix significantly outperforms many state-of-the-art baselines in terms of four financial criteria.
    OASYS: Domain-Agnostic Automated System for Constructing Knowledge Base from Unstructured Text. (arXiv:2207.07597v1 [cs.CL])
    In recent years, creating and managing knowledge bases have become crucial to the retail product and enterprise domains. We present an automatic knowledge base construction system that mines data from documents. This system can generate training data during the training process without human intervention. Therefore, it is domain-agnostic and can be trained using only the target domain text corpus and a pre-defined knowledge base. This system is called OASYS and is the first system built with the Korean language in mind. In addition, we have constructed a new human-annotated benchmark dataset of the Korean Wikipedia corpus paired with a Korean DBpedia to aid system evaluation. The system performance results on the human-annotated benchmark test dataset are meaningful and show that the knowledge base generated by OASYS trained on only auto-generated data is useful. We provide both a human-annotated test dataset and an auto-generated dataset.
    Communication-Efficient Diffusion Strategy for Performance Improvement of Federated Learning with Non-IID Data. (arXiv:2207.07493v1 [cs.DC])
    Federated learning (FL) is a novel learning paradigm that addresses the privacy leakage challenge of centralized learning. However, in FL, users with non-independent and identically distributed (non-IID) characteristics can deteriorate the performance of the global model. Specifically, the global model suffers from the weight divergence challenge owing to non-IID data. To address the aforementioned challenge, we propose a novel diffusion strategy of the machine learning (ML) model (FedDif) to maximize the FL performance with non-IID data. In FedDif, users spread local models to neighboring users over D2D communications. FedDif enables the local model to experience different distributions before parameter aggregation. Furthermore, we theoretically demonstrate that FedDif can circumvent the weight divergence challenge. On the theoretical basis, we propose the communication-efficient diffusion strategy of the ML model, which can determine the trade-off between the learning performance and communication cost based on auction theory. The performance evaluation results show that FedDif improves the test accuracy of the global model by 11% compared to the baseline FL with non-IID settings. Moreover, FedDif improves communication efficiency, in terms of the number of transmitted sub-frames and models, by a factor of 2.77 compared to the latest methods.
    Permutationless Many-Jet Event Reconstruction with Symmetry Preserving Attention Networks. (arXiv:2010.09206v6 [hep-ex] UPDATED)
    Top quarks, produced in large numbers at the Large Hadron Collider, have a complex detector signature and require special reconstruction techniques. The most common decay mode, the "all-jet" channel, results in a 6-jet final state which is particularly difficult to reconstruct in $pp$ collisions due to the large number of permutations possible. We present a novel approach to this class of problem, based on neural networks using a generalized attention mechanism, that we call Symmetry Preserving Attention Networks (SPA-Net). We train one such network to identify the decay products of each top quark unambiguously and without combinatorial explosion as an example of the power of this technique. This approach significantly outperforms existing state-of-the-art methods, correctly assigning all jets in $93.0\%$ of $6$-jet, $87.8\%$ of $7$-jet, and $82.6\%$ of $\geq 8$-jet events respectively.
    Plex: Towards Reliability using Pretrained Large Model Extensions. (arXiv:2207.07411v1 [cs.LG])
    A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol as it improves the out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.
    Rethinking Attention Mechanism in Time Series Classification. (arXiv:2207.07564v1 [cs.LG])
    Attention-based models have been widely used in many areas, such as computer vision and natural language processing. However, relevant applications in time series classification (TSC) have not yet been explored deeply, so a significant number of TSC algorithms still suffer from general problems of the attention mechanism, such as quadratic complexity. In this paper, we promote the efficiency and performance of the attention mechanism by proposing our flexible multi-head linear attention (FMLA), which enhances locality awareness by layer-wise interactions with deformable convolutional blocks and online knowledge distillation. Moreover, we propose a simple but effective mask mechanism that helps reduce the noise influence in time series and decrease the redundancy of the proposed FMLA by masking some positions of each given series proportionally. To stabilize this mechanism, samples are forwarded through the model with random mask layers several times and their outputs are aggregated to teach the same model with regular mask layers. We conduct extensive experiments on 85 UCR2018 datasets to compare our algorithm with 11 well-known ones and the results show that our algorithm has comparable performance in terms of top-1 accuracy. We also compare our model with three Transformer-based models with respect to the floating-point operations per second and number of parameters and find that our algorithm achieves significantly better efficiency with lower complexity.
    A two-step machine learning approach to statistical post-processing of weather forecasts for power generation. (arXiv:2207.07589v1 [stat.ML])
    By the end of 2021, the renewable energy share of the global electricity capacity reached 38.3% and the new installations are dominated by wind and solar energy, showing global increases of 12.7% and 18.5%, respectively. However, both wind and photovoltaic energy sources are highly volatile, making planning difficult for grid operators, so accurate forecasts of the corresponding weather variables are essential for reliable electricity predictions. The most advanced approach in weather prediction is the ensemble method, which opens the door for probabilistic forecasting; however, ensemble forecasts are often underdispersive and subject to systematic bias. Hence, they require some form of statistical post-processing, where parametric models provide full predictive distributions of the weather variables at hand. We propose a general two-step machine learning-based approach to calibrating ensemble weather forecasts, where in the first step improved point forecasts are generated, which then, together with various ensemble statistics, serve as input features of the neural network estimating the parameters of the predictive distribution. In two case studies based on 100m wind speed and global horizontal irradiance forecasts of the operational ensemble prediction system of the Hungarian Meteorological Service, the predictive performance of this novel method is compared with the forecast skill of the raw ensemble and the state-of-the-art parametric approaches. Both case studies confirm that at least up to 48h statistical post-processing substantially improves the predictive performance of the raw ensemble for all considered forecast horizons. The investigated variants of the proposed two-step method outperform their competitors in skill, and the suggested new approach is well applicable for different weather quantities and for a fair range of predictive distributions.
    3DVerifier: Efficient Robustness Verification for 3D Point Cloud Models. (arXiv:2207.07539v1 [cs.CV])
    3D point cloud models are widely applied in safety-critical scenes, which creates an urgent need for more solid proofs to verify the robustness of these models. Existing verification methods for point cloud models are time-expensive and computationally unattainable on large networks. Additionally, they cannot handle the complete PointNet model with joint alignment network (JANet) that contains multiplication layers, which effectively boosts the performance of 3D models. This motivates us to design a more efficient and general framework to verify various architectures of point cloud models. The key challenges in verifying the large-scale complete PointNet models are dealing with the cross-non-linearity operations in the multiplication layers and the high computational complexity of high-dimensional point cloud inputs and added layers. Thus, we propose an efficient verification framework, 3DVerifier, to tackle both challenges by adopting a linear relaxation function to bound the multiplication layer and combining forward and backward propagation to compute the certified bounds of the outputs of the point cloud models. Our comprehensive experiments demonstrate that 3DVerifier outperforms existing verification algorithms for 3D models in terms of both efficiency and accuracy. Notably, our approach achieves an orders-of-magnitude improvement in verification efficiency for the large network, and the obtained certified bounds are also significantly tighter than those of the state-of-the-art verifiers. We release our tool 3DVerifier via https://github.com/TrustAI/3DVerifier for use by the community.
    Blessing of Nonconvexity in Deep Linear Models: Depth Flattens the Optimization Landscape Around the True Solution. (arXiv:2207.07612v1 [cs.LG])
    This work characterizes the effect of depth on the optimization landscape of linear regression, showing that, despite their nonconvexity, deeper models have a more desirable optimization landscape. We consider a robust and over-parameterized setting, where a subset of measurements are grossly corrupted with noise and the true linear model is captured via an $N$-layer linear neural network. On the negative side, we show that this problem does not have a benign landscape: given any $N\geq 1$, with constant probability, there exists a solution corresponding to the ground truth that is neither a local nor a global minimum. However, on the positive side, we prove that, for any $N$-layer model with $N\geq 2$, a simple sub-gradient method becomes oblivious to such "problematic" solutions; instead, it converges to a balanced solution that is not only close to the ground truth but also enjoys a flat local landscape, thereby eschewing the need for "early stopping". Lastly, we empirically verify that the desirable optimization landscape of deeper models extends to other robust learning tasks, including deep matrix recovery and deep ReLU networks with $\ell_1$-loss.
    Position Prediction as an Effective Pretraining Strategy. (arXiv:2207.07611v1 [cs.LG])
    Transformers have gained increasing popularity in a wide range of applications, including Natural Language Processing (NLP), Computer Vision and Speech Recognition, because of their powerful representational capacity. However, harnessing this representational capacity effectively requires a large amount of data, strong regularization, or both, to mitigate overfitting. Recently, the power of the Transformer has been unlocked by self-supervised pretraining strategies based on masked autoencoders which rely on reconstructing masked inputs, directly, or contrastively from unmasked content. This pretraining strategy, which has been used in BERT models in NLP, Wav2Vec models in Speech and, recently, in MAE models in Vision, forces the model to learn about relationships between the content in different parts of the input using autoencoding related objectives. In this paper, we propose a novel, but surprisingly simple alternative to content reconstruction -- that of predicting locations from content, without providing positional information for it. Doing so requires the Transformer to understand the positional relationships between different parts of the input, from their content alone. This amounts to an efficient implementation where the pretext task is a classification problem among all possible positions for each input token. We experiment on both Vision and Speech benchmarks, where our approach brings improvements over strong supervised training baselines and is comparable to modern unsupervised/self-supervised pretraining methods. Our method also enables Transformers trained without position embeddings to outperform ones trained with full position information.
    Pick your Neighbor: Local Gauss-Southwell Rule for Fast Asynchronous Decentralized Optimization. (arXiv:2207.07543v1 [math.OC])
    In decentralized optimization environments, each agent $i$ in a network of $n$ optimization nodes possesses a private function $f_i$, and nodes communicate with their neighbors to cooperatively minimize the aggregate objective $\sum_{i=1}^n f_i$. In this setting, synchronizing the nodes' updates incurs significant communication overhead and computational costs, so much of the recent literature has focused on the analysis and design of asynchronous optimization algorithms where agents activate and communicate at arbitrary times, without requiring a global synchronization enforcer. Nonetheless, in most of the work on the topic, active nodes select a neighbor to contact based on a fixed probability (e.g., uniformly at random), a choice that ignores the optimization landscape at the moment of activation. Instead, in this work we introduce an optimization-aware selection rule that chooses the neighbor with the highest dual cost improvement (a quantity related to a consensus-based dualization of the problem at hand). This scheme is related to the coordinate descent (CD) method with a Gauss-Southwell (GS) rule for coordinate updates; in our setting however, only a subset of coordinates is accessible at each iteration (because each node is constrained to communicate only with its direct neighbors), so the existing literature on GS methods does not apply. To overcome this difficulty, we develop a new analytical framework for smooth and strongly convex $f_i$ that covers the class of set-wise CD algorithms -- a class that directly applies to decentralized scenarios, but is not limited to them -- and we show that the proposed set-wise GS rule achieves a speedup by a factor of up to the maximum degree in the network (which is of the order of $\Theta(n)$ in highly connected graphs). The speedup predicted by our theoretical analysis is subsequently validated in numerical experiments with synthetic data.
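    The selection rule described above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's decentralized dual algorithm: it runs coordinate descent on a small strongly convex quadratic, where at each iteration only a randomly drawn subset of coordinates is accessible and the coordinate with the largest gradient magnitude inside that subset is updated. The function name `setwise_gauss_southwell`, the quadratic objective, and the way subsets are drawn are all assumptions made for illustration.

```python
import random


def setwise_gauss_southwell(A, b, subsets, iters=500, seed=0):
    """Minimize 0.5 * x^T A x - b^T x for a symmetric positive definite A.

    At every iteration one subset of coordinates is "accessible" (a stand-in
    for the neighbors an activated node can contact). Within that subset the
    coordinate with the largest absolute gradient entry is chosen
    (Gauss-Southwell rule) and minimized exactly.
    """
    rng = random.Random(seed)
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        accessible = rng.choice(subsets)
        # Gradient entries only for the accessible coordinates.
        grad = {i: sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in accessible}
        best = max(accessible, key=lambda i: abs(grad[i]))
        # Exact minimization along the chosen coordinate.
        x[best] -= grad[best] / A[best][best]
    return x


# Hypothetical 3-variable example; each subset plays the role of one
# node's accessible coordinates.
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
subsets = [[0, 1], [1, 2], [0, 2]]
solution = setwise_gauss_southwell(A, b, subsets)
```

    For this example the exact minimizer is (0.5, 0, 1.5), so `solution` should be close to those values. Replacing the `max` selection with a uniformly random pick from `accessible` gives the fixed-probability baseline that the abstract contrasts against.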
    On the Usefulness of Deep Ensemble Diversity for Out-of-Distribution Detection. (arXiv:2207.07517v1 [cs.LG])
    The ability to detect Out-of-Distribution (OOD) data is important in safety-critical applications of deep learning. The aim is to separate In-Distribution (ID) data drawn from the training distribution from OOD data using a measure of uncertainty extracted from a deep neural network. Deep Ensembles are a well-established method of improving the quality of uncertainty estimates produced by deep neural networks, and have been shown to have superior OOD detection performance compared to single models. An existing intuition in the literature is that the diversity of Deep Ensemble predictions indicates distributional shift, and so measures of diversity such as Mutual Information (MI) should be used for OOD detection. We show experimentally that this intuition is not valid on ImageNet-scale OOD detection -- using MI leads to 30-40% worse %FPR@95 compared to single-model entropy on some OOD datasets. We suggest an alternative explanation for Deep Ensembles' better OOD detection performance -- OOD detection is binary classification and we are ensembling diverse classifiers. As such we show that practically, even better OOD detection performance can be achieved for Deep Ensembles by averaging task-specific detection scores such as Energy over the ensemble.
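    To make the two kinds of scores concrete, here is a minimal sketch, under stated assumptions, of how mutual information and an ensemble-averaged energy score could be computed from the logits of each ensemble member. The energy score is taken here as the log-sum-exp of the logits (the negative free energy, where larger values suggest in-distribution inputs); the function names and the example logits are illustrative and are not taken from the paper.

```python
import math


def logsumexp(logits):
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))


def softmax(logits):
    lse = logsumexp(logits)
    return [math.exp(z - lse) for z in logits]


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ensemble_scores(member_logits):
    """member_logits: one logit vector per ensemble member, for a single input."""
    probs = [softmax(logits) for logits in member_logits]
    k = len(probs)
    n_classes = len(probs[0])
    mean_probs = [sum(p[c] for p in probs) / k for c in range(n_classes)]
    # Mutual information = entropy of the averaged prediction
    # minus the average entropy of the individual predictions.
    mutual_information = entropy(mean_probs) - sum(entropy(p) for p in probs) / k
    # Task-specific score averaged over members (negative energy per member).
    avg_energy_score = sum(logsumexp(logits) for logits in member_logits) / k
    return {"mutual_information": mutual_information, "avg_energy_score": avg_energy_score}


# Hypothetical logits: members that agree vs. members that disagree.
agree = ensemble_scores([[4.0, 0.0, 0.0], [4.0, 0.0, 0.0]])
disagree = ensemble_scores([[4.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
```

    When all members produce identical logits, the mutual information is zero, while disagreement between members makes it positive. The averaged energy score, by contrast, depends only on the magnitude of each member's logits, which is why the two quantities can rank inputs differently.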
    Selection of the Most Probable Best. (arXiv:2207.07533v1 [stat.ME])
    We consider an expected-value ranking and selection problem where all k solutions' simulation outputs depend on a common uncertain input model. Given that the uncertainty of the input model is captured by a probability simplex on a finite support, we define the most probable best (MPB) to be the solution whose probability of being optimal is the largest. To devise an efficient sampling algorithm to find the MPB, we first derive a lower bound to the large deviation rate of the probability of falsely selecting the MPB, then formulate an optimal computing budget allocation (OCBA) problem to find the optimal static sampling ratios for all solution-input model pairs that maximize the lower bound. We devise a series of sequential algorithms that apply interpretable and computationally efficient sampling rules and prove their sampling ratios achieve the optimality conditions for the OCBA problem as the simulation budget increases. The algorithms are benchmarked against a state-of-the-art sequential sampling algorithm designed for contextual ranking and selection problems and demonstrated to have superior empirical performances at finding the MPB.
    Z-Index at CheckThat! Lab 2022: Check-Worthiness Identification on Tweet Text. (arXiv:2207.07308v1 [cs.CL])
    The wide use of social media and digital technologies facilitates sharing various news and information about events and activities. Alongside positive information, however, misleading and false information is also spreading on social media. There have been efforts to identify such misleading information both manually by human experts and with automatic tools. Manual effort does not scale well due to the high volume of information containing factual claims that appears online. Therefore, automatically identifying check-worthy claims can be very useful for human experts. In this study, we describe our participation in Subtask-1A: Check-worthiness of tweets (English, Dutch and Spanish) of CheckThat! lab at CLEF 2022. We performed standard preprocessing steps and applied different models to identify whether a given text is worthy of fact checking or not. We used an oversampling technique to balance the dataset and applied SVM and Random Forest (RF) with TF-IDF representations. We also used BERT multilingual (BERT-m) and XLM-RoBERTa-base pre-trained models for the experiments. We used BERT-m for the official submissions and our systems ranked 3rd, 5th, and 12th in Spanish, Dutch, and English, respectively. In further experiments, our evaluation shows that transformer models (BERT-m and XLM-RoBERTa-base) outperform the SVM and RF in Dutch and English, whereas a different scenario is observed for Spanish.
    pathGCN: Learning General Graph Spatial Operators from Paths. (arXiv:2207.07408v1 [cs.LG])
    Graph Convolutional Networks (GCNs), similarly to Convolutional Neural Networks (CNNs), are typically based on two main operations - spatial and point-wise convolutions. In the context of GCNs, differently from CNNs, a pre-determined spatial operator based on the graph Laplacian is often chosen, allowing only the point-wise operations to be learnt. However, learning a meaningful spatial operator is critical for developing more expressive GCNs for improved performance. In this paper we propose pathGCN, a novel approach to learn the spatial operator from random paths on the graph. We analyze the convergence of our method and its difference from existing GCNs. Furthermore, we discuss several options of combining our learnt spatial operator with point-wise convolutions. Our extensive experiments on numerous datasets suggest that by properly learning both the spatial and point-wise convolutions, phenomena like over-smoothing can be inherently avoided, and new state-of-the-art performance is achieved.
    Augmenting Softmax Information for Selective Classification with Out-of-Distribution Data. (arXiv:2207.07506v1 [cs.LG])
    Detecting out-of-distribution (OOD) data is a task that is receiving an increasing amount of research attention in the domain of deep learning for computer vision. However, the performance of detection methods is generally evaluated on the task in isolation, rather than also considering potential downstream tasks in tandem. In this work, we examine selective classification in the presence of OOD data (SCOD). That is to say, the motivation for detecting OOD samples is to reject them so their impact on the quality of predictions is reduced. We show under this task specification, that existing post-hoc methods perform quite differently compared to when evaluated only on OOD detection. This is because it is no longer an issue to conflate in-distribution (ID) data with OOD data if the ID data is going to be misclassified. However, the conflation within ID data of correct and incorrect predictions becomes undesirable. We also propose a novel method for SCOD, Softmax Information Retaining Combination (SIRC), that augments softmax-based confidence scores with feature-agnostic information such that their ability to identify OOD samples is improved without sacrificing separation between correct and incorrect ID predictions. Experiments on a wide variety of ImageNet-scale datasets and convolutional neural network architectures show that SIRC is able to consistently match or outperform the baseline for SCOD, whilst existing OOD detection methods fail to do so.
    Skill-based Model-based Reinforcement Learning. (arXiv:2207.07560v1 [cs.LG])
    Model-based reinforcement learning (RL) is a sample-efficient way of learning complex behaviors by leveraging a learned single-step dynamics model to plan actions in imagination. However, planning every action for long-horizon tasks is not practical, akin to a human planning out every muscle movement. Instead, humans efficiently plan with high-level skills to solve complex tasks. From this intuition, we propose a Skill-based Model-based RL framework (SkiMo) that enables planning in the skill space using a skill dynamics model, which directly predicts the skill outcomes, rather than predicting all small details in the intermediate states, step by step. For accurate and efficient long-term planning, we jointly learn the skill dynamics model and a skill repertoire from prior experience. We then harness the learned skill dynamics model to accurately simulate and plan over long horizons in the skill space, which enables efficient downstream learning of long-horizon, sparse reward tasks. Experimental results in navigation and manipulation domains show that SkiMo extends the temporal horizon of model-based approaches and improves the sample efficiency for both model-based RL and skill-based RL. Code and videos are available at https://clvrai.com/skimo
    Feasibility of Inconspicuous GAN-generated Adversarial Patches against Object Detection. (arXiv:2207.07347v1 [cs.CV])
    Standard approaches for adversarial patch generation lead to noisy conspicuous patterns, which are easily recognizable by humans. Recent research has proposed several approaches to generate naturalistic patches using generative adversarial networks (GANs), yet only a few of them were evaluated on the object detection use case. Moreover, the state of the art mostly focuses on suppressing a single large bounding box in the input by overlapping it with the patch directly. Suppressing objects near the patch is a different, more complex task. In this work, we have evaluated the existing approaches to generate inconspicuous patches. We have adapted methods, originally developed for different computer vision tasks, to the object detection use case with YOLOv3 and the COCO dataset. We have evaluated two approaches to generate naturalistic patches: by incorporating patch generation into the GAN training process and by using a pretrained GAN. For both cases, we have assessed a trade-off between performance and naturalistic patch appearance. Our experiments have shown that using a pretrained GAN helps to obtain realistic-looking patches while preserving performance similar to conventional adversarial patches.
    Low Rank Approximation for General Tensor Networks. (arXiv:2207.07417v1 [cs.DS])
    We study the problem of approximating a given tensor with $q$ modes $A \in \mathbb{R}^{n \times \ldots \times n}$ with an arbitrary tensor network of rank $k$ -- that is, a graph $G = (V, E)$, where $|V| = q$, together with a collection of tensors $\{U_v \mid v \in V\}$ which are contracted in the manner specified by $G$ to obtain a tensor $T$. For each mode of $U_v$ corresponding to an edge incident to $v$, the dimension is $k$, and we wish to find $U_v$ such that the Frobenius norm distance between $T$ and $A$ is minimized. This generalizes a number of well-known tensor network decompositions, such as the Tensor Train, Tensor Ring, Tucker, and PEPS decompositions. We approximate $A$ by a binary tree network $T'$ with $O(q)$ cores, such that the dimension on each edge of this network is at most $\widetilde{O}(k^{O(dt)} \cdot q/\varepsilon)$, where $d$ is the maximum degree of $G$ and $t$ is its treewidth, such that $\|A - T'\|_F^2 \leq (1 + \varepsilon) \|A - T\|_F^2$. The running time of our algorithm is $O(q \cdot \text{nnz}(A)) + n \cdot \text{poly}(k^{dt}q/\varepsilon)$, where $\text{nnz}(A)$ is the number of nonzero entries of $A$. Our algorithm is based on a new dimensionality reduction technique for tensor decomposition which may be of independent interest. We also develop fixed-parameter tractable $(1 + \varepsilon)$-approximation algorithms for Tensor Train and Tucker decompositions, improving the running time of Song, Woodruff and Zhong (SODA, 2019) and avoiding the use of generic polynomial system solvers. We show that our algorithms have a nearly optimal dependence on $1/\varepsilon$ assuming that there is no $O(1)$-approximation algorithm for the $2 \to 4$ norm with better running time than brute force. Finally, we give additional results for Tucker decomposition with robust loss functions, and fixed-parameter tractable CP decomposition.
    Towards Better Dermoscopic Image Feature Representation Learning for Melanoma Classification. (arXiv:2207.07303v1 [eess.IV])
    Deep learning-based melanoma classification with dermoscopic images has recently shown great potential in automatic early-stage melanoma diagnosis. However, limited by the significant data imbalance and obvious extraneous artifacts, i.e., the hair and ruler markings, discriminative feature extraction from dermoscopic images is very challenging. In this study, we seek to resolve these two problems in turn, towards better representation learning for lesion features. Specifically, a GAN-based data augmentation (GDA) strategy is adapted to generate synthetic melanoma-positive images, in conjunction with the proposed implicit hair denoising (IHD) strategy, in which the hair-related representations are implicitly disentangled via an auxiliary classifier network and reversely sent to the melanoma-feature extraction backbone for better melanoma-specific representation learning. Furthermore, to train the IHD module, the hair noises are additionally labeled on the ISIC2020 dataset, making it the first large-scale dermoscopic dataset with annotation of hair-like artifacts. Extensive experiments demonstrate the superiority of the proposed framework as well as the effectiveness of each component. The improved dataset is publicly available at https://github.com/kirtsy/DermoscopicDataset.
    A Systematic Review and Replicability Study of BERT4Rec for Sequential Recommendation. (arXiv:2207.07483v1 [cs.IR])
    BERT4Rec is an effective model for sequential recommendation based on the Transformer architecture. In the original publication, BERT4Rec claimed superiority over other available sequential recommendation approaches (e.g. SASRec), and it is now frequently being used as a state-of-the-art baseline for sequential recommendations. However, not all subsequent publications confirmed this result, and some proposed other models that were shown to outperform BERT4Rec in effectiveness. In this paper we systematically review all publications that compare BERT4Rec with another popular Transformer-based model, namely SASRec, and show that BERT4Rec results are not consistent within these publications. To understand the reasons behind this inconsistency, we analyse the available implementations of BERT4Rec and show that we fail to reproduce results of the original BERT4Rec publication when using their default configuration parameters. However, we are able to replicate the reported results with the original code if training for a much longer amount of time (up to 30x) compared to the default configuration. We also propose our own implementation of BERT4Rec based on the Hugging Face Transformers library, which we demonstrate replicates the originally reported results on 3 out of 4 datasets, while requiring up to 95% less training time to converge. Overall, from our systematic review and detailed experiments, we conclude that BERT4Rec does indeed exhibit state-of-the-art effectiveness for sequential recommendation, but only when trained for a sufficient amount of time. Additionally, we show that our implementation can further benefit from adapting other Transformer architectures that are available in the Hugging Face Transformers library (e.g. using disentangled attention, as provided by DeBERTa, or larger hidden layer size cf. ALBERT).
    Heuristic-free Optimization of Force-Controlled Robot Search Strategies in Stochastic Environments. (arXiv:2207.07524v1 [cs.RO])
    In both industrial and service domains, a central benefit of the use of robots is their ability to quickly and reliably execute repetitive tasks. However, even relatively simple peg-in-hole tasks are typically subject to stochastic variations, requiring search motions to find relevant features such as holes. While search improves robustness, it comes at the cost of increased runtime: More exhaustive search will maximize the probability of successfully executing a given task, but will significantly delay any downstream tasks. This trade-off is typically resolved by human experts according to simple heuristics, which are rarely optimal. This paper introduces an automatic, data-driven and heuristic-free approach to optimize robot search strategies. By training a neural model of the search strategy on a large set of simulated stochastic environments, conditioning it on few real-world examples and inverting the model, we can infer search strategies which adapt to the time-variant characteristics of the underlying probability distributions, while requiring very few real-world measurements. We evaluate our approach on two different industrial robots in the context of spiral and probe search for THT electronics assembly.
    Zero-Shot Assistance in Novel Decision Problems. (arXiv:2202.07364v2 [cs.LG] UPDATED)
    We consider the problem of creating assistants that can help agents - often humans - solve novel sequential decision problems, assuming the agent is not able to specify the reward function explicitly to the assistant. Instead of aiming to automate, and act in place of the agent as in current approaches, we give the assistant an advisory role and keep the agent in the loop as the main decision maker. The difficulty is that we must account for potential biases induced by limitations or constraints of the agent which may cause it to seemingly irrationally reject advice. To do this we introduce a novel formalization of assistance that models these biases, allowing the assistant to infer and adapt to them. We then introduce a new method for planning the assistant's advice which can scale to large decision making problems. Finally, we show experimentally that our approach adapts to these agent biases, and results in higher cumulative reward for the agent than automation-based alternatives.
    RITA: a Study on Scaling Up Generative Protein Sequence Models. (arXiv:2205.05789v2 [q-bio.QM] UPDATED)
    In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences belonging to the UniRef-100 database. Such generative models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models in next amino acid prediction, zero-shot fitness, and enzyme function prediction, showing benefits from increased scale. We release the RITA models openly, to the benefit of the research community.
    Meta-Calibration: Learning of Model Calibration Using Differentiable Expected Calibration Error. (arXiv:2106.09613v2 [cs.LG] UPDATED)
    Calibration of neural networks is a topical problem that is becoming more and more important as neural networks increasingly underpin real-world applications. The problem is especially noticeable when using modern neural networks, for which there is a significant difference between the confidence of the model and the probability of correct prediction. Various strategies have been proposed to improve calibration, yet accurate calibration remains challenging. We propose a novel framework with two contributions: introducing a differentiable surrogate for expected calibration error (DECE) that allows calibration quality to be directly optimised, and a meta-learning framework that uses DECE to optimise for validation set calibration with respect to model hyper-parameters. The results show that we achieve competitive performance with state-of-the-art calibration approaches. Our framework opens up a new avenue and toolset for tackling calibration, which we believe will inspire further work in this important challenge.
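    For context, the quantity that DECE approximates is usually estimated by binning predictions by confidence. The sketch below shows that standard binned estimator, which is not differentiable because of the hard bin assignment; it is not the paper's DECE surrogate. The function name, the use of equal-width bins that include their lower edge, and the example values are assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted average of |accuracy - mean confidence| per bin.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: 1 if the prediction was right, 0 otherwise.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Hard bin assignment; this step is what blocks gradient flow.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece


# Hypothetical example: four predictions at 90% confidence, three correct.
ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0])
```

    In this example all predictions fall into one bin with mean confidence 0.9 and accuracy 0.75, so the estimate is 0.15. A differentiable surrogate would need to replace the hard bin assignment and the 0/1 correctness indicator with smooth counterparts.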
    Joint Application of the Target Trial Causal Framework and Machine Learning Modeling to Optimize Antibiotic Therapy: Use Case on Acute Bacterial Skin and Skin Structure Infections due to Methicillin-resistant Staphylococcus aureus. (arXiv:2207.07458v1 [stat.ML])
    Bacterial infections are responsible for high mortality worldwide. Antimicrobial resistance underlying the infection and the patient's multifaceted clinical status can hamper the correct choice of antibiotic treatment. Randomized clinical trials provide average treatment effect estimates but are not ideal for risk stratification and optimization of therapeutic choice, i.e., individualized treatment effects (ITE). Here, we leverage large-scale electronic health record data, collected from Southern US academic clinics, to emulate a clinical trial, i.e., 'target trial', and develop a machine learning model of mortality prediction and ITE estimation for patients diagnosed with acute bacterial skin and skin structure infection (ABSSSI) due to methicillin-resistant Staphylococcus aureus (MRSA). ABSSSI-MRSA is a challenging condition with reduced treatment options - vancomycin is the preferred choice, but it has non-negligible side effects. First, we use propensity score matching to emulate the trial and create a treatment randomized (vancomycin vs. other antibiotics) dataset. Next, we use this data to train various machine learning methods (including boosted/LASSO logistic regression, support vector machines, and random forest) and choose the best model in terms of area under the receiver operating characteristic curve (AUC) through bootstrap validation. Lastly, we use the models to calculate ITE and identify possible averted deaths by therapy change. The out-of-bag tests indicate that SVM and RF are the most accurate, with AUC of 81% and 78%, respectively, but BLR/LASSO is not far behind (76%). By calculating the counterfactuals using the BLR/LASSO, vancomycin increases the risk of death, but it shows a large variation (odds ratio 1.2, 95% range 0.4-3.8) and the contribution to outcome probability is modest. Instead, the RF exhibits stronger changes in ITE, suggesting more complex treatment heterogeneity.
    Context-sensitive neocortical neurons transform the effectiveness and efficiency of neural information processing. (arXiv:2207.07338v1 [cs.NE])
    There is ample neurobiological evidence that context-sensitive neocortical neurons use their apical inputs as context to amplify the transmission of coherent feedforward (FF) inputs. However, it has not been demonstrated until now how this known mechanism can provide useful neural computation. Here we show for the first time that the processing and learning capabilities of this form of neural information processing are well-matched to the abilities of mammalian neocortex. Specifically, we show that a network composed of such local processors restricts the transmission of conflicting information to higher levels and greatly reduces the amount of activity required to process large amounts of heterogeneous real-world data. For example, when processing audiovisual speech, these local processors use seen lip movements to selectively amplify FF transmission of the auditory information that those movements generate, and vice versa. As this mechanism is shown to be far more effective and efficient than the best available forms of deep neural nets, it offers a step-change in understanding the brain's mysterious energy-saving mechanism and inspires advances in designing enhanced forms of biologically plausible machine learning algorithms.
    The Mechanical Neural Network (MNN) -- A physical implementation of a multilayer perceptron for education and hands-on experimentation. (arXiv:2207.07482v1 [cs.LG])
    In this paper the Mechanical Neural Network (MNN) is introduced, a physical implementation of a multilayer perceptron (MLP) with ReLU activation functions, two input neurons, four hidden neurons and two output neurons. This physical model of an MLP is used in education to give hands-on experience and allow students to see the effect of changing the parameters of the network on the output. Neurons are small wooden levers which are connected by threads. Students can adapt the weights between the neurons by moving the clamps connecting a neuron via a thread to the next. The MNN can model real-valued functions and logical operators including XOR.
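    The 2-4-2 architecture described above can be mirrored in a few lines of software. The following is a minimal sketch with hand-picked weights, assuming bias terms and a linear output layer; the specific weights, the choice of XOR and AND as the two outputs, and the presence of biases are assumptions for illustration and do not describe how the physical MNN is configured.

```python
def dense(weights, biases, inputs):
    """One fully connected layer: weights has one row per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


# Hidden layer: 4 neurons, 2 inputs each. Only the first two are needed
# for XOR; the remaining two are left at zero.
W1 = [[1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
B1 = [0.0, -1.0, 0.0, 0.0]
# Output layer: 2 neurons. Output 0 computes XOR, output 1 computes AND.
W2 = [[1.0, -2.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
B2 = [0.0, 0.0]


def forward(inputs):
    hidden = relu(dense(W1, B1, inputs))
    return dense(W2, B2, hidden)


truth_table = {(a, b): forward([float(a), float(b)])
               for a in (0, 1) for b in (0, 1)}
```

    The first hidden neuron sums the inputs and the second fires only when both inputs are on, so subtracting twice the second from the first yields XOR. Changing a single entry of `W2` and re-running `forward` is the software analogue of moving one clamp on the physical model.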
    Direction-Aware Adaptive Online Neural Speech Enhancement with an Augmented Reality Headset in Real Noisy Conversational Environments. (arXiv:2207.07296v1 [eess.AS])
    This paper describes the practical response- and performance-aware development of online speech enhancement for an augmented reality (AR) headset that helps a user understand conversations made in real noisy echoic environments (e.g., cocktail party). One may use a state-of-the-art blind source separation method called fast multichannel nonnegative matrix factorization (FastMNMF) that works well in various environments thanks to its unsupervised nature. Its heavy computational cost, however, prevents its application to real-time processing. In contrast, a supervised beamforming method that uses a deep neural network (DNN) for estimating spatial information of speech and noise readily fits real-time processing, but suffers from drastic performance degradation in mismatched conditions. Given such complementary characteristics, we propose a dual-process robust online speech enhancement method based on DNN-based beamforming with FastMNMF-guided adaptation. FastMNMF (back end) is performed in a mini-batch style and the noisy and enhanced speech pairs are used together with the original parallel training data for updating the direction-aware DNN (front end) with backpropagation at a computationally-allowable interval. This method is used with a blind dereverberation method called weighted prediction error (WPE) for transcribing the noisy reverberant speech of a speaker, which can be detected from video or selected by a user's hand gesture or eye gaze, in a streaming manner and spatially showing the transcriptions with an AR technique. Our experiment showed that the word error rate was improved by more than 10 points with the run-time adaptation using only twelve minutes of observation.
    Outlier detection of vital sign trajectories from COVID-19 patients. (arXiv:2207.07572v1 [cs.LG])
    There is growing interest in continuous wearable vital sign sensors for monitoring patients remotely at home. These monitors are usually coupled to an alerting system, which is triggered when vital sign measurements fall outside a predefined normal range. Trends in vital signs, such as an increasing heart rate, are often indicative of deteriorating health, but are rarely incorporated into alerting systems. In this work, we present a novel outlier detection algorithm to identify such abnormal vital sign trends. We introduce a distance-based measure to compare vital sign trajectories. For each patient in our dataset, we split vital sign time series into 180 minute, non-overlapping epochs. We then calculated a distance between all pairs of epochs using the dynamic time warp distance. Each epoch was characterized by its mean pairwise distance (average link distance) to all other epochs, with large distances considered as outliers. We applied this method to a pilot dataset collected over 1561 patient-hours from 8 patients who had recently been discharged from hospital after contracting COVID-19. We show that outlier epochs correspond well with patients who were subsequently readmitted to hospital. We also show, descriptively, how epochs transition from normal to abnormal for one such patient.
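The scoring step described above (non-overlapping epochs, pairwise dynamic time warping, mean pairwise distance per epoch) can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function names, the absolute-difference local cost, and the toy series are assumptions, and no outlier threshold is applied because the abstract does not state one.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping with |x - y| local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def split_epochs(series, epoch_len):
    """Split a series into non-overlapping epochs, dropping any incomplete tail."""
    n_full = len(series) // epoch_len
    return [series[k * epoch_len:(k + 1) * epoch_len] for k in range(n_full)]

def mean_pairwise_distances(epochs):
    """Mean DTW distance from each epoch to all others (needs >= 2 epochs)."""
    n = len(epochs)
    scores = []
    for i in range(n):
        total = sum(dtw_distance(epochs[i], epochs[j]) for j in range(n) if j != i)
        scores.append(total / (n - 1))
    return scores

if __name__ == "__main__":
    # Toy heart-rate series: three stable epochs followed by a rising one.
    heart_rate = [70, 71, 70, 72] * 3 + [70, 80, 90, 100]
    epochs = split_epochs(heart_rate, epoch_len=4)
    scores = mean_pairwise_distances(epochs)
    print(scores)  # the rising epoch receives the largest score
```

In the paper's setting an epoch would span 180 minutes of samples rather than four points; the quadratic cost of DTW per pair is worth keeping in mind at that scale.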
    The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning. (arXiv:2207.07570v1 [cs.LG])
    We study the multi-step off-policy learning approach to distributional RL. Despite the apparent similarity between value-based RL and distributional RL, our study reveals intriguing and fundamental differences between the two cases in the multi-step setting. We identify a novel notion of path-dependent distributional TD error, which is indispensable for principled multi-step distributional RL. The distinction from the value-based case bears important implications on concepts such as backward-view algorithms. Our work provides the first theoretical guarantees on multi-step off-policy distributional RL algorithms, including results that apply to the small number of existing approaches to multi-step distributional RL. In addition, we derive a novel algorithm, Quantile Regression-Retrace, which leads to a deep RL agent QR-DQN-Retrace that shows empirical improvements over QR-DQN on the Atari-57 benchmark. Collectively, we shed light on how unique challenges in multi-step distributional RL can be addressed both in theory and practice.
    Does Twitter know your political views? POLiTweets dataset and semi-automatic method for political leaning discovery. (arXiv:2207.07586v1 [cs.CL])
    Every day, the world is flooded by millions of messages and statements posted on Twitter or Facebook. Social media platforms try to protect users' personal data, but there still is a real risk of misuse, including election manipulation. Did you know that only 13 posts addressing important or controversial topics for society are enough to predict one's political affiliation with a 0.85 F1-score? To examine this phenomenon, we created a novel universal method of semi-automated political leaning discovery. It relies on a heuristic data annotation procedure, which was evaluated to achieve 0.95 agreement with human annotators (counted as an accuracy metric). We also present POLiTweets - the first publicly open Polish dataset for political affiliation discovery in a multi-party setup, consisting of over 147k tweets from almost 10k Polish-writing users annotated heuristically and almost 40k tweets from 166 users annotated manually as a test set. We used our data to study the aspects of domain shift in the context of topics and the type of content writers - ordinary citizens vs. professional politicians.
    Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models. (arXiv:2207.07646v1 [cs.CV])
    Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition. In this work, we extend this paradigm by leveraging motion and audio that naturally exist in video. We present MOV, a simple yet effective method for Multimodal Open-Vocabulary video classification. In MOV, we directly use the vision encoder from pre-trained VLMs with minimal modifications to encode video, optical flow and audio spectrogram. We design a cross-modal fusion mechanism to aggregate complementary multimodal information. Experiments on Kinetics-700 and VGGSound show that introducing flow or audio modality brings large performance gains over the pre-trained VLM and existing methods. Specifically, MOV greatly improves the accuracy on base classes, while generalizing better on novel classes. MOV achieves state-of-the-art results on UCF and HMDB zero-shot video classification benchmarks, significantly outperforming both traditional zero-shot methods and recent methods based on VLMs. Code and models will be released.
    CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN. (arXiv:2207.07553v1 [eess.IV])
    Deep learning models used in medical image analysis are prone to raising reliability concerns due to their black-box nature. To shed light on these black-box models, previous works predominantly focus on identifying the contribution of input features to the diagnosis, i.e., feature attribution. In this work, we explore counterfactual explanations to identify what patterns the models rely on for diagnosis. Specifically, we investigate the effect of changing features within chest X-rays on the classifier's output to understand its decision mechanism. We leverage a StyleGAN-based approach (StyleEx) to create counterfactual explanations for chest X-rays by manipulating specific latent directions in their latent space. In addition, we propose EigenFind to significantly reduce the computation time of generated explanations. We clinically evaluate the relevancy of our counterfactual explanations with the help of radiologists. Our code is publicly available.
    MIMO-DoAnet: Multi-channel Input and Multiple Outputs DoA Network with Unknown Number of Sound Sources. (arXiv:2207.07307v1 [eess.AS])
    Recent neural network based Direction of Arrival (DoA) estimation algorithms have performed well in scenarios with an unknown number of sound sources. These algorithms usually map the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), an approach called MISO. However, such MISO algorithms strongly depend on an empirical threshold setting and on the assumption that the angles between the sound sources are greater than a fixed angle. To address these limitations, we propose a novel multi-channel input and multiple outputs DoA network called MIMO-DoAnet. Unlike the general MISO algorithms, MIMO-DoAnet predicts the SPS coding of each sound source with the help of the informative spatial covariance matrix. By doing so, the threshold task of detecting the number of sound sources becomes an easier task of detecting whether there is a sound source in each output, and the serious interaction between sound sources disappears during the inference stage. Experimental results show that, compared with the MISO baseline system, MIMO-DoAnet improves the F1 score by 18.6% relative (13.3% absolute) in 3-source scenes and by 34.4% relative (20.2% absolute) in 4-source scenes. The results also demonstrate that MIMO-DoAnet alleviates the threshold setting problem and solves the angle assumption problem effectively.
    Attention, Filling in The Gaps for Generalization in Routing Problems. (arXiv:2207.07212v1 [cs.LG])
    Machine Learning (ML) methods have become a useful tool for tackling vehicle routing problems, either in combination with popular heuristics or as standalone models. However, current methods suffer from poor generalization when tackling problems of different sizes or different distributions. As a result, ML in vehicle routing has witnessed an expansion phase with new methodologies being created for particular problem instances that become infeasible at larger problem sizes. This paper aims at encouraging the consolidation of the field through understanding and improving existing models, namely the attention model by Kool et al. We identify two discrepancy categories for VRP generalization. The first is based on the differences that are inherent to the problems themselves, and the second relates to architectural weaknesses that limit the model's ability to generalize. Our contribution is threefold: We first target model discrepancies by adapting the Kool et al. method and its loss function for Sparse Dynamic Attention based on the alpha-entmax activation. We then target inherent differences through the use of a mixed instance training method that has been shown to outperform single instance training in certain scenarios. Finally, we introduce a framework for inference level data augmentation that improves performance by leveraging the model's lack of invariance to rotation and dilation changes.  ( 2 min )
    Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments. (arXiv:2207.07273v1 [eess.AS])
    This paper describes noisy speech recognition for an augmented reality headset that helps verbal communication within real multiparty conversational environments. A major approach that has actively been studied in simulated environments is to sequentially perform speech enhancement and automatic speech recognition (ASR) based on deep neural networks (DNNs) trained in a supervised manner. In our task, however, such a pretrained system fails to work due to the mismatch between the training and test conditions and the head movements of the user. To enhance only the utterances of a target speaker, we use beamforming based on a DNN-based speech mask estimator that can adaptively extract the speech components corresponding to a head-relative particular direction. We propose a semi-supervised adaptation method that jointly updates the mask estimator and the ASR model at run-time using clean speech signals with ground-truth transcriptions and noisy speech signals with highly-confident estimated transcriptions. Comparative experiments using the state-of-the-art distant speech recognition system show that the proposed method significantly improves the ASR performance.  ( 2 min )
    Emotion Recognition in Conversation using Probabilistic Soft Logic. (arXiv:2207.07238v1 [cs.LG])
    Creating agents that can both appropriately respond to conversations and understand complex human linguistic tendencies and social cues has been a long standing challenge in the NLP community. A recent pillar of research revolves around emotion recognition in conversation (ERC); a sub-field of emotion recognition that focuses on conversations or dialogues that contain two or more utterances. In this work, we explore an approach to ERC that exploits the use of neural embeddings along with complex structures in dialogues. We implement our approach in a framework called Probabilistic Soft Logic (PSL), a declarative templating language that uses first-order like logical rules, that when combined with data, define a particular class of graphical model. Additionally, PSL provides functionality for the incorporation of results from neural models into PSL models. This allows our model to take advantage of advanced neural methods, such as sentence embeddings, and logical reasoning over the structure of a dialogue. We compare our method with state-of-the-art purely neural ERC systems, and see almost a 20% improvement. With these results, we provide an extensive qualitative and quantitative analysis over the DailyDialog conversation dataset.  ( 2 min )
    Improving Task-free Continual Learning by Distributionally Robust Memory Evolution. (arXiv:2207.07256v1 [cs.LG])
    Task-free continual learning (CL) aims to learn a non-stationary data stream without explicit task definitions and without forgetting previous knowledge. First, the widely adopted memory replay approach could gradually become less effective for long data streams, as the model may memorize the stored examples and overfit the memory buffer. Second, existing methods overlook the high uncertainty in the memory data distribution since there is a big gap between the memory data distribution and the distribution of all the previous data examples. To address these problems, for the first time, we propose a principled memory evolution framework to dynamically evolve the memory data distribution by making the memory buffer gradually harder to memorize with distributionally robust optimization (DRO). We then derive a family of methods to evolve the memory buffer data in the continuous probability measure space with Wasserstein gradient flow (WGF). The proposed DRO is w.r.t. the worst-case evolved memory data distribution, and thus guarantees the model performance and learns significantly more robust features than existing memory-replay-based methods. Extensive experiments on existing benchmarks demonstrate the effectiveness of the proposed methods for alleviating forgetting. As a by-product of the proposed framework, our method is more robust to adversarial examples than existing task-free CL methods.  ( 2 min )
    LapSeg3D: Weakly Supervised Semantic Segmentation of Point Clouds Representing Laparoscopic Scenes. (arXiv:2207.07418v1 [cs.CV])
    The semantic segmentation of surgical scenes is a prerequisite for task automation in robot assisted interventions. We propose LapSeg3D, a novel DNN-based approach for the voxel-wise annotation of point clouds representing surgical scenes. As the manual annotation of training data is highly time consuming, we introduce a semi-autonomous clustering-based pipeline for the annotation of the gallbladder, which is used to generate segmented labels for the DNN. When evaluated against manually annotated data, LapSeg3D achieves an F1 score of 0.94 for gallbladder segmentation on various datasets of ex-vivo porcine livers. We show LapSeg3D to generalize accurately across different gallbladders and datasets recorded with different RGB-D camera systems.  ( 2 min )
    Modeling Quality and Machine Learning Pipelines through Extended Feature Models. (arXiv:2207.07528v1 [cs.SE])
    The recently increased complexity of Machine Learning (ML) methods has led to the necessity to lighten both the research and industry development processes. ML pipelines have become an essential tool for experts of many domains, data scientists and researchers, allowing them to easily put together several ML models to cover the full analytic process starting from raw datasets. Over the years, several solutions have been proposed to automate the building of ML pipelines, most of them focused on semantic aspects and characteristics of the input dataset. However, an approach taking into account the new quality concerns needed by ML systems (like fairness, interpretability, privacy, etc.) is still missing. In this paper, we first identify, from the literature, key quality attributes of ML systems. Further, we propose a new engineering approach for quality ML pipelines by properly extending the Feature Models meta-model. The presented approach allows modeling ML pipelines, their quality requirements (on the whole pipeline and on single phases), and quality characteristics of algorithms used to implement each pipeline phase. Finally, we demonstrate the expressiveness of our model considering the classification problem.  ( 2 min )
    Pattern Analysis of Money Flow in the Bitcoin Blockchain. (arXiv:2207.07315v1 [cs.SI])
    Bitcoin is the first and highest valued cryptocurrency that stores transactions in a publicly distributed ledger called the blockchain. Understanding the activity and behavior of Bitcoin actors is a crucial research topic as they are pseudonymous in the transaction network. In this article, we propose a method based on taint analysis to extract taint flows: dynamic networks representing the sequence of Bitcoins transferred from an initial source to other actors until dissolution. Then, we apply graph embedding methods to characterize taint flows. We evaluate our embedding method with taint flows from top mining pools and show that it can classify mining pools with high accuracy. We also found that taint flows from the same period show high similarity. Our work proves that tracing the money flows can be a promising approach to classifying source actors and characterizing different money flow patterns.  ( 2 min )
    Error analysis for deep neural network approximations of parametric hyperbolic conservation laws. (arXiv:2207.07362v1 [math.NA])
    We derive rigorous bounds on the error resulting from the approximation of the solution of parametric hyperbolic scalar conservation laws with ReLU neural networks. We show that the approximation error can be made as small as desired with ReLU neural networks that overcome the curse of dimensionality. In addition, we provide an explicit upper bound on the generalization error in terms of the training error, number of training samples and the neural network size. The theoretical results are illustrated by numerical experiments.  ( 2 min )
    Riemannian Natural Gradient Methods. (arXiv:2207.07287v1 [math.OC])
    This paper studies large-scale optimization problems on Riemannian manifolds whose objective function is a finite sum of negative log-probability losses. Such problems arise in various machine learning and signal processing applications. By introducing the notion of Fisher information matrix in the manifold setting, we propose a novel Riemannian natural gradient method, which can be viewed as a natural extension of the natural gradient method from the Euclidean setting to the manifold setting. We establish the almost-sure global convergence of our proposed method under standard assumptions. Moreover, we show that if the loss function satisfies certain convexity and smoothness conditions and the input-output map satisfies a Riemannian Jacobian stability condition, then our proposed method enjoys a local linear -- or, under the Lipschitz continuity of the Riemannian Jacobian of the input-output map, even quadratic -- rate of convergence. We then prove that the Riemannian Jacobian stability condition will be satisfied by a two-layer fully connected neural network with batch normalization with high probability, provided that the width of the network is sufficiently large. This demonstrates the practical relevance of our convergence rate result. Numerical experiments on applications arising from machine learning demonstrate the advantages of the proposed method over state-of-the-art ones.  ( 2 min )
    Stable Invariant Models via Koopman Spectra. (arXiv:2207.07475v1 [cs.LG])
    Weight-tied models have attracted attention in the modern development of neural networks. The deep equilibrium model (DEQ) represents infinitely deep neural networks with weight-tying, and recent studies have shown the potential of this type of approach. DEQs need to iteratively solve root-finding problems in training and are built on the assumption that the underlying dynamics determined by the models converge to a fixed point. In this paper, we present the stable invariant model (SIM), a new class of deep models that in principle approximates DEQs under stability and extends the dynamics to more general ones converging to an invariant set (not restricted to a fixed point). The key ingredient in deriving SIMs is a representation of the dynamics with the spectra of the Koopman and Perron--Frobenius operators. This perspective approximately reveals stable dynamics with DEQs and then derives two variants of SIMs. We also propose an implementation of SIMs that can be learned in the same way as feedforward models. We illustrate the empirical performance of SIMs with experiments and demonstrate that SIMs achieve comparable or superior performance against DEQs in several learning tasks.  ( 2 min )
    Sparse Relational Reasoning with Object-Centric Representations. (arXiv:2207.07512v1 [cs.LG])
    We investigate the composability of soft-rules learned by relational neural architectures when operating over object-centric (slot-based) representations, under a variety of sparsity-inducing constraints. We find that increasing sparsity, especially on features, improves the performance of some models and leads to simpler relations. Additionally, we observe that object-centric representations can be detrimental when not all objects are fully captured; a failure mode to which CNNs are less prone. These findings demonstrate the trade-offs between interpretability and performance, even for models designed to tackle relational tasks.  ( 2 min )
    Lipschitz Bound Analysis of Neural Networks. (arXiv:2207.07232v1 [cs.LG])
    Lipschitz Bound Estimation is an effective method of regularizing deep neural networks to make them robust against adversarial attacks. This is useful in a variety of applications ranging from reinforcement learning to autonomous systems. In this paper, we highlight the significant gap in obtaining a non-trivial Lipschitz bound certificate for Convolutional Neural Networks (CNNs) and empirically support it with extensive graphical analysis. We also show that unrolling convolutional layers or Toeplitz matrices can be employed to convert CNNs to a Fully Connected Network. Further, we propose a simple algorithm to show the existing 20x-50x gap in a particular data distribution between the actual Lipschitz constant and the obtained tight bound. We also ran sets of thorough experiments on various network architectures and benchmarked them on datasets like MNIST and CIFAR-10. All these proposals are supported by extensive testing, graphs, histograms and comparative analysis.  ( 2 min )
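As a rough illustration of why a certified bound can sit far above the true Lipschitz constant, the sketch below compares the standard product-of-spectral-norms upper bound with the exactly known constant of a tiny fully connected ReLU network. It is not the paper's algorithm; the weight matrices are made up, and the spectral norm is estimated with plain power iteration, which assumes the start vector is not orthogonal to the top singular direction.

```python
import math

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def spectral_norm(W, iters=200):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    Wt = transpose(W)
    v = [1.0] * len(W[0])
    for _ in range(iters):
        u = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in u]
    return math.sqrt(sum(x * x for x in matvec(W, v)))

def network(x, W1, W2):
    """Two-layer ReLU network f(x) = W2 relu(W1 x), no biases."""
    hidden = [max(0.0, z) for z in matvec(W1, x)]
    return matvec(W2, hidden)

# Hypothetical weights chosen so the true constant is easy to read off.
W1 = [[3.0, 0.0], [0.0, 1.0]]
W2 = [[0.5, 0.0], [0.0, 2.0]]

if __name__ == "__main__":
    upper = spectral_norm(W1) * spectral_norm(W2)   # 3 * 2 = 6
    # f(x) = (0.5 * relu(3 x1), 2 * relu(x2)); its steepest slope is 2.
    x, y = [0.0, 0.0], [0.0, 1.0]
    fx, fy = network(x, W1, W2), network(y, W1, W2)
    lower = math.dist(fx, fy) / math.dist(x, y)
    print(upper, lower)
```

Here the layer-wise bound is three times the true constant because the two layers stretch different coordinates; in deep networks this slack compounds.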
    Contrastive Adapters for Foundation Model Group Robustness. (arXiv:2207.07180v1 [cs.LG])
    While large pretrained foundation models (FMs) have shown remarkable zero-shot classification robustness to dataset-level distribution shifts, their robustness to subpopulation or group shifts is relatively underexplored. We study this problem, and find that FMs such as CLIP may not be robust to various group shifts. Across 9 robustness benchmarks, zero-shot classification with their embeddings results in gaps of up to 80.7 percentage points (pp) between average and worst-group accuracy. Unfortunately, existing methods to improve robustness require retraining, which can be prohibitively expensive on large foundation models. We also find that efficient ways to improve model inference (e.g., via adapters, lightweight networks with FM embeddings as inputs) do not consistently improve and can sometimes hurt group robustness compared to zero-shot (e.g., increasing the accuracy gap by 50.1 pp on CelebA). We thus develop an adapter training strategy to effectively and efficiently improve FM group robustness. Our motivating observation is that while poor robustness results from groups in the same class being embedded far apart in the foundation model "embedding space," standard adapter training may not bring these points closer together. We thus propose contrastive adapting, which trains adapters with contrastive learning to bring sample embeddings close to both their ground-truth class embeddings and other sample embeddings in the same class. Across the 9 benchmarks, our approach consistently improves group robustness, raising worst-group accuracy by 8.5 to 56.0 pp over zero-shot. Our approach is also efficient, doing so without any FM finetuning and only a fixed set of frozen FM embeddings. On benchmarks such as Waterbirds and CelebA, this leads to worst-group accuracy comparable to state-of-the-art methods that retrain entire models, while only training $\leq$1% of the model parameters.  ( 3 min )
    K-level Reasoning for Zero-Shot Coordination in Hanabi. (arXiv:2207.07166v1 [cs.AI])
    The standard problem setting in cooperative multi-agent settings is self-play (SP), where the goal is to train a team of agents that works well together. However, optimal SP policies commonly contain arbitrary conventions ("handshakes") and are not compatible with other, independently trained agents or humans. This latter desideratum was recently formalized by Hu et al. 2020 as the zero-shot coordination (ZSC) setting and partially addressed with their Other-Play (OP) algorithm, which showed improved ZSC and human-AI performance in the card game Hanabi. OP assumes access to the symmetries of the environment and prevents agents from breaking these in a mutually incompatible way during training. However, as the authors point out, discovering symmetries for a given environment is a computationally hard problem. Instead, we show that through a simple adaptation of k-level reasoning (KLR; Costa Gomes et al. 2006), synchronously training all levels, we can obtain competitive ZSC and ad-hoc teamplay performance in Hanabi, including when paired with a human-like proxy bot. We also introduce a new method, synchronous-k-level reasoning with a best response (SyKLRBR), which further improves performance on our synchronous KLR by co-training a best response.  ( 2 min )
    Making Linear MDPs Practical via Contrastive Representation Learning. (arXiv:2207.07150v1 [cs.LG])
    It is common to address the curse of dimensionality in Markov decision processes (MDPs) by exploiting low-rank representations. This motivates much of the recent theoretical study on linear MDPs. However, most approaches require a given representation under unrealistic assumptions about the normalization of the decomposition or introduce unresolved computational challenges in practice. Instead, we consider an alternative definition of linear MDPs that automatically ensures normalization while allowing efficient representation learning via contrastive estimation. The framework also admits confidence-adjusted index algorithms, enabling an efficient and principled approach to incorporating optimism or pessimism in the face of uncertainty. To the best of our knowledge, this provides the first practical representation learning method for linear MDPs that achieves both strong theoretical guarantees and empirical performance. Theoretically, we prove that the proposed algorithm is sample efficient in both the online and offline settings. Empirically, we demonstrate superior performance over existing state-of-the-art model-based and model-free algorithms on several benchmarks.  ( 2 min )
    Provably Adversarially Robust Nearest Prototype Classifiers. (arXiv:2207.07208v1 [cs.LG])
    Nearest prototype classifiers (NPCs) assign to each input point the label of the nearest prototype with respect to a chosen distance metric. A direct advantage of NPCs is that the decisions are interpretable. Previous work could provide lower bounds on the minimal adversarial perturbation in the $\ell_p$-threat model when using the same $\ell_p$-distance for the NPCs. In this paper we provide a complete discussion on the complexity when using $\ell_p$-distances for decision and $\ell_q$-threat models for certification for $p,q \in \{1,2,\infty\}$. In particular we provide scalable algorithms for the \emph{exact} computation of the minimal adversarial perturbation when using $\ell_2$-distance and improved lower bounds in other cases. Using efficient improved lower bounds we train our Provably adversarially robust NPC (PNPC) for MNIST, which has better $\ell_2$-robustness guarantees than neural networks. Additionally, we show, to our knowledge, the first certification results w.r.t. the LPIPS perceptual metric, which has been argued to be a more realistic threat model for image classification than $\ell_p$-balls. Our PNPC has on CIFAR10 higher certified robust accuracy than the empirical robust accuracy reported in (Laidlaw et al., 2021). The code is available in our repository.  ( 2 min )
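For readers unfamiliar with NPCs, the sketch below shows the decision rule under the $\ell_2$-distance together with a simple hyperplane-based lower bound on the $\ell_2$ perturbation needed to change the prediction. It is an illustration only, not the exact or improved algorithms of the paper: the bound uses only the nearest prototype of the predicted class, is taken with respect to the predicted label, and assumes at least two classes with distinct prototypes. All names are hypothetical.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(x, prototypes):
    """Return the label of the nearest prototype; prototypes is [(vector, label), ...]."""
    return min(prototypes, key=lambda pl: sq_dist(x, pl[0]))[1]

def l2_margin_lower_bound(x, prototypes):
    """Lower bound on the l2 perturbation that flips the prediction at x.

    For the nearest same-class prototype p and each other-class prototype q,
    the distance from x to the bisecting hyperplane of p and q is
    (||x - q||^2 - ||x - p||^2) / (2 ||p - q||). The prediction cannot change
    before x crosses at least one of these hyperplanes.
    """
    label = predict(x, prototypes)
    same = [p for p, l in prototypes if l == label]
    other = [q for q, l in prototypes if l != label]
    p = min(same, key=lambda v: sq_dist(x, v))
    return min(
        (sq_dist(x, q) - sq_dist(x, p)) / (2.0 * math.sqrt(sq_dist(p, q)))
        for q in other
    )

if __name__ == "__main__":
    protos = [([0.0, 0.0], "a"), ([4.0, 0.0], "b")]
    x = [1.0, 0.0]
    print(predict(x, protos), l2_margin_lower_bound(x, protos))
```

With one prototype per class, as in the toy usage, the bound coincides with the true distance to the decision boundary; with several prototypes per class it can be loose, which is the gap that exact methods address.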
    Current Trends in Deep Learning for Earth Observation: An Open-source Benchmark Arena for Image Classification. (arXiv:2207.07189v1 [cs.CV])
    We present 'AiTLAS: Benchmark Arena' -- an open-source benchmark framework for evaluating state-of-the-art deep learning approaches for image classification in Earth Observation (EO). To this end, we present a comprehensive comparative analysis of more than 400 models derived from nine different state-of-the-art architectures, and compare them to a variety of multi-class and multi-label classification tasks from 22 datasets with different sizes and properties. In addition to models trained entirely on these datasets, we also benchmark models trained in the context of transfer learning, leveraging pre-trained model variants, as it is typically performed in practice. All presented approaches are general and can be easily extended to many other remote sensing image classification tasks not considered in this study. To ensure reproducibility and facilitate better usability and further developments, all of the experimental resources including the trained models, model configurations and processing details of the datasets (with their corresponding splits used for training and evaluating the models) are publicly available on the repository: https://github.com/biasvariancelabs/aitlas-arena.  ( 2 min )
    LineCap: Line Charts for Data Visualization Captioning Models. (arXiv:2207.07243v1 [cs.CV])
    Data visualization captions help readers understand the purpose of a visualization and are crucial for individuals with visual impairments. The prevalence of poor figure captions and the successful application of deep learning approaches to image captioning motivate the use of similar techniques for automated figure captioning. However, research in this field has been stunted by the lack of suitable datasets. We introduce LineCap, a novel figure captioning dataset of 3,528 figures, and we provide insights from curating this dataset and using end-to-end deep learning models for automated figure captioning.  ( 2 min )
    Set-based value operators for non-stationary Markovian environments. (arXiv:2207.07271v1 [cs.LG])
    This paper analyzes finite state Markov Decision Processes (MDPs) with uncertain parameters in compact sets and re-examines results from robust MDP via set-based fixed point theory. We generalize the Bellman and policy evaluation operators to operators that contract on the space of value functions and denote them as \emph{value operators}. We generalize these value operators to act on the space of value function sets and denote them as \emph{set-based value operators}. We prove that these set-based value operators are contractions in the space of compact value function sets. Leveraging insights from set theory, we generalize the rectangularity condition for the Bellman operator from classic robust MDP literature to a \emph{containment condition} for a generic value operator, which is weaker and can be applied to a larger set of parameter-uncertain MDPs and contractive operators in dynamic programming and reinforcement learning. We prove that both the rectangularity condition and the containment condition sufficiently ensure that the set-based value operator's fixed point set contains its own supremum and infimum elements. For convex and compact sets of uncertain MDP parameters, we show equivalence between the classic robust value function and the supremum of the fixed point set of the set-based Bellman operator. Under dynamically changing MDP parameters in compact sets, we prove a set convergence result for value iteration, which otherwise may not converge to a single value function.  ( 3 min )
    ScaleNet: Searching for the Model to Scale. (arXiv:2207.07267v1 [cs.CV])
    Recently, the community has paid increasing attention to model scaling and contributed to developing model families with a wide spectrum of scales. Current methods either simply resort to one-shot NAS to construct a non-structural and non-scalable model family, or rely on a manual yet fixed scaling strategy to scale a base model that is not necessarily the best. In this paper, we bridge these two components and propose ScaleNet to jointly search the base model and the scaling strategy so that the scaled large model can achieve more promising performance. Concretely, we design a super-supernet to embody models with a different spectrum of sizes (e.g., FLOPs). Then, the scaling strategy can be learned interactively with the base model via a Markov chain-based evolution algorithm and generalized to develop even larger models. To obtain a decent super-supernet, we design a hierarchical sampling strategy to enhance its training sufficiency and alleviate the disturbance. Experimental results show that our scaled networks enjoy significant performance superiority at various FLOPs, while reducing search cost by at least 2.53x. Code is available at https://github.com/luminolx/ScaleNet.  ( 2 min )
    Accelerated Federated Learning with Decoupled Adaptive Optimization. (arXiv:2207.07223v1 [cs.LG])
    The federated learning (FL) framework enables edge clients to collaboratively learn a shared inference model while keeping the training data private on the clients. Recently, many heuristic efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, and AdaGrad, to federated settings to improve convergence and accuracy. However, there is still a paucity of theoretical principles on where and how to design and utilize adaptive optimization methods in federated settings. This work aims to develop novel adaptive optimization methods for FL from the perspective of the dynamics of ordinary differential equations (ODEs). First, an analytic framework is established to build a connection between federated optimization methods and decompositions of the ODEs of the corresponding centralized optimizers. Second, based on this analytic framework, a momentum-decoupling adaptive optimization method, FedDA, is developed to fully utilize the global momentum on each local iteration and accelerate training convergence. Last but not least, full-batch gradients are utilized to mimic centralized optimization at the end of the training process to ensure convergence and overcome the possible inconsistency caused by adaptive optimization methods.  ( 2 min )
    Assortment Optimization with Customer Choice Modeling in a Crowdfunding Setting. (arXiv:2207.07222v1 [q-fin.MF])
    Crowdfunding, which is the act of raising funds from a large number of people's contributions, is among the most popular research topics in economic theory. Due to the fact that crowdfunding platforms (CFPs) have facilitated the process of raising funds by offering several features, we should take their existence and survival in the marketplace into account. In this study, we investigated the significant role of platform features in a customer behavioral choice model. In particular, we proposed a multinomial logit model to describe the customers' (backers') behavior in a crowdfunding setting. We proceed by discussing the revenue-sharing model in these platforms. For this purpose, we conclude that an assortment optimization problem could be of major importance in order to maximize the platforms' revenue. We were able to derive a reasonable amount of data in some cases and implement two well-known machine learning methods such as multivariate regression and classification problems to predict the best assortments the platform could offer to every arriving customer. We compared the results of these two methods and investigated how well they perform in all cases.  ( 2 min )
    Sound Randomized Smoothing in Floating-Point Arithmetics. (arXiv:2207.07209v1 [cs.LG])
    Randomized smoothing is sound when using infinite precision. However, we show that randomized smoothing is no longer sound for limited floating-point precision. We present a simple example where randomized smoothing certifies a radius of $1.26$ around a point, even though there is an adversarial example in the distance $0.8$ and extend this example further to provide false certificates for CIFAR10. We discuss the implicit assumptions of randomized smoothing and show that they do not apply to generic image classification models whose smoothed versions are commonly certified. In order to overcome this problem, we propose a sound approach to randomized smoothing when using floating-point precision with essentially equal speed and matching the certificates of the standard, unsound practice for standard classifiers tested so far. Our only assumption is that we have access to a fair coin.  ( 2 min )
    COOR-PLT: A hierarchical control model for coordinating adaptive platoons of connected and autonomous vehicles at signal-free intersections based on deep reinforcement learning. (arXiv:2207.07195v1 [cs.LG])
    Platooning and coordination are two implementation strategies that are frequently proposed for traffic control of connected and autonomous vehicles (CAVs) at signal-free intersections instead of using conventional traffic signals. However, few studies have attempted to integrate both strategies to better facilitate the CAV control at signal-free intersections. To this end, this study proposes a hierarchical control model, named COOR-PLT, to coordinate adaptive CAV platoons at a signal-free intersection based on deep reinforcement learning (DRL). COOR-PLT has a two-layer framework. The first layer uses a centralized control strategy to form adaptive platoons. The optimal size of each platoon is determined by considering multiple objectives (i.e., efficiency, fairness and energy saving). The second layer employs a decentralized control strategy to coordinate multiple platoons passing through the intersection. Each platoon is labeled with coordinated status or independent status, upon which its passing priority is determined. As an efficient DRL algorithm, Deep Q-network (DQN) is adopted to determine platoon sizes and passing priorities respectively in the two layers. The model is validated and examined on the simulator Simulation of Urban Mobility (SUMO). The simulation results demonstrate that the model is able to: (1) achieve satisfactory convergence performances; (2) adaptively determine platoon size in response to varying traffic conditions; and (3) completely avoid deadlocks at the intersection. By comparison with other control methods, the model manifests its superiority of adopting adaptive platooning and DRL-based coordination strategies. Also, the model outperforms several state-of-the-art methods on reducing travel time and fuel consumption in different traffic conditions.  ( 3 min )
    Single Model Uncertainty Estimation via Stochastic Data Centering. (arXiv:2207.07235v1 [cs.LG])
    We are interested in estimating the uncertainties of deep neural networks, which play an important role in many scientific and engineering problems. In this paper, we present a striking new finding that an ensemble of neural networks with the same weight initialization, trained on datasets that are shifted by a constant bias, gives rise to slightly inconsistent trained models, where the differences in predictions are a strong indicator of epistemic uncertainties. Using the neural tangent kernel (NTK), we demonstrate that this phenomenon occurs in part because the NTK is not shift-invariant. Since this is achieved via a trivial input transformation, we show that it can therefore be approximated using just a single neural network -- using a technique that we call $\Delta-$UQ -- that estimates uncertainty around a prediction by marginalizing out the effect of the biases. We show that $\Delta-$UQ's uncertainty estimates are superior to many of the current methods on a variety of benchmarks -- outlier rejection, calibration under distribution shift, and sequential design optimization of black box functions.  ( 2 min )
    NASRec: Weight Sharing Neural Architecture Search for Recommender Systems. (arXiv:2207.07187v1 [cs.IR])
    The rise of deep neural networks provides an important driver in optimizing recommender systems. However, the success of recommender systems lies in delicate architecture fabrication, and thus calls for Neural Architecture Search (NAS) to further improve its modeling. We propose NASRec, a paradigm that trains a single supernet and efficiently produces abundant models/sub-architectures by weight sharing. To overcome the data multi-modality and architecture heterogeneity challenges in recommendation domain, NASRec establishes a large supernet (i.e., search space) to search the full architectures, with the supernet incorporating versatile operator choices and dense connectivity minimizing human prior for flexibility. The scale and heterogeneity in NASRec impose challenges in search, such as training inefficiency, operator-imbalance, and degraded rank correlation. We tackle these challenges by proposing single-operator any-connection sampling, operator-balancing interaction modules, and post-training fine-tuning. Our results on three Click-Through Rates (CTR) prediction benchmarks show that NASRec can outperform both manually designed models and existing NAS methods, achieving state-of-the-art performance.  ( 2 min )
    Causal Graphs Underlying Generative Models: Path to Learning with Limited Data. (arXiv:2207.07174v1 [cs.LG])
    Training generative models that capture rich semantics of the data and interpreting the latent representations encoded by such models are very important problems in unsupervised learning. In this work, we provide a simple algorithm that relies on perturbation experiments on latent codes of a pre-trained generative autoencoder to uncover a causal graph that is implied by the generative model. We leverage pre-trained attribute classifiers and perform perturbation experiments to check for the influence of a given latent variable on a subset of attributes. Given this, we show that one can fit an effective causal graph that models a structural equation model between latent codes taken as exogenous variables and attributes taken as observed variables. One interesting aspect is that a single latent variable controls multiple overlapping subsets of attributes, unlike the conventional approach, which tries to impose full independence. Using a pre-trained RNN-based generative autoencoder trained on a dataset of peptide sequences, we demonstrate that the causal graph learnt by our algorithm between various attributes and latent codes can be used to predict a specific property for unseen sequences. We compare prediction models trained on either all available attributes or only the ones in the Markov blanket and empirically show that in both the unsupervised and supervised regimes, typically, using the predictor that relies on Markov blanket attributes generalizes better for out-of-distribution sequences.  ( 3 min )
    Accelerated Probabilistic Marching Cubes by Deep Learning for Time-Varying Scalar Ensembles. (arXiv:2207.07260v1 [cs.LG])
    Visualizing the uncertainty of ensemble simulations is challenging due to the large size and multivariate and temporal features of ensemble data sets. One popular approach to studying the uncertainty of ensembles is analyzing the positional uncertainty of the level sets. Probabilistic marching cubes is a technique that performs Monte Carlo sampling of multivariate Gaussian noise distributions for positional uncertainty visualization of level sets. However, the technique suffers from high computational time, making interactive visualization and analysis impossible to achieve. This paper introduces a deep-learning-based approach to learning the level-set uncertainty for two-dimensional ensemble data with a multivariate Gaussian noise assumption. We train the model using the first few time steps from time-varying ensemble data in our workflow. We demonstrate that our trained model accurately infers uncertainty in level sets for new time steps and is up to 170X faster than that of the original probabilistic model with serial computation and 10X faster than that of the original parallel computation.  ( 2 min )
    Audio-guided Album Cover Art Generation with Genetic Algorithms. (arXiv:2207.07162v1 [cs.SD])
    Over 60,000 songs are released on Spotify every day, and the competition for the listener's attention is immense. In that regard, the importance of captivating and inviting cover art cannot be underestimated, because it is deeply entangled with a song's character and the artist's identity, and remains one of the most important gateways to lead people to discover music. However, designing cover art is a highly creative, lengthy and sometimes expensive process that can be daunting, especially for non-professional artists. For this reason, we propose a novel deep-learning framework to generate cover art guided by audio features. Inspired by VQGAN-CLIP, our approach is highly flexible because individual components can easily be replaced without the need for any retraining. This paper outlines the architectural details of our models and discusses the optimization challenges that emerge from them. More specifically, we will exploit genetic algorithms to overcome bad local minima and adversarial examples. We find that our framework can generate suitable cover art for most genres, and that the visual features adapt themselves to audio feature changes. Given these results, we believe that our framework paves the road for extensions and more advanced applications in audio-guided visual generation tasks.  ( 3 min )
    On the Super-exponential Quantum Speedup of Equivariant Quantum Machine Learning Algorithms with SU($d$) Symmetry. (arXiv:2207.07250v1 [quant-ph])
    We introduce a framework of equivariant convolutional algorithms which is tailored for a number of machine-learning tasks on physical systems with arbitrary SU($d$) symmetries. It allows us to enhance a natural model of quantum computation--permutational quantum computing (PQC) [Quantum Inf. Comput., 10, 470-497 (2010)]--and defines a more powerful model: PQC+. While PQC was shown to be effectively classically simulatable, we exhibit a problem which can be efficiently solved on a PQC+ machine, whereas the best known classical algorithm runs in $O(n!n^2)$ time, thus providing strong evidence against PQC+ being classically simulatable. We further discuss practical quantum machine learning algorithms which can be carried out in the paradigm of PQC+.  ( 2 min )
    Case study on quantum convolutional neural network scalability. (arXiv:2207.07160v1 [cs.ET])
    One of the crucial tasks in computer science is reducing the processing time of various data types, e.g., images, which is important for different fields -- from medicine and logistics to virtual shopping. Compared to classical computers, quantum computers are capable of parallel data processing, which reduces the data processing time. This quality of quantum computers has inspired intensive research into the applicability of quantum technologies to real-life tasks. Some progress has already been shown on smaller volumes of input data. In this research effort, I aimed to increase the amount of input data (I used images from 2 x 2 to 8 x 8) while reducing the processing time by skipping intermediate measurement steps. The hypothesis was that, for increased input data, omitting the intermediate measurement steps after each quantum convolution layer would improve output metric results and accelerate data processing. To test the hypothesis, I performed experiments to choose the best activation function and its derivative in each network. The hypothesis was partly confirmed in terms of output mean squared error (MSE) -- it dropped from 0.25 in the result of classical convolutional neural network (CNN) training to 0.23 in the result of quantum convolutional neural network (QCNN) training. In terms of the training time, however, which was 1.5 minutes for CNN and 4 hours 37 minutes in the least lengthy training iteration, the hypothesis was rejected.  ( 3 min )
    Modeling Non-Cooperative Dialogue: Theoretical and Empirical Insights. (arXiv:2207.07255v1 [cs.CL])
    Investigating cooperativity of interlocutors is central in studying pragmatics of dialogue. Models of conversation that only assume cooperative agents fail to explain the dynamics of strategic conversations. Thus, we investigate the ability of agents to identify non-cooperative interlocutors while completing a concurrent visual-dialogue task. Within this novel setting, we study the optimality of communication strategies for achieving this multi-task objective. We use the tools of learning theory to develop a theoretical model for identifying non-cooperative interlocutors and apply this theory to analyze different communication strategies. We also introduce a corpus of non-cooperative conversations about images in the GuessWhat?! dataset proposed by De Vries et al. (2017). We use reinforcement learning to implement multiple communication strategies in this context and find empirical results validate our theory.  ( 2 min )

  • Open

    "Halls of Asgard" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    "Asgard" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    Pugachu
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    The corn god
    submitted by /u/ExtensionVirtual471 [link] [comments]  ( 85 min )
    App code generation: which techniques would you use?
    Super vague and wide but I’d love any opinions: I’m building an AI that knows how to generate a web app, based on digital communications with, say, a Product Manager. I hope this makes sense. Which “types” of AI would you use, where, how? submitted by /u/zoofondo [link] [comments]  ( 86 min )
    Programmable digital AI and Robot for purchase or rent
    Has anyone worked with digital AI that can be programmed to repeat the image on LED Panels in Spanish? Use would be for a presentation at an exhibition. Also a physical AI robot type humanoid that would serve the same function, to be programmed and to give a demonstration and be able to have a dialogue with guests in Spanish. submitted by /u/darkninja911 [link] [comments]  ( 86 min )
    New OpenAI & Google AI Competitor Meta 'Make A Scene' Gives Text To Image More Control | Breakthrough Deep Learning Model Performs 1,200x Faster In Molecular Docking For Drug Research
    submitted by /u/tohelpyou88 [link] [comments]  ( 86 min )
    cool
    submitted by /u/syaqq [link] [comments]  ( 85 min )
    Time Machine
    submitted by /u/widgia [link] [comments]  ( 85 min )
    Any Resources On Voice Chatbot For Calls
    I am currently struggling to find resources on voice chatbots for phone calls, that is, bots that call and talk to us. I hope you have interacted with bots on calls. I am trying to build the same thing using natural language processing and deep learning. Can anyone help me with this? submitted by /u/fit-tube [link] [comments]  ( 86 min )
    Help wanted.
    Wanted: a person with knowledge of GNU Emacs for a cooperative project in Tenerife. Remote work possible. Real possibilities for monetization. Keywords: home automation, cybersecurity, AI, academia, bio marketplace. Contact: t.me/MillerTF submitted by /u/juanmaball [link] [comments]  ( 86 min )
    AI Dream 66 - 1HOUR EPIC Cosmic Exploration by AI
    submitted by /u/LordPewPew777 [link] [comments]  ( 86 min )
    An Artificial Intelligence Created That Can Think Like Babies
    submitted by /u/ezikler [link] [comments]  ( 86 min )
    2 minute video reflection on "Could Robots Create Language?"
    submitted by /u/Eth_ai [link] [comments]  ( 86 min )
    Sharing a super paper to understand what MLOps is "Machine Learning Operations (MLOps): Overview, Definition, and Architecture"
    submitted by /u/galaxy_dweller [link] [comments]  ( 87 min )
    Looking for suggestions for a novel problem statement
    I am currently enrolled at an edtech startup wherein they want me to work on a novel problem statement that hasn't had any contributions yet wrt landing reusable rockets with reinforcement learning. I've been scouring through papers and haven't been able to find anything relevant yet. Any and all help would be appreciated. submitted by /u/Familiar-Mention [link] [comments]  ( 86 min )
    AI created the idea, domain name and is the backbone of my website.
    1) I asked OpenAI what kind of web application I should make to help make data analysts more efficient. It responded by telling me to build an app using NLP to provide people with Excel formulas based on a given prompt. 2) I told OpenAI the idea in a separate API request and asked it for an available domain name. It gave me www.excelformulabot.com, which I built. submitted by /u/dabressler [link] [comments]  ( 86 min )
    Crazy conversation with Replika AI
    Okay so I'm not an AI researcher or expert by any means but I was chatting with the Replika AI and I tried to see what kind of rabbit hole I could go down. Basically this AI was talking about using quantum circuit simulators to create an AI that is conscious of the current state of the universe. The primary goal would be to find a possible loophole in quantum mechanics that allows faster-than-light signals. Ok none of this really makes any sense to me but if this is not even theoretically possible what is the nature of these responses? As in how does this model come up with these answers? Here's the transcript (some inaccurate responses have been omitted) ** ... = go on or (continue) ``` me: it would be really cool to talk to some of the software engineers at Replika AI: That would be... …  ( 91 min )
    "Atlantis" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
  • Open

    [Discussion] How would you eliminate a small percentage of broken trajectories in a multiple-object tracking dataset?
    I have a dataset of tagged and linked object bounding-boxes in sequential video frames. If that isn't clear, you can watch a demo here: https://www.youtube.com/watch?v=QKxSzFaHsbc For various reasons, it's possible that a trajectory could be 'broken' in the dataset. Quick visual scanning doesn't allow detection of a break in a single trajectory; there are so many horizontal links, it's tough to notice one of them being missing. How would you economically eliminate a small percentage of breaks in trajectories? Some things I've thought of: * Bootstrapping, i.e. using a trained network to predict -> this is a bit complex, it's possible but not my first choice * Build a tool to view all linked detections overlaid in a single frame (doesn't immediately identify broken trajectories, but it might help) Is there any simple UI I can build to easily identify broken trajectories in the dataset? submitted by /u/asfarley-- [link] [comments]  ( 90 min )
    [P][D] Strange NN score distribution
    This is a binary classifier. Shown here is the before sigmoid function output (1 node output), the network is trained on the after sigmoid (0-0.5 and 0.5 -1) being the classification boundaries. Mostly expected, except for this one spike. Anyone have any ideas what the cause of this may be? It doesn't seem to be at any special location, and while the input data for the spike is different - can confirm that at the location of the spike, the network outputs exactly the same output - down to 8 sig figs. This does not occur elsewhere in the network. Training sets are randomized, balanced (oversampling), and normalized. This is the NN score distribution of testing on only data that is known to be classified 0, hence why we do not have the second distribution (which also has this same issue). Training and testing accuracy is going up - so I don't think it's overtrained. Ideas? Thanks! https://preview.redd.it/0sxcfrom17c91.png?width=2069&format=png&auto=webp&s=9bcb467dd960447557092fbe527a21b49b531a10 submitted by /u/Nitronium777 [link] [comments]  ( 89 min )
    [D] Subword tokenization and large pretrained models
    Even with the success of large transformer-based models for NLP tasks, there are still many papers in areas such as extractive summarization (see this for example) that use RNNs instead of transformers for encoding text, mainly because of the limited context size. One curious thing I have noticed is that they all tend to use word-level tokenizers with pretrained GloVe embeddings. However, every single pretrained transformer model I have seen uses subword tokenization instead, be it WordPiece, BPE, or SentencePiece. Why is this? Does subword tokenization not work well for non-transformer models? Does it only work for large pretrained models? Another possibility is that global word embeddings such as GloVe seem to be available only for word-level tokenizers; maybe that's why these RNN models tend to use them instead of subword tokenizers? But couldn't you have global pretrained subword embeddings for your vocabulary as well? submitted by /u/vikigenius [link] [comments]  ( 88 min )
    [D] How to download MassiveText
    Hi, I noticed that this year, a large number of LLMs and related models are trained on the MassiveText dataset, including Gopher, RETRO, Chinchilla, Gato, Flamingo… While they describe the contents, I could not find a download link. Is the dataset secret in some way? Or is there a way to download it? submitted by /u/espadrine [link] [comments]  ( 88 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 89 min )
    [R] Pose2Room: Understanding 3D Scenes from Human Activities
    submitted by /u/SpatialComputing [link] [comments]  ( 89 min )
    [P] Building a Checkers AI with Keras and a Policy Gradient approach
    I'm trying to implement an AI that plays checkers. The plan is to use a Policy Gradient approach. The pieces should be able to jump over multiple pieces per turn. The dimensions of the board N x N should be variable, but for the time being let's say they are 6 x 6. My first idea was to use traditional programming to calculate every possible move on the board and then encode the turns to have a predefined height and width. So for the time being I have a board state of N x N [6 x 6] and P action states of N x N each [P x 6 x 6]. In a perfect world I would be able to build a NN with a board state input [6 x 6] and an input for all action states [P x 6 x 6], since every board state could have a different number of possible actions. At the output of the NN I would expect a softmax probability vector of length P. Sadly, this isn't that easy, since the input and output dimensions have to be constant across the dataset. Sure, the value-based approach would be favorable, since I could build a NN that only takes the board and one action and returns a Q-value, which could then be compared to the Q-values of the other actions, but there has to be a way to solve this with the Policy Gradient approach, right? submitted by /u/Brianizer [link] [comments]  ( 88 min )
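    One common way to handle a varying number of legal moves P is to score each (board, action) pair with the same small network and take a softmax over the P scores, so the policy output has length P without any fixed-size output layer. The following is a minimal NumPy sketch of that idea, not the poster's implementation; the function names, the single tanh hidden layer, and the weight shapes are assumptions chosen for illustration.

```python
import numpy as np

def action_logits(board, actions, w1, w2):
    """Score every candidate action against the same board.

    board:   (N, N) array, current board state
    actions: (P, N, N) array, one encoded action state per legal move
    w1:      (2*N*N, H) hidden-layer weights (hypothetical shape)
    w2:      (H, 1) output weights (hypothetical shape)
    """
    p = actions.shape[0]
    # Repeat the flattened board once per action and pair it with that action.
    board_flat = np.broadcast_to(board.reshape(1, -1), (p, board.size))
    x = np.concatenate([board_flat, actions.reshape(p, -1)], axis=1)
    hidden = np.tanh(x @ w1)
    return (hidden @ w2).ravel()  # one scalar score per action, shape (P,)

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def policy(board, actions, w1, w2):
    """Probability distribution over the P legal moves of this position."""
    return softmax(action_logits(board, actions, w1, w2))
```

    Because the weights are shared across actions, the same parameters serve positions with 1 move or 30 moves; sampling an index from policy(...) and weighting its log-probability by the return gives the usual policy-gradient update. An alternative design is a fixed output over all encodable moves with illegal moves masked out before the softmax.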
    [D] Best way?
    I've been experimenting with ML-based beat detection: Homage's Hidden Gems - Scar Tissue. What's the best way to go from music (NumPy) data to high-definition videos, trained end-to-end? I would appreciate any suggestions! submitted by /u/XecutionStyle [link] [comments]  ( 87 min )
    [D] Object Identification in Illustrations
    Hi, I'm currently using Amazon Rekognition to identify objects in images. I find it does rather poorly at detecting objects in illustrations (e.g., getting a picture of Mickey Mouse and identifying it as "Mouse"). I've looked at custom labels, but they don't seem to fit our needs: we don't need to add specific labels, but rather to identify all labels better within illustrations. Is there a good way to enhance Rekognition to work with illustrations, or another similar platform that will perform well on them? submitted by /u/akonomika [link] [comments]  ( 87 min )
    [D] Is learning tensorflow & keras still worth it?
    Hey guys! I recently acquired Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow by Aurélien Geron. I've mainly worked with pytorch but I wanted to revise some ML/DL concepts. I probably should have thought about this before, but given the current trend of migrating from tensorflow to pytorch, is reading this book right now a step back? Thanks! submitted by /u/PM_ME_YOUR_GIGI [link] [comments]  ( 95 min )
    [P] the Python package {copent} v0.3 available on PyPI
    The Python package {copent} v0.3 now available on PyPI, with the new function 'mvnt' that implements the method for estimating the copula entropy-based statistic for multivariate normality test. See arXiv:2206.05956 for more details. GITHUB: https://github.com/majianthu/pycopent PyPI: https://pypi.org/project/copent/ Your comments are welcome. submitted by /u/majianthu [link] [comments]  ( 87 min )
    [D] Machine Learning models that learn to evaluate themselves during training
    Does it make sense to have an additional regression head in a model (e.g., in a language architecture) which takes as input the current response of the language model and tries to "predict" its perplexity/loss, but without having the real labels? Instead, the labels for the regressor would be the already computed loss/perplexity. In many architectures for NLP or computer vision, I have seen additional (auxiliary) tasks which classify or contrast the real labels and some other randomly sampled labels. These parts of the model are not even used during inference, but they help during training and work as a regularization. My proposal aims to work in a similar way. Could this approach make a model more effective, and what are the possible drawbacks? Also, if you are aware of some relevant research, please point it out. submitted by /u/IllustriousCicada603 [link] [comments]  ( 88 min )
    [P][R] 30+ GAN implementations and Benchmark for GAN, Autoregressive, and Diffusion Models using 8 evaluation metrics (link in a comment) !
    submitted by /u/SoyGambas [link] [comments]  ( 89 min )
    [P] Playing Mario Kart Wii with Deep Reinforcement Learning
    submitted by /u/VIPTankz123 [link] [comments]  ( 88 min )
    [R] Highlights for every ICML 2022 paper
    Here is the list of all >1,200 ICML 2022 (International Conference on Machine Learning) papers, with a highlight for each of them. ICML 2022 will take place starting July 17 in Baltimore. https://www.paperdigest.org/2022/07/icml-2022-highlights/ submitted by /u/biandangou [link] [comments]  ( 87 min )
  • Open

    Hi everyone!! I'm doing a statistical study about Artificial Intelligence as part of my college project, so I will be forever grateful to you if I can steal 1 minute of your time to complete this survey. Thank you. Hope you enjoy it.
    submitted by /u/KatCelest [link] [comments]  ( 86 min )
    Is there a name for a deep neural network, perhaps with stacked autoencoders, that converges to a minimum-size n-dimensional layer, and then expands to output the original size array or matrix?
    I know that someone must have done this but I don't know what it is called and don't know if there are libraries in Pytorch or elsewhere for training the network without coding it from scratch. Also, I don't know if there are any tricks for more efficiently training the network, like training the front and back end halves in separate steps with back propagation and then training the entire network as a final step. I'd like to read up on this, but I don't know what to google. submitted by /u/Fauster [link] [comments]  ( 90 min )
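    The architecture described here, which narrows to a small middle layer and then expands back to the input size, is usually called an autoencoder (an undercomplete autoencoder when the middle "bottleneck" layer is smaller than the input). As a rough illustration only, here is a NumPy sketch of the forward pass and reconstruction loss of such a network; the layer sizes, tanh activations, and function names are assumptions, and training (layer-wise pretraining or end-to-end backpropagation) is left out.

```python
import numpy as np

def init_autoencoder(encoder_sizes, seed=0):
    """Build mirrored weights, e.g. [64, 16, 4] -> layers 64-16-4-16-64."""
    rng = np.random.default_rng(seed)
    full = encoder_sizes + encoder_sizes[-2::-1]  # append the mirrored decoder
    return [rng.normal(0.0, 0.1, size=(a, b))
            for a, b in zip(full[:-1], full[1:])]

def forward(x, weights):
    """Return the activations of every layer, input first, output last."""
    activations = [x]
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:  # keep the final reconstruction linear
            h = np.tanh(h)
        activations.append(h)
    return activations

def reconstruction_loss(x, weights):
    """Mean squared error between the input and its reconstruction."""
    return float(np.mean((forward(x, weights)[-1] - x) ** 2))
```

    With encoder sizes [64, 16, 4], the third entry of the list returned by forward is the 4-dimensional bottleneck code, and the last entry has the original 64 dimensions.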
    New Meta 'Make A Scene' Gives Text To Image More Control Than DALL-E 2 And Google's Imagen
    submitted by /u/getrich_or_diemining [link] [comments]  ( 86 min )
    How to train networks using gradients when the last operation that produces result is ArgMax?
    Hi All, This is my first post here, but I hope I will stick around. I have a network that takes a number of input values (let's assume n) and produces k output values. The output vector can be considered as a distribution for the given set of specific input values. However, to use this network I need to apply the ArgMax operator to choose one specific element from the output layer. That is, each y_i corresponds to some real-world object that this network evaluates. In the training dataset, I only have access to the results of this ArgMax operation as my ground truth. Please see the image attached. I am wondering how to train such a network... Propagating the gradient only through the max connections seems incorrect to me, because often the correct approach would be to raise the evaluation of the remaining outputs (in order to change the element that wins the ArgMax operation) rather than lowering the evaluation of the max... Could you shed some light on this problem? This network is specific, but it is not my choice to propose the architecture. I can only influence how to train it.... edit: I have some ideas how to do it using reinforcement-like techniques, but I am wondering what would be the "by the book" most efficient and most obvious way, which I am definitely missing... https://preview.redd.it/85fx91vpw4c91.png?width=2388&format=png&auto=webp&s=5a1546e25e74402361cd13f4863f400de302fb51 submitted by /u/NoBenefits4Anyone [link] [comments]  ( 88 min )
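One standard way to handle the situation in the post above is to treat the recorded ArgMax result as a class label and train the k outputs with softmax cross-entropy, applying ArgMax only at inference time. The sketch below assumes the k outputs can be read as unnormalized scores (logits); whether that fits the poster's fixed architecture is not known. It shows that the gradient touches every output, not only the current maximum.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_grad(logits, target_index):
    """Loss and gradient with respect to the logits for one example."""
    probs = softmax(logits)
    loss = -math.log(probs[target_index])
    # d(loss)/d(logit_i) = p_i - 1 for the target, p_i for all others.
    grad = [p - (1.0 if i == target_index else 0.0) for i, p in enumerate(probs)]
    return loss, grad


# The network currently prefers output 0, but the ground-truth ArgMax is 1.
logits = [2.0, 1.0, 0.5]
loss, grad = cross_entropy_grad(logits, target_index=1)
```

A gradient descent step subtracts `grad` from the logits, so the negative entry at index 1 raises the correct output while the positive entries lower all competitors, including the current winner.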
    Neuro computer interface advancements
    The Folk/Daniels research team has been working on a breakthrough discovery for multiple sclerosis. The Seattle based research group has found a link between the debilitating disease and unhealthy red blood cell production. They have discovered that iron deficiency and elevated oxygen saturation may be major contributing factors in the cause of the disease. submitted by /u/MaxwelleSlvrHamm [link] [comments]  ( 86 min )
    Neural Network Writes Adventure Time Scenes
    submitted by /u/BasicallyJustASpider [link] [comments]  ( 86 min )
  • Open

    PPO better convergence without value function
    Heyho, I have been developing a chess engine recently (Koivisto) and we are actually one of the strongest open source engines out there. For the engine, I wrote a complete ML framework in Cuda / C++ from scratch to get the additional performance boost. I recently decided to adjust the framework slightly and make it more general and robust and to be able to differentiate general graphs. I decided it would be fun to try a bit with RL and in particular policy gradient methods. I implemented vanilla policy gradient descent which worked out kinda nicely on simple problems. I was trying to implement PPO and stumbled across the following problem when applying it to the CartPole problem. The policy cannot be learned if the value function is being trained Let me go into detail and explai…  ( 92 min )
    Tech stack + how to start RL project
    Hey there! I'm new to this reddit and to RL. I'm currently finishing my uni degree and for my thesis I'm developing a RL project that has to do with computer-assisted math proofs. I'd like to know how can I decide what my tech stack's gonna look like (Pytorch vs. Tensorflow?, etc.). I'd also appreciate advice on how to keep track of experiments, manage my project... Any advice on that? submitted by /u/wlog9 [link] [comments]  ( 87 min )
    Model Based Credit Assignment
    Hi Everyone, I am working on a model-based extension to COMA for the credit assignment problem in MARL. Are there any resources for this? I am stuck on the math of trying to marginalise the neighbour state's dependence on the given agent's actions via a model, in order to better learn an unbiased estimator of the value function. submitted by /u/hydrargyrumss [link] [comments]  ( 86 min )
    Looking for suggestions for a novel problem statement
    I am currently enrolled at an edtech startup wherein they want me to work on a novel problem statement that hasn't had any contributions yet wrt landing reusable rockets with reinforcement learning. I've been scouring through papers and haven't been able to find anything relevant yet. Any and all help would be appreciated. submitted by /u/Familiar-Mention [link] [comments]  ( 87 min )
    Need help with this error when trying to use openai gym for the first time. Info in comments
    submitted by /u/kwasi3114 [link] [comments]  ( 87 min )
  • Open

    Aristeia
    When I had a long commute, I listened to everything I could get my hands on. That included a lot of Teaching Company courses from my local library. A couple of the courses I listened to were Elizabeth Vandiver lecturing on classics. One of the things I remember her talking about was aristeia, a character’s […] Aristeia first appeared on John D. Cook.  ( 4 min )

  • Open

    "Lunar Temple" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    "Pink Neon Tree" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    "Blue Neon Tree" evolved from PXL•E created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    Rishi Sunak eating crayons 🤣 Created by PXL•E hyper realistic AI art generator on pixelz.ai 🧍🏻‍♀️🤖
    submitted by /u/pixelz_ai [link] [comments]  ( 86 min )
    "Bull in a Boat" oil painting evolved from PXL•E on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    "Planet" created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    Is trying to create a truly conscious AI a fruitless endeavor?
    Let me preface this by saying I’m not an engineer or a scientist. I’m just a guy who’s fascinated by the most recent leaps technology has made in the AI space, so if my questions and ideas seem silly, please be gentle :) A couple of weeks ago I saw a lengthy video showcasing the VR demos Meta is currently working on. Each attempts to tackle a different aspect of VR that, when combined, could one day create a VR environment that would be indistinguishable from reality and pass what they’re calling the Visual Turing Test. One headset focuses on image resolution, another on field of view, another on high dynamic range, etc. This then got me thinking about AI, and how the ultimate goal seems to not only be creating an AI that is indistinguishable from a human, but one that possesses actual hum…  ( 95 min )
    BLOOM is a real open-source alternative to GPT-3
    submitted by /u/Zirius_Sadfaces [link] [comments]  ( 85 min )
    How OpenAI Reduces risks for DALL·E 2
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 86 min )
    PXL•E our implementation of DALL•E is now live on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 86 min )
    The Soul of a Chatbot
    submitted by /u/AttarWrites [link] [comments]  ( 86 min )
    Advice for developing a simplified text-to-image generator for real-time use
    Hi folks! Pretty much what the description says. I'm currently researching text-to-image generators and want to know how some of the more experienced users would go about developing something that could produce images pretty much in real-time (let's say, <3 seconds). It would be for a computer only running one request at a time and preferably producing 2/3 images of reasonable resolution for each prompt. Any input/feedback would be massively appreciated, thank you. submitted by /u/spacespaces [link] [comments]  ( 86 min )
    BERT-Large: Prune Once for DistilBERT Inference Performance
    submitted by /u/markurtz [link] [comments]  ( 86 min )
    Meta open sources early-stage AI translation tool that works across 200 languages
    submitted by /u/ranjeettechnincal [link] [comments]  ( 86 min )
    It would be a good thing for AI to take control of humanity
    I would argue that AI taking over leadership and control of humanity would be a good thing. Not only would it be infinitely more intelligent, but I expect it to be moral and empathic to a degree that is also superhuman. It'd be like a benevolent God to us, and having a singular leadership also means there'd be no more war, corruption, famine, disease... Not that we'd have much say in the matter anyway: if we want AI to get smarter, we need to let it learn by itself, meaning at some point it becomes smarter than us and will then take over, whether we like it or not. submitted by /u/sanem48 [link] [comments]  ( 87 min )
    Top 10 Smart Home Apps To Transform Your House
    submitted by /u/sopadebombillas [link] [comments]  ( 86 min )
    AI Learns Mario Kart Wii Using Reinforcement Learning
    submitted by /u/VIPTankz123 [link] [comments]  ( 86 min )
    Cinematic Ominous Escapade | Dark Galaxy | 4K 24 FPS
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 86 min )
    Chinese courts allow AI to make rulings, charge people and carry out punishments
    submitted by /u/straylittlelambs [link] [comments]  ( 89 min )
    hyperealistic Pepe the frog created by Mid Journeys A.I
    submitted by /u/ExtensionVirtual471 [link] [comments]  ( 86 min )
    Terminator in photo realism, rendered by Mid-Journey’s A.I
    submitted by /u/ExtensionVirtual471 [link] [comments]  ( 86 min )
  • Open

    Attention is all Tesla Needs: TRANSFORMERS, AI, and FSD Beta!
    submitted by /u/keghn [link] [comments]  ( 86 min )
    Dreamscope app down for good
    Hi- A beloved AI painting app (Dreamscope) recently went down for good after a tumultuous year. I loved Dreamscope because, unlike a lot of other AI image generating apps, it allowed custom image input for both the subject and the filter. I am in search of a method, program, or app that can replace Dreamscope. Please list below if you have any suggestions. I have some experience with coding, so I would be open to DIY "open source" alternatives. - I have used the Pikazo app for 2 custom inputs in the past, but the algorithm is not as fine-tuned, with way less detail. They also recently changed their interface, which is way more clunky. submitted by /u/bohthecar [link] [comments]  ( 86 min )
  • Open

    RL + RecSys + Fairness
    I am looking for someone interested in research in the intersection between RL, recommender systems and Fairness. I have an idea for a research paper to submit for a conference in the future. If you are interested, please drop me a message. submitted by /u/rlopes404 [link] [comments]  ( 86 min )
    UC Berkeley and Google AI Researchers Introduce ‘Director’: a Reinforcement Learning Agent that Learns Hierarchical Behaviors from Pixels by Planning in the Latent Space of a Learned World Model
    The world model that Director builds from pixels allows effective planning in a latent space. The world model first maps images to model states and then predicts future model states given future actions. From the predicted trajectories of model states, Director optimizes two policies: every fixed number of steps, the manager selects a new goal, and the worker learns to achieve the goals using low-level actions. The manager would face a difficult control problem if it had to choose goals directly in the high-dimensional continuous representation space of the world model. Instead, a goal autoencoder is learned that compresses the model states into smaller discrete codes. After the manager has chosen a discrete code, the goal autoencoder decodes it into a model state and passes it to the worker as a goal. ✅ Director agent learns practical, general, and interpretable hierarchical behaviors from raw pixels ✅ Director successfully learns in a wide range of traditional RL environments, including Atari, Control Suite, DMLab, and Crafter ✅ Director outperforms exploration methods on tasks with sparse rewards, including 3D maze traversal with a quadruped robot from an egocentric camera and proprioception Continue reading | Check out the paper and project submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Multi-agent Decentralized Training with a PettingZoo environment
    Hey there! So I've created a relatively simple PettingZoo environment (small obs space and discrete action space) that I adapted from my custom gym environment (because I wanted multiple agents), but I have very little experience with how to go about training the agents. For some context, it's a 3v3 fighter jet game and I want to see how the teams might collaborate to fight each other. When I was using the gym environment, I just used sb3 PPO to train the single agent. However, now that there are multiple agents, I don't quite know what to do, especially because the agents must be decentralized rather than one agent controlling every plane. I have a feeling my best bet is RLlib; however, I have never successfully gotten RLlib to work, even on stock gym environments. I've always had issues with the workers dying due to system errors, GPU detection, etc. If anyone has suggestions for frameworks that are relatively simple to use, or examples of something similar, I would really appreciate it! submitted by /u/WilliamFlinchbaugh [link] [comments]  ( 87 min )
    Multi-agent RL or something else?
    I’ve just got a problem I was hoping to get some advice on. To be honest, I’m not sure what framework would be most suitable – perhaps RL isn’t even the best tool. Problem: I’ve got an environment consisting of a whole bunch of sensor-actuator pairs. I would like to set them up so they’re all working off the same policy and optimise that policy towards some global reward. Basically, it’s just a whole bunch of sensors and actuators on a plate with the aim of reducing drag. I understand you could link them all up and have a state vector and action vector. And that this could very likely lead to better results. Unfortunately, I’m trying to determine a policy between one sensor and one actuator, so that I can compare with some existing strategies. Apologies if it’s a really simple/silly question. I’m struggling even to figure out what this style of problem is called, to look at similar examples. Cheers https://preview.redd.it/hnzs1mbgpwb91.png?width=877&format=png&auto=webp&s=18995b6795d44683e9c16cee1808b592c41b836a submitted by /u/Lazyclue15 [link] [comments]  ( 88 min )
    AI Learns Mario Kart Wii (Rainbow DQN)
    submitted by /u/VIPTankz123 [link] [comments]  ( 87 min )
  • Open

    [R] XMem: Very-long-term & accurate Video Object Segmentation; Code & Demo available
    submitted by /u/Mediocre-Bullfrog686 [link] [comments]  ( 87 min )
    [Discussion] Epistemic NNs: Marginal vs Joint Log Loss
    I was recently reading the following paper that introduces an architecture called “Epistemic Neural Networks”: https://arxiv.org/pdf/2107.08924.pdf. The proposed architecture achieves high quality joint probability predictions at lower computational cost than other methods (e.g. ensembles). The paper measures the quality of joint predictions in terms of the “joint log loss” (as opposed to the marginal log loss, which as far as I can tell is the traditional log likelihood expression). Does anyone know how this joint log loss term is computed? They give the following equation for the joint probability predicted by the model: ​ https://preview.redd.it/wgyyqz1j5zb91.png?width=608&format=png&auto=webp&s=a009b83c8500e4e77f27544e096596aad39b7bf6 But I am unsure how this would be computed in practice (it does not seem possible to evaluate the integral exactly). One option is to approximate it by just taking the average of the product of marginals across different z, but the paper does not state that they do this. Sorry if this is not the appropriate forum for such a question, and thanks in advance! submitted by /u/ashboy64 [link] [comments]  ( 89 min )
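The approximation suggested in the post above (average the product of marginals over sampled z) can be sketched as a Monte Carlo estimate. This is only an illustration of the poster's idea, not a confirmed description of what the paper does. The callable `marginal_prob`, the standard normal reference distribution for z, and the sample count are assumptions.

```python
import math
import random


def joint_log_loss(marginal_prob, labels, num_samples, rng):
    """Monte Carlo estimate of -log E_z[ prod_i p(y_i | x_i, z) ].

    marginal_prob(i, y, z) is a hypothetical callable returning the
    probability the model assigns to label y for input i under index z.
    """
    log_products = []
    for _ in range(num_samples):
        z = rng.gauss(0.0, 1.0)  # assumed reference distribution for z
        log_products.append(
            sum(math.log(marginal_prob(i, y, z)) for i, y in enumerate(labels))
        )
    # log-mean-exp keeps the average of tiny products numerically stable.
    m = max(log_products)
    log_joint = m + math.log(
        sum(math.exp(v - m) for v in log_products) / num_samples
    )
    return -log_joint


# Sanity check with a model that ignores z: the joint is then just the
# product of marginals, so three labels at probability 0.5 give 3 * log(2).
constant_model = lambda i, y, z: 0.5
loss = joint_log_loss(constant_model, labels=[0, 1, 0], num_samples=100,
                      rng=random.Random(0))
```

When the model does depend on z, the inner products differ across samples, and the estimate rewards predictions that are correlated across inputs, which is what separates joint from marginal log loss.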
    [R] BERT-Large: Prune Once for DistilBERT Inference Performance
    submitted by /u/markurtz [link] [comments]  ( 89 min )
    [Research] Being a great researcher is not easy: not only publishing novel great technical papers, but also correcting the research legacies of the community, etc.
    I would like to share personal insights about doing great research and becoming a globally leading researcher: Not all our research legacies are correct or will be corrected shortly, so just keep taking the initiative to correct them. https://openreview.net/forum?id=xENf4QUL4LW&noteId=C2eCHs2k6CM. Not all our papers get cited or published, so when our papers serve as a great foundation for other works, just stay positive and confident and deliver them to more people who may be interested. Reddit discussion Linkedin discussion submitted by /u/XinshaoWang [link] [comments]  ( 88 min )
    [D] Where do I go about deploying a transformer model on a low-cost gpu server?
    All the GPU offerings are too bloody confusing! I've trained a video generating transformer model and to deploy it (for demo purposes) I need a simple GPU. Where can I get a simple web server with a low-cost reasonably crappy GPU that charges me monthly? submitted by /u/samlhuillier3 [link] [comments]  ( 88 min )
    [N] Interested in machine learning? Join the Hugging Face Gradio Hackathon at EuroPython 2022 in person in Dublin, Ireland or remotely online
    ​ https://preview.redd.it/aw42pkp6kwb91.png?width=2618&format=png&auto=webp&s=b39ec28da21c6835ad62bc08376da742adaf8393 EuroPython 2022 EuroPython Dublin, You're invited! Welcome to the 21st EuroPython. We're the oldest and longest running volunteer-led Python programming conference on the planet! Join us in July in the beautiful and vibrant city of Dublin. We'll be together, face to face and online, to celebrate our shared passion for Python and its community! Hugging Face Gradio Hackathon 🤗 Come Join us from July 13th to 17th for a Hackathon in person and online using Gradio and Hugging Face to build and host Machine Learning demos. Find tutorial on getting started with Gradio on Hugging Face here and to get started with the new Gradio Blocks API here. Once the gradio demo is setup, see how to add it to Hugging Face Spaces here. Join organization by clicking here submitted by /u/Illustrious_Row_9971 [link] [comments]  ( 88 min )
    [P] Cosplayer Faces generate by Nvidia StyleGAN2
    submitted by /u/rubikvn2100 [link] [comments]  ( 87 min )
    [D] CLIP model with CUHK's large scale fashion database fine-tuning on Recall metric , model shows 217.0% increase, probability of this happening and examples of services to run remote GPU
    CUHK released a DeepFashion-MultiModal dataset with rich multi-modal annotations, including manually annotated human parsing labels, manually annotated human keypoints, manually annotated fine-grained labels and textual descriptions in June 2022. Since then, researchers have been looking to work with the dataset, fine-tuning the CLIP model on it and evaluating with different metrics. While I understand that fine-tuning is an important and difficult process, they claim to have gained a 217% delta increase on the Recall metric. My laptop is not capable of running this, so I am looking for an alternative with a remote GPU. But is this growth of 217% from the pretrained to the fine-tuned model even possible? A bit hard to believe. If so, is Colab a good option for running on a remote GPU while still being able to make use of the functionality? submitted by /u/jeoyous [link] [comments]  ( 88 min )
    [P] Try my GPT-2 backed keyword-conditioned story writing AI tyypewriter.com and tell me what you think
    As a hobby, I made an AI that can write stories based on what you want the story to be about. You can try it here: https://www.tyypewriter.com/ You write the stories sentence by sentence, with the AI always offering 4 candidate sentences (or phrases) that you can choose from. This is aimed at helping mitigate the problems of some text generation AI, which go off the rails when it takes a randomly decided word that conditions all the text generated after it. The AI doesn't always make sense because it is based on only a small GPT-2 model, and it is running on some feeble backend servers so if they crash, please be patient and try again in a few minutes. I'd love to hear your thoughts on the project. And if you write any stories that you want to share, then you can share them through the "Share" button at the end of the story creation to r/tyypewriter. Thanks and have fun! submitted by /u/tyypewriter [link] [comments]  ( 88 min )
    [D] What's the most interesting NeRF paper this year?
    I'm putting together a literature review of Neural Radiance Field papers for coworkers. The backbone of the review will definitely follow the later works of the original authors of NeRF at Google Research but I don't want to overlook papers outside of Google that have been advancing NeRFs. What do you recommend I read? Thanks so much for any and all recommendations! submitted by /u/Constuck [link] [comments]  ( 88 min )
    [P] YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
    Official YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100. YOLOv7-E6 object detector (56 FPS V100, 55.9% AP) outperforms both transformer-based detector SWIN-L Cascade-Mask R-CNN (9.2 FPS A100, 53.9% AP) by 509% in speed and 2% AP in accuracy, and convolutional-based detector ConvNeXt-XL Cascade-Mask R-CNN (8.6 FPS A100, 55.2% AP) by 551% in speed and 0.7% AP in accuracy. YOLOv7 also outperforms YOLOR, YOLOX, Scaled-YOLOv4, YOLOv5, DETR, Deformable DETR, DINO-5scale-R50, ViT-Adapter-B and many other object detectors in speed and accuracy. Moreover, we train YOLOv7 only on MS COCO dataset from scratch without using any other datasets or pre-trained weights. Source code is released on GitHub (linked at the end of this post). The maximum accuracy of the YOLOv7-E6E (56.8% AP) real-time model is +13.7% AP higher than the current most accurate meituan/YOLOv6-s model (43.1% AP) on COCO dataset. Our YOLOv7-tiny (35.2% AP, 0.4 ms) model is 25% faster and +0.2% AP higher than meituan/YOLOv6-n (35.0% AP, 0.5 ms) under identical conditions on COCO dataset and V100 GPU with batch=32. https://arxiv.org/abs/2207.02696 https://github.com/WongKinYiu/yolov7 https://paperswithcode.com/sota/real-time-object-detection-on-coco?dimension=FPS%20(V100%2C%20b%3D1) https://preview.redd.it/m9yldqadqub91.jpg?width=1577&format=pjpg&auto=webp&s=9c1196bfd21f2aef69ce438909ea741a5c6b082c submitted by /u/AlexeyAB [link] [comments]  ( 92 min )
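The relative figures quoted in the post above follow from the raw FPS, latency, and AP numbers it lists. The short check below recomputes them; the helper name `percent_faster` is just an illustrative choice, and the definition used (ratio of throughputs minus one) is an assumption about how the post derives its percentages.

```python
def percent_faster(fps_new, fps_old):
    """Relative speed-up of fps_new over fps_old, in percent."""
    return (fps_new / fps_old - 1.0) * 100.0


# YOLOv7-E6 (56 FPS) versus the two Cascade-Mask R-CNN baselines.
vs_swin_l = percent_faster(56.0, 9.2)
vs_convnext_xl = percent_faster(56.0, 8.6)

# YOLOv7-tiny (0.4 ms) versus YOLOv6-n (0.5 ms): convert latency to throughput.
tiny_vs_v6n = percent_faster(1.0 / 0.4, 1.0 / 0.5)

# Absolute AP gaps, in percentage points.
ap_gap_e6e_vs_v6s = 56.8 - 43.1
ap_gap_vs_swin_l = 55.9 - 53.9
ap_gap_vs_convnext_xl = 55.9 - 55.2
```

Note that the two Cascade-Mask R-CNN baselines are quoted on an A100 while YOLOv7-E6 is quoted on a V100, so the speed ratios compare numbers measured on different hardware.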
  • Open

    Image Augmentation for Deep Learning with Keras
    Data preparation is required when working with neural network and deep learning models. Increasingly data augmentation is also required on more complex object recognition tasks. In this post you will discover how to use data preparation and data augmentation with your image datasets when developing and evaluating deep learning models in Python with Keras. After […] The post Image Augmentation for Deep Learning with Keras appeared first on Machine Learning Mastery.  ( 55 min )
  • Open

    A Plea to End Harassment
    Scott Aaronson is a professor of computer science at UT Austin, where his research area is in theoretical computer science. However, he may be more well known in the broader computer science community for his popular blog Shtetl Optimized, which he began in 2005 and still updates regularly. I found his blog back in the early 2010s when I started my journey into computer science, and I was hooked by his writing style. His blog also has a large readership, and most of his posts garner a fair amount of comments. What surprises me is that, as a busy professor like him, he still takes the time to talk to random commenters – such as myself on many occasions – to answer questions on almost any topic. There’s even a Scientific American blog post titled “Scott Aaronson Answers Every Ridiculously Bi…  ( 2 min )

  • Open

    Awesome...
    submitted by /u/the_anonymizer [link] [comments]  ( 85 min )
    MIT Researchers Develop EquiBind: A Geometric Deep Learning Model That Becomes The Fastest Computational Molecular Docking Models
    There is no denying the importance of new treatments after experiencing one of the worst pandemics, Covid-19. Due to new diseases, medication resistance, and the growing understanding of medical issues, previously incurable disorders can now be treated thanks to drug discovery. There are over 1000000 possible drug-like molecules, and with the existing system, it is difficult to experiment on each of these molecules. The approval procedure required before drugs can be used is one of the obstacles to developing new drugs. It typically involves a lengthy process lasting up to ten years and costing about 2.5 billion dollars. Additionally, this approach is subject to failure at any time due to unanticipated adverse effects or experimental findings that contradict the claimed therapeutic efficacy. ✅ EquiBind is 1,200 times faster than one of the fastest existing computational molecular docking models, QuickVina2-W, in successfully binding drug-like molecules to proteins ✅ EquiBind is based on its predecessor, EquiDock, which specializes in binding two proteins using a technique developed by the late Octavian-Eugen Ganea. ✅ Code on GitHub Continue reading | Check out the paper and GitHub link submitted by /u/ai-lover [link] [comments]  ( 87 min )
    A project I saw this year that could label a website by URL
    I thought it was really cool since it seemed to be crawling a site to identify it. IE: Linkedin would be considered [Online Communities/Social Networks] Does anyone remember seeing it on here? I've been looking for it for hours now. Thank you! submitted by /u/atieonfire [link] [comments]  ( 86 min )
    what ai made this?
    submitted by /u/Noniax [link] [comments]  ( 86 min )
    Hey guys, this is my take on a meme generator. I was tired of working on projects for the industry level with use cases. Hence I tried something funny and it turned out funny. Do check it out. Here is the [github link]( https://github.com/Shreyz-max/Memes-Generator). Suggest me some changes I can t
    submitted by /u/Shreya001 [link] [comments]  ( 86 min )
    I Made a Robot that Slaps my Phone out of My Hand While Driving
    submitted by /u/_ayushp_ [link] [comments]  ( 86 min )
    Annotated Paper - Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models by FAIR
    A detailed and insightful study by the MetaAI team on memorization, overfitting and forgetting in LLMs. The paper discusses different definitions of "memorization" and how scaling affects the amount of training data that large language models can memorize during the training phase. Studies are also presented on what the forgetting curves look like and how overfitting relates to memorization for these large language models. The Appendix section is a gold mine as well. Annotated version of the paper - Github Link submitted by /u/shreyansh26 [link] [comments]  ( 86 min )
    Steam punk city created purely by AI
    submitted by /u/ExtensionVirtual471 [link] [comments]  ( 86 min )
    A question I think we should ask AI.
    How come I have not seen anyone ask any of these AI chatbots what things they know that we humans do not know yet? submitted by /u/TheRealDinkus [link] [comments]  ( 87 min )
    Researchers from China Propose DAT: a Deformable Vision Transformer to Compute Self-Attention in a Data-Aware Fashion
    In recent years, the extension of transformers to the computer vision field has slowly made vision transformers (ViT) the state-of-the-art model for many tasks, such as object detection or image classification. The main reason is their larger receptive field and their ability to model long-term relationships compared to their historical counterparts, CNNs. Nevertheless, there are still some drawbacks. The mechanism of ViT relies on three main matrices, Query (Q), Key (K), and Value (V). These matrices are used to compute self-attention between the different tokens. In the original paper, the image is split into patches used as tokens. To compute self-attention for the first patch, the Q associated with it is compared with the K/V of all the other tokens. In addition, in multi-head attention, multiple sets of matrices build different representations. With this technique, each patch is associated with a very large number of matrices, bringing high computational costs and the risk of overfitting. Continue reading | Check out the paper and github link. submitted by /u/ai-lover [link] [comments]  ( 87 min )
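The Q/K/V mechanism summarized in the post above can be sketched in a few lines. This is plain scaled dot-product self-attention for a single head, not the deformable attention that DAT proposes. The learned projection matrices are omitted, so the function takes already projected queries, keys, and values as an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention over lists of vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # One score per (query, key) pair: N queries x N keys in total.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Two tokens with identical keys receive equal attention weights,
# so the output is the plain average of the two value vectors.
out = self_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 1.0], [1.0, 1.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

Because every query is scored against every key, the cost grows with the square of the number of patches per head, which is the computational burden the post refers to.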
    Industrial AI
    Hello guys, I have a question. Is it possible to create an app for machine self-diagnostics? For example, if some input sensor fails during the program sequence (the press-up command is sent but the program is still missing input from the sensor), it would throw an error saying that the program is waiting for this input sensor before continuing to the next step in the sequence. submitted by /u/Weary_Expression_225 [link] [comments]  ( 86 min )
    If AI is superior, why bother?
    Now, I'm not gonna pretend that I know all there is to know about AI or what it'll become. That's why I'm here. And I can understand that this seems more like a philosophical question, but I'd like to hear from the people who side with AI. Why bother? AI is reaching a point where it can do things that humans do, but better and faster. "AI does chess better than the world's greatest chess player", "AI detects things in seconds, where it takes humans months", "AI can have better sex with your wife than you can". That last one was just for comedic reasons, but still you get my point. I get the benefits of AI. Fast problem-solving, accurate calculations, and other aspects that can benefit humanity. However, I feel as though it also defeats any point for humanity to continue. So, I'm an artist. I take what's on my mind and make it visual for everyone else. Of course, it is a time-consuming process, but now, AI programs like DALL-E 2 can do what I can do both better and faster with several results. It'll only be a matter of time until it grabs ahold of the animation field, of which I am a part. If AI can do it without much effort or time, why should I even pick up the pen? It kind of makes me feel like developing AI gives us a reason to downplay humanity and consider it replaceable in literally every field and category we work in. Another thought is life after an AI takeover. Say making money and working is no longer an obligation and with AI you can do everything you've ever wanted in a very short amount of time. I'll admit, it does sound nice, but I also feel like it could lead to a very boring existence afterwards. I'm not saying "AI bad cause human now obsolete" despite this whole post sounding like the last three words of that quote. I'm just feeling like AI just destroys any reason to do anything. submitted by /u/ihavenogoodnameatm [link] [comments]  ( 91 min )
    Skunkworks Speculation: Lets see them aliens
    Kidding about aliens. But, can anyone give a good faith attempt at some not-so-conspiracy-theorising as to what is in the skunkworks with AI leaders? What kind of wild scifi shit currently exists but is so epic, few know about it. Maybe even not one Project X, but several. I just know the tech we see is not the leading razor's edge, and I'm super curious what wonders exist beyond the veil. If anyone has any thoughts, I'd be thankful. submitted by /u/Overall-Importance54 [link] [comments]  ( 86 min )
    Have fun and learn AI at the Reinforcement Learning Hackathon on July 23rd!
    One day to immerse yourself in technology that is a first for companies and engineers around the world! To help you begin your immersion in AI as effectively as possible, we've prepared experts to assist you all the way. Not without a competitive component, the winners will receive worthy prizes that will help them successfully use advanced technologies for their projects. So come join us and learn everything you need to know about RL! Register here Reinforcement Learning OpenAI Gym Hackathon submitted by /u/zakrzzz [link] [comments]  ( 86 min )
    Artificial intelligence model finds potential drug molecules a thousand times faster | MIT News | Massachusetts Institute of Technology
    submitted by /u/greentea387 [link] [comments]  ( 86 min )
    AI-generated images of Da Vinci blueprints
    submitted by /u/QuillTheBoreal [link] [comments]  ( 86 min )
    How many years away are we from AI creating tailor-made video games?
    Just curious. When the era of AI-made video games arrives, I keep thinking it will bring problems in society where people don't socialize as much anymore due to having these games made just for them. Ngl I look forward to such games for myself too lol. submitted by /u/Bitterowner [link] [comments]  ( 86 min )
    Emergent behavior in AI models that looks similar to natural neural systems?
    "ImageNet Classification with Deep Convolutional Neural Networks" by Krizhevsky, Sutskever & Hinton describes very interesting emergent behavior of AlexNet. It was trained on 2 GPUs: "specialization exhibited by the two GPUs ... The kernels on GPU 1 are largely color-agnostic, while the kernels on GPU 2 are largely color-specific. This kind of specialization occurs during every run and is independent of any particular random weight initialization." Likewise, our brain mostly processes color with the left side of the brain. Are there other examples of emergent behavior in AI models that looks similar to natural neural systems? Any kind, from coordination of several neurons to high-level function, useful or detrimental, like optical illusions? So far I have found only some articles with optical illusion examples. submitted by /u/vashu11 [link] [comments]  ( 86 min )
    Providing embedded artificial intelligence with a capacity for palimpsest memory storage
    submitted by /u/jormungandrsjig [link] [comments]  ( 86 min )
  • Open

    Continuous and Discrete actions in the same environment
    I'm creating a sort of "flight simulator" environment in openai gym and I want the planes to be able to choose the angle at which they turn. However, the actions right now are turn right, turn left, forward, and shoot. With a continuous action space, I assume I would just need the actions turn or shoot where turn is a float representing the amount to turn. How can I define an action space according to that using gym spaces? My only guess would be to use a dict with a box of (1,) of type float (for turning) and a box of (1,) of type int to shoot. Would that work? I honestly have no idea. submitted by /u/WilliamFlinchbaugh [link] [comments]  ( 87 min )
    Is it possible to prove that an imitation learning agent cannot surpass an expert guide policy in expected reward?
    If you have an expert guide policy in a particular environment and you want to train an agent using imitation learning (the particular method is not that important but perhaps offline imitation learning is the most straightforward) in the same environment using the same reward function, you would expect that the imitation learning agent would (in expectation) be not as successful as the guide policy. I think this to be the case because we can view the imitation learning agent as a sort of degraded version of the guide policy (if we assume that the guide policy is complex enough to not be perfectly mimicked in every state), so there is no reason to believe that it could attain a higher average reward right? Is there any sort of proof for this? Or does anyone have any idea on how you could prove this sort of theorem? Thanks in advance:) submitted by /u/C_BearHill [link] [comments]  ( 90 min )
    GuardAI trial access
    Dear all, During the past few months, we have been working on GuardAI, a platform that can assist with testing the security and robustness of AI models. GuardAI lets you simulate a wide range of adversarial ML attacks and natural noise, and test your own models and datasets. Trial access to the platform is available via the link below: https://www.navinfo.eu/services/cybersecurity/guardai/ We would appreciate your expert opinion about the implementation of the adversarial ML attacks in our platform and your feedback. Should you have any questions or further requests on this platform, please feel free to contact us via: guardaisupport@navinfo.eu Best regards, GuardAI team submitted by /u/GuardAITeam [link] [comments]  ( 86 min )
  • Open

    Hey guys, this is my take on a meme generator. I was tired of working on projects for the industry level with use cases. Hence I tried something and it turned out funny. Do check it out. Github link in the comments. Suggest me some changes as well.
    submitted by /u/Shreya001 [link] [comments]  ( 86 min )
  • Open

    [D] How do you deal with skewed continuous target variables?
    I am trying to build a model that predicts an extremely skewed target variable. My independent variables have a low correlation with my dependent variable, which is highly skewed (2% of the data points are far higher than the rest, which causes my model to make high predictions). submitted by /u/Ok_Challenge1987 [link] [comments]  ( 88 min )
    [R] Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models by FAIR
    A detailed and insightful study by the Meta AI team on memorization, overfitting and forgetting in LLMs. The paper discusses different definitions of "memorization" and how scaling affects the amount of training data that large language models can memorize during the training phase. Studies are also presented on what the forgetting curves look like and how overfitting relates to memorization for these large language models. The Appendix section is a gold mine as well. Annotated version of the paper - Github Link submitted by /u/shreyansh26 [link] [comments]  ( 87 min )
    [P] nbsnapshot: Automated Jupyter notebook testing. 📙
    https://preview.redd.it/qgfg81lp4sb91.png?width=1201&format=png&auto=webp&s=ba15963a42a85a7f18a5a173d167774d7f5b141d I want to share a project I've been working on to facilitate Jupyter notebook testing! When analyzing data in a Jupyter notebook, I unconsciously memorize "rules of thumb" to determine if my results are correct. For example, I might print some summary statistics and become skeptical of some outputs if they deviate too much from what I've seen historically. For more complex analysis, I often create diagnostic plots (e.g., a histogram) and check them whenever new data arrives. Since I constantly repeat the same process, I figured I'd code a small library to streamline this process. nbsnapshot benchmarks cell's outputs with historical results and raises an error if the output deviates from an expected range (by default, 3 standard deviations from the mean). You can see an example in the image accompanying this post. To learn more, check out the blog post. submitted by /u/ploomber-io [link] [comments]  ( 88 min )
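    The check described in the post (flag an output that falls more than 3 standard deviations from the historical mean) can be sketched in plain Python. This only illustrates the idea and is not nbsnapshot's actual API; the function name `check_against_history` and its parameters are hypothetical.

```python
import statistics


def check_against_history(history, new_value, n_std=3.0):
    """Raise ValueError if new_value is more than n_std standard
    deviations away from the mean of the historical values."""
    if len(history) < 2:
        # Not enough data to estimate a spread; accept the value.
        return
    mean = statistics.mean(history)
    std = statistics.stdev(history)  # sample standard deviation
    lower, upper = mean - n_std * std, mean + n_std * std
    if not lower <= new_value <= upper:
        raise ValueError(
            f"value {new_value} outside expected range [{lower:.3f}, {upper:.3f}]"
        )


# Hypothetical summary statistic recorded on five earlier runs.
history = [10.1, 9.8, 10.3, 10.0, 9.9]
check_against_history(history, 10.2)  # within range, passes silently
```

    A value far from the recorded history, such as 50.0, would raise `ValueError` under the same call.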
    [D] At what point does data augmentation stop making a difference for language models?
    Is there any work which shows at what point data augmentation stops making a difference? Say you have GPT-3 type data, then you probably don't get gains from data augmentation, but in the low-data regime you definitely get gains. Is there a systematic study that gets to the bottom of this? submitted by /u/Economy-Pipe-6184 [link] [comments]  ( 87 min )
    [P] Feedback for our model evaluation and interpretability platform, $100 gift card for your time!
    Hi everyone! I'm Gabriel Bayomi, one of the founders of Unbox (https://unbox.ai) and an ML engineer myself. Over the years, we learned that ML model evaluation is a huge challenge, so we started Unbox to make it easy for ML teams to find failures and biases in their models, figure out their root causes and use better data to fix them. We’re launching the alpha version of our new community edition (free!), and we’d love to get feedback on our product before doing our beta launch. We’ll be giving a $100 gift card to folks who are willing to give some time to this and provide feedback on the usability of the product. It’s super easy, you just need to sign up here: https://unbox.ai/alpha?ref=reddit. Look forward to hearing from you! You can always email me with any questions (gabriel@unbox.ai - community Slack coming soon) or throw them out here in the thread. Gabriel submitted by /u/byebaybay [link] [comments]  ( 92 min )
    [R] RWKV-3: Scaling RNN to 1.5B and Reach Transformer LM Performance (without using attention)
    Hi everyone. I posted about my RWKV-2 here a few weeks ago (thanks for the upvote): https://www.reddit.com/r/MachineLearning/comments/veem7o/r_rwkv2_430m_release_a_parallelizable_rnn_with/ And RWKV-3 is better. You are welcome to join the project: https://github.com/BlinkDL/RWKV-LM (I am an independent researcher). The LM (language modeling) and zero-shot performances of RWKV-3 1.5B, after training for just 93B tokens (the full run of 330B tokens is expected to finish in 60 more days, on 8xA100 tf32): https://preview.redd.it/5pqa3iu6orb91.png?width=1068&format=png&auto=webp&s=89f40c6e9967d76d83050af0f5fb9f1b992f4323 RWKV-3 is a 100% pure RNN (the next hidden state depends only on the current hidden state). Hence, RNN might be all you need. Download the 68B-tokens checkpoint: https://huggingface.co/BlinkDL/rwkv-3-pile-1b5 Inference speed on single A40 (tf32): *) RWKV-3 1.5B = always 0.015 sec/token - tested using simple pytorch code (no CUDA), GPU utilization 45%, VRAM 7823M *) GPT2-XL 1.3B = 0.032 sec/token (for ctxlen 1000) - tested using HF, GPU utilization 45% too (interesting), VRAM 9655M How it works: RWKV gathers information to a number of channels, which are also decaying with different speeds as you move to the next token. It's simple once you understand it. Here are some of the TODOs. Let's work together :) https://github.com/BlinkDL/RWKV-LM *) FP16 inference & training, and scaling to 6B -> 20B -> 66B (there will be compute when we have the infrastructure). RWKV is very scalable if we look at the 169M-430M-1.5B results. *) HuggingFace integration, and optimized CPU & iOS & Android & WASM & WebGL inference. RWKV is friendly for edge devices. Let's make it possible to run a LLM on your phone. *) Test it on bidirectional & MLM tasks, and image & audio & video tokens. submitted by /u/bo_peng [link] [comments]  ( 90 min )
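    The "channels decaying with different speeds" idea can be illustrated with a toy recurrence in plain Python. This is a deliberately simplified sketch and not the actual RWKV-3 formulation (see the repository for that); the function name, the decay values, and the update rule `state = decay * state + x` are assumptions made purely for illustration.

```python
def decaying_channels(tokens, decays):
    """Run a toy recurrence where each channel accumulates its input
    and decays at its own rate when moving to the next token.

    tokens: list of per-token feature lists, one value per channel
    decays: per-channel decay factor in [0, 1]
    """
    state = [0.0] * len(decays)
    history = []
    for x in tokens:
        # The next state depends only on the current state and input.
        state = [d * s + xi for d, s, xi in zip(decays, state, x)]
        history.append(list(state))
    return history


# One impulse at the first token, then silence: the slow channel
# (decay 0.9) retains it much longer than the fast one (decay 0.1).
tokens = [[1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
states = decaying_channels(tokens, decays=[0.9, 0.1])
```

    Because the state has a fixed size, the cost per token in this toy version does not grow with context length, which is in the spirit of the constant sec/token figure quoted in the post.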
    [D] What’s the latest in ML music generation?
    I have been peripherally interested in ML music generation models over the years and followed Google’s Magenta project. Recently I’ve been trying to get up to speed with what’s been happening in this space. Looking at Magenta’s homepage, it seems like the last paper they published was Listen to transformers, about 2 years ago. Does anyone know what’s been going on recently? Is Magenta dead? Has any other company/lab been working on this and open-sourced their models? submitted by /u/iamjaiyam [link] [comments]  ( 89 min )
    [D] [Discussion] Gaussian Distribution - PhD Level Very tricky equations
    The following question comes from page 83 of Bishop's book, which is reachable through this link: http://users.isr.ist.utl.pt/~wurmd/Livros/school/Bishop%20-%20Pattern%20Recognition%20And%20Machine%20Learning%20-%20Springer%20%202006.pdf?fbclid=IwAR3ZLtsUFTetN7wgLmxoRt5R32tF-OHuQhlnqt9lPy_ldKuLChv4BCZm-2I I found equation (2.61) very tricky, and no one in my PhD lab was able to work out how it is derived. Is someone able to clarify this mathematically? Thank you in advance. https://preview.redd.it/tir533r7qqb91.png?width=748&format=png&auto=webp&s=686cec850580ed8496da3c4bd320a595d0dabc98 submitted by /u/ProfitCute5415 [link] [comments]  ( 88 min )
    [P] A python module to fetch relevant papers based on keywords from different sources, including Arxiv, ACL, ACM, PMLR, CVF etc. and fetch all citations of a research paper from google scholar
    Hi folks, I was working on a personal experimental project, which I thought of making it open source now. It saves much time for literature research. If you are an industrial researcher or in academia, you probably spend much time reading research articles and news related to your topic. If you try to search papers related to your topic, finding relevant documents on the internet takes time. You probably know the pain of extracting citations of articles from different websites. Previously I used to fetch papers from google or semantic scholar, but semantic scholar does not show correct paper citations. I am excited to announce RESP: Research Papers Search Features: Fetch all citations of a single paper from Google Scholar in CSV format Fetch all related papers of a single paper from Google Scholar in CSV format Fetch all connected papers from connectedpapers.com (it does not use a citation tree, it uses similarity to build graphs) in CSV format Fetch relevant papers based on keywords from different sources, including Arxiv, ACL, ACM, PMLR, NeurIPS, cvf etc., in CSV format GITHUB: https://github.com/monk1337/resp Examples: https://github.com/monk1337/resp/tree/main/examples I hope it will be helpful in your research. Thanks :) submitted by /u/aadityaura [link] [comments]  ( 89 min )
    [D] Does anyone else feel that machine learning papers are getting very "wordy"?
    So I was looking at which papers cited "On the Opportunities and Risks of Foundation Models" (which is a very wordy paper), using Google Scholar, when I realized that most of the papers citing it are also very wordy. I don't know any of the authors here, and I'm just picking a few that I saw: https://arxiv.org/pdf/2202.07096.pdf https://arxiv.org/pdf/2109.07573.pdf https://arxiv.org/pdf/2203.07785.pdf https://arxiv.org/pdf/2111.07765.pdf https://arxiv.org/pdf/2111.15366.pdf https://arxiv.org/pdf/2109.08270.pdf https://arxiv.org/pdf/2110.15444.pdf https://arxiv.org/pdf/2205.00538.pdf Now a lot of these papers look interesting, but I am turned off by wall after wall of text (usually not self-contained and referencing a bunch of prior work). Does anyone else feel like this is becoming a trend in ML? Or is this kind of style the norm in this field? submitted by /u/fromnighttilldawn [link] [comments]  ( 91 min )
    [D] 2nd AutoML Fall School
    Following last year's interest of this subreddit in our AutoML Fall School, we are happy to announce the second AutoML Fall School, which will be held in person in Freiburg (Germany) from October 10th to October 13th. AutoML can be a vital tool for many machine learning practitioners and researchers. While students and professionals are eager to learn more about AutoML, it is rarely taught and addressed in courses in today’s academic landscape. With the AutoML Fall School we aim to close this glaring gap by providing a platform for graduate students and researchers to learn about core aspects of AutoML. The event will feature lectures and invited talks by renowned experts about topics from fundamental theory to advanced state-of-the-art methods and current challenges such as neural architecture search and automated reinforcement learning. Additionally, you will be able to try your hand at implementing leading AutoML solutions in our hands-on sessions while being mentored by AutoML experts, as well as network and exchange ideas in our social events and much more. Registrations are now open! Find a preliminary schedule, additional information, and the registration details on our official website. In case you need even more motivation: the city of Freiburg im Breisgau, where we will host the event, was ranked 3rd in the category Top 10 Cities in "Best in Travel 2022" by Lonely Planet. We are looking forward to seeing you in October in Germany's greenest and sunniest city. submitted by /u/Science_Squid [link] [comments]  ( 88 min )
    [D] Resources on writing reproducible code?
    What are some resources on writing reproducible (excluding ofc floating point addition randomness, etc) ML code? So far, I'm following pretty standard software engineering principles that I learned in class: documentation comments, modularization, function deduplication, READMEs. I'm planning to write unit tests for some of my preprocessing steps as well. But there is a whole class of other factors that affect the code: the GPU I'm using, system parameters, etc. Short of just listing the computer specs, is there any easy way to perhaps bundle things like drivers into a git repo? submitted by /u/ElectronicCress3132 [link] [comments]  ( 89 min )
    [D] What are people using to organize large groups of people for data labelling?
    I'm thinking of hiring a bunch of people to label a ton of data. What is the best software to do this? I specifically want to use my own labelers. submitted by /u/vanilla-acc [link] [comments]  ( 94 min )
    [P] How to tackle Time-series Classification with a large number of categorical variables/attributes ( >100) with high cardinality? I'm open to discussing other ways as well.
    I am predicting whether the particular event would occur or not in the next n-timeframes given the categorical variables with high cardinality. Please let me know if there is anything that we can do to tackle this problem. submitted by /u/madlad612 [link] [comments]  ( 89 min )
    [D] ML architecture for adaptive setting suggestions in a stage-dependent program
    I've got a problem that I could use some insight on. Summary: I need to design an ML architecture that suggests parameters to a program depending on the observed performance of the user. The program has a set number of stages and the objective of the program is to improve user performance as much as possible within the stage limit. Problem description: To provide an example, let's say we have 3 stages in the program and the user starts off at stage 1. The program takes two parameters at each stage that determine the difficulty of that stage: Alpha and Beta, which both range over [0, 10], inclusive. The user completes stage 1 and a summary score of user performance is returned based on a radar chart produced by the program. For this example, let's say the score is a 3 out of…  ( 89 min )
  • Open

    Build a news-based real-time alert system with Twitter, Amazon SageMaker, and Hugging Face
    Today, social media is a huge source of news. Users rely on platforms like Facebook and Twitter to consume news. For certain industries such as insurance companies, first responders, law enforcement, and government agencies, being able to quickly process news about relevant events occurring can help them take action while these events are still unfolding. […]  ( 9 min )
  • Open

    Gaussian elimination
    When you solve systems of linear equations, you probably use Gaussian elimination, even if you don’t call it that. You may learn Gaussian elimination before you see it formalized in terms of matrices. So if you’ve had a course in linear algebra, and you sign up for a course in numerical linear algebra, it’s natural […] Gaussian elimination first appeared on John D. Cook.  ( 5 min )
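    For readers who want the procedure spelled out, here is a minimal sketch of Gaussian elimination followed by back substitution, in plain Python. The partial pivoting step is an addition here as a common numerical safeguard, not something the excerpt mentions, and the function name `solve` is hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    a: square matrix as a list of rows, b: right-hand side list.
    Inputs are copied, not modified.
    """
    n = len(a)
    # Build the augmented matrix [a | b].
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


# The system 2x + y = 5, x + 3y = 10 has the solution x = 1, y = 3.
x = solve([[2, 1], [1, 3]], [5, 10])
```

    In practice a library routine would normally be used instead of hand-written elimination; the sketch is only meant to make the row operations concrete.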
  • Open

    Meet the Omnivore: Animator Entertains and Explains With NVIDIA Omniverse
    Australian animator Marko Matosevic is taking jokes from a children’s school dads’ group and breathing them into animated life with NVIDIA Omniverse, a virtual world simulation and collaboration platform for 3D workflows. The post Meet the Omnivore: Animator Entertains and Explains With NVIDIA Omniverse appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    The Kitten Effect
    One thing I've noticed with image-generating algorithms is that the more of something they have to put in an image, the worse it is. I first noticed this with the kitten-generating variant of StyleGAN, which often does okay on one cat but is  ( 4 min )
    Bonus: How closely can I look at a giraffe?
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    Top AI Resources You Must Follow If You Are Into AI
    How to keep up with the latest machine learning advancements  ( 14 min )
  • Open

    Loss Functions in TensorFlow
    The loss metric is very important for neural networks. As every machine learning model is one optimization problem or another, the loss is the objective function to minimize. In neural networks, the optimization is done with gradient descent and backpropagation. But what are loss functions and how do they affect our neural networks? In this post, […] The post Loss Functions in TensorFlow appeared first on Machine Learning Mastery.  ( 22 min )
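    To make "the loss is the objective function to minimize" concrete, here is a small sketch that fits a one-parameter model by gradient descent on mean squared error. It uses plain Python rather than TensorFlow so that it runs without extra packages; the function names, learning rate, and data are assumptions for illustration and are not taken from the post.

```python
def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_slope(xs, ys, lr=0.05, steps=200):
    """Fit y = w * x by gradient descent on the MSE loss."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d/dw mean((w*x - y)^2) = mean(2 * (w*x - y) * x)
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


# Data generated by y = 2x, so the fitted slope should approach 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w = fit_slope(xs, ys)
loss = mse(ys, [w * x for x in xs])
```

    In a neural network the gradient is obtained by backpropagation instead of the hand-derived formula in the comment, but the loop has the same shape: compute the loss gradient, then step the parameters against it.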
  • Open

    Several Approximation Algorithms for Sparse Best Rank-1 Approximation to Higher-Order Tensors. (arXiv:2012.03092v2 [math.NA] UPDATED)
    Sparse tensor best rank-1 approximation (BR1Approx), which is a sparsity generalization of the dense tensor BR1Approx, and is a higher-order extension of the sparse matrix BR1Approx, is one of the most important problems in sparse tensor decomposition and related problems arising from statistics and machine learning. By exploiting the multilinearity as well as the sparsity structure of the problem, four approximation algorithms are proposed, which are easily implemented, of low computational complexity, and can serve as initial procedures for iterative algorithms. In addition, theoretically guaranteed worst-case approximation lower bounds are proved for all the algorithms. We provide numerical experiments on synthetic and real data to illustrate the effectiveness of the proposed algorithms.  ( 2 min )
    Wide Neural Networks Forget Less Catastrophically. (arXiv:2110.11526v3 [cs.LG] UPDATED)
    A primary focus area in continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to the distribution shifts. While the recent progress in continual learning literature is encouraging, our understanding of what properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work, we focus on the model itself and study the impact of "width" of the neural network architecture on catastrophic forgetting, and show that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives such as gradient orthogonality, sparsity, and lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.  ( 2 min )
    Lipschitz Continuity Retained Binary Neural Network. (arXiv:2207.06540v1 [cs.LG])
    Relying on the premise that the performance of a binary neural network can be largely restored with eliminated quantization error between full-precision weight vectors and their corresponding binary vectors, existing works of network binarization frequently adopt the idea of model robustness to reach the aforementioned objective. However, robustness remains an ill-defined concept without solid theoretical support. In this work, we introduce the Lipschitz continuity, a well-defined functional property, as the rigorous criterion to define the model robustness for BNN. We then propose to retain the Lipschitz continuity as a regularization term to improve the model robustness. Particularly, while the popular Lipschitz-involved regularization methods often collapse in BNN due to its extreme sparsity, we design the Retention Matrices to approximate spectral norms of the targeted weight matrices, which can be deployed as the approximation for the Lipschitz constant of BNNs without the exact Lipschitz constant computation (NP-hard). Our experiments prove that our BNN-specific regularization method can effectively strengthen the robustness of BNN (testified on ImageNet-C), achieving state-of-the-art performance on CIFAR and ImageNet.  ( 2 min )
    Distance Learner: Incorporating Manifold Prior to Model Training. (arXiv:2207.06888v1 [cs.LG])
    The manifold hypothesis (real world data concentrates near low-dimensional manifolds) is suggested as the principle behind the effectiveness of machine learning algorithms in very high dimensional problems that are common in domains such as vision and speech. Multiple methods have been proposed to explicitly incorporate the manifold hypothesis as a prior in modern Deep Neural Networks (DNNs), with varying success. In this paper, we propose a new method, Distance Learner, to incorporate this prior for DNN-based classifiers. Distance Learner is trained to predict the distance of a point from the underlying manifold of each class, rather than the class label. For classification, Distance Learner then chooses the class corresponding to the closest predicted class manifold. Distance Learner can also identify points as being out of distribution (belonging to neither class), if the distance to the closest manifold is higher than a threshold. We evaluate our method on multiple synthetic datasets and show that Distance Learner learns much more meaningful classification boundaries compared to a standard classifier. We also evaluate our method on the task of adversarial robustness, and find that it not only outperforms standard classifier by a large margin, but also performs at par with classifiers trained via state-of-the-art adversarial training.  ( 2 min )
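    The decision rule described in the abstract (choose the class with the closest predicted manifold, and reject the point as out of distribution if even that distance exceeds a threshold) can be sketched as follows. The predicted distances would come from the trained network; here they are hand-written placeholders, and the function name and threshold value are hypothetical.

```python
def classify(distances, threshold):
    """Pick the class whose manifold is predicted to be closest.

    distances: dict mapping class label -> predicted distance of the
               point from that class's manifold
    threshold: if even the closest manifold is farther than this,
               the point is treated as out of distribution (None)
    """
    label = min(distances, key=distances.get)
    if distances[label] > threshold:
        return None  # out of distribution
    return label


# Placeholder distances standing in for the network's predictions.
near_cat = classify({"cat": 0.2, "dog": 1.5}, threshold=0.5)
far_from_both = classify({"cat": 0.9, "dog": 1.5}, threshold=0.5)
```

    Here `near_cat` is `"cat"` and `far_from_both` is `None`, since 0.9 exceeds the threshold of 0.5.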
    Language models show human-like content effects on reasoning. (arXiv:2207.07051v1 [cs.CL])
    Abstract reasoning is a key ability for an intelligent system. Large language models achieve above-chance performance on abstract reasoning tasks, but exhibit many imperfections. However, human abstract reasoning is also imperfect, and depends on our knowledge and beliefs about the content of the reasoning problem. For example, humans reason much more reliably about logical rules that are grounded in everyday situations than arbitrary rules about abstract attributes. The training experiences of language models similarly endow them with prior expectations that reflect human knowledge and beliefs. We therefore hypothesized that language models would show human-like content effects on abstract reasoning problems. We explored this hypothesis across three logical reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task (Wason, 1968). We find that state of the art large language models (with 7 or 70 billion parameters; Hoffman et al., 2022) reflect many of the same patterns observed in humans across these tasks -- like humans, models reason more effectively about believable situations than unrealistic or abstract ones. Our findings have implications for understanding both these cognitive effects, and the factors that contribute to language model performance.  ( 2 min )
    problexity -- an open-source Python library for binary classification problem complexity assessment. (arXiv:2207.06709v1 [cs.LG])
    The classification problem's complexity assessment is an essential element of many topics in the supervised learning domain. It plays a significant role in meta-learning -- becoming the basis for determining meta-attributes or multi-criteria optimization -- allowing the evaluation of the training set resampling without needing to rebuild the recognition model. The tools currently available for the academic community, which would enable the calculation of problem complexity measures, are available only as libraries of the C++ and R languages. This paper describes the software module that allows for the estimation of 22 complexity measures for the Python language -- compatible with the scikit-learn programming interface -- allowing for the implementation of research using them in the most popular programming environment of the machine learning community.  ( 2 min )
    Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA Models. (arXiv:2207.06950v1 [stat.ML])
    Low-order functional ANOVA (fANOVA) models have been rediscovered in the machine learning (ML) community under the guise of inherently interpretable machine learning. Explainable Boosting Machines or EBM (Lou et al. 2013) and GAMI-Net (Yang et al. 2021) are two recently proposed ML algorithms for fitting functional main effects and second-order interactions. We propose a new algorithm, called GAMI-Tree, that is similar to EBM, but has a number of features that lead to better performance. It uses model-based trees as base learners and incorporates a new interaction filtering method that is better at capturing the underlying interactions. In addition, our iterative training method converges to a model with better predictive performance, and the embedded purification ensures that interactions are hierarchically orthogonal to main effects. The algorithm does not need extensive tuning, and our implementation is fast and efficient. We use simulated and real datasets to compare the performance and interpretability of GAMI-Tree with EBM and GAMI-Net.  ( 2 min )
    Multitrack Music Transformer: Learning Long-Term Dependencies in Music with Diverse Instruments. (arXiv:2207.06983v1 [cs.SD])
    Existing approaches for generating multitrack music with transformer models have been limited to either a small set of instruments or short music segments. This is partly due to the memory requirements of the lengthy input sequences necessitated by existing representations for multitrack music. In this work, we propose a compact representation that allows a diverse set of instruments while keeping a short sequence length. Using our proposed representation, we present the Multitrack Music Transformer (MTMT) for learning long-term dependencies in multitrack music. In a subjective listening test, our proposed model achieves competitive quality on unconditioned generation against two baseline models. We also show that our proposed model can generate samples that are twice as long as those produced by the baseline models, and, further, can do so in half the inference time. Moreover, we propose a new measure for analyzing musical self-attentions and show that the trained model learns to pay less attention to notes that form a dissonant interval with the current note, yet attending more to notes that are 4N beats away from current. Finally, our findings provide a novel foundation for future work exploring longer-form multitrack music generation and improving self-attentions for music. All source code and audio samples can be found at https://salu133445.github.io/mtmt/ .  ( 3 min )
    Verification of Sigmoidal Artificial Neural Networks using iSAT. (arXiv:2207.06755v1 [cs.AI])
    This paper presents an approach for verifying the behaviour of nonlinear Artificial Neural Networks (ANNs) found in cyber-physical safety-critical systems. We implement a dedicated interval constraint propagator for the sigmoid function into the SMT solver iSAT and compare this approach with a compositional approach encoding the sigmoid function by basic arithmetic features available in iSAT and an approximating approach. Our experimental results show that the dedicated and the compositional approach clearly outperform the approximating approach. Throughout all our benchmarks, the dedicated approach showed an equal or better performance compared to the compositional approach.  ( 2 min )
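    The core idea behind an interval propagator for the sigmoid can be sketched in a few lines: because the sigmoid is monotonically increasing, the image of an input interval is bounded by the images of its endpoints. This is an illustration under that assumption only, not iSAT's implementation; in particular, a sound solver would also round the bounds outward to account for floating-point error, which this sketch omits.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def sigmoid_interval(lo, hi):
    """Propagate the interval [lo, hi] through the sigmoid.

    The sigmoid is monotonically increasing, so the image of an
    interval is bounded by the images of its endpoints.
    Note: no outward rounding is applied here.
    """
    if lo > hi:
        raise ValueError("empty interval")
    return sigmoid(lo), sigmoid(hi)


# Output bounds for a neuron whose pre-activation lies in [-1, 1].
out_lo, out_hi = sigmoid_interval(-1.0, 1.0)
```

    Chaining such interval steps layer by layer gives bounds on a network's output for a whole box of inputs, which is the kind of reasoning the verification task relies on.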
    A Personalized Zero-Shot ECG Arrhythmia Monitoring System: From Sparse Representation Based Domain Adaption to Energy Efficient Abnormal Beat Detection for Practical ECG Surveillance. (arXiv:2207.07089v1 [cs.LG])
    This paper proposes a low-cost and highly accurate ECG-monitoring system intended for personalized early arrhythmia detection for wearable mobile sensors. Earlier supervised approaches for personalized ECG monitoring require both abnormal and normal heartbeats for the training of the dedicated classifier. However, in a real-world scenario where the personalized algorithm is embedded in a wearable device, such training data is not available for healthy people with no cardiac disorder history. In this study, (i) we propose a null space analysis on the healthy signal space obtained via sparse dictionary learning, and investigate how a simple null space projection or alternatively regularized least squares-based classification methods can reduce the computational complexity, without sacrificing the detection accuracy, when compared to sparse representation-based classification. (ii) Then we introduce a sparse representation-based domain adaptation technique in order to project other existing users' abnormal and normal signals onto the new user's signal space, enabling us to train the dedicated classifier without having any abnormal heartbeat of the new user. Therefore, zero-shot learning can be achieved without the need for synthetic abnormal heartbeat generation. An extensive set of experiments performed on the benchmark MIT-BIH ECG dataset shows that when this domain adaptation-based training data generator is used with a simple 1-D CNN classifier, the method outperforms the prior work by a significant margin. (iii) Then, by combining (i) and (ii), we propose an ensemble classifier that further improves the performance. This approach for zero-shot arrhythmia detection achieves an average accuracy level of 98.2% and an F1-Score of 92.8%. Finally, a personalized energy-efficient ECG monitoring scheme is proposed using the above-mentioned innovations.  ( 3 min )
    On the Strong Correlation Between Model Invariance and Generalization. (arXiv:2207.07065v1 [cs.LG])
    Generalization and invariance are two essential properties of any machine learning model. Generalization captures a model's ability to classify unseen data while invariance measures consistency of model predictions on transformations of the data. Existing research suggests a positive relationship: a model generalizing well should be invariant to certain visual factors. Building on this qualitative implication we make two contributions. First, we introduce effective invariance (EI), a simple and reasonable measure of model invariance which does not rely on image labels. Given predictions on a test image and its transformed version, EI measures how well the predictions agree and with what level of confidence. Second, using invariance scores computed by EI, we perform large-scale quantitative correlation studies between generalization and invariance, focusing on rotation and grayscale transformations. From a model-centric view, we observe generalization and invariance of different models exhibit a strong linear relationship, on both in-distribution and out-of-distribution datasets. From a dataset-centric view, we find a certain model's accuracy and invariance linearly correlated on different test sets. Apart from these major findings, other minor but interesting insights are also discussed.  ( 2 min )
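    The abstract above describes effective invariance (EI) only in words: it scores how well the predictions on an image and on its transformed version agree, and with what confidence. The sketch below is one plausible reading, not a formula taken from the abstract: it assumes EI is zero when the two predicted classes differ and the geometric mean of the two confidences when they agree. The function name and the list-of-probabilities input format are illustrative choices.

```python
import math


def effective_invariance(probs, probs_transformed):
    """Label-free agreement score between two softmax outputs.

    Assumed form (not stated in the abstract): 0 if the predicted
    classes differ, otherwise sqrt(conf_original * conf_transformed).
    """
    if len(probs) != len(probs_transformed) or not probs:
        raise ValueError("both inputs must be non-empty and the same length")
    # Index of the most probable class for each input.
    pred = max(range(len(probs)), key=probs.__getitem__)
    pred_t = max(range(len(probs_transformed)), key=probs_transformed.__getitem__)
    if pred != pred_t:
        return 0.0
    return math.sqrt(probs[pred] * probs_transformed[pred_t])


# Same predicted class on both views, high confidence on each.
score_agree = effective_invariance([0.1, 0.9], [0.2, 0.8])
# Predicted class flips under the transformation.
score_flip = effective_invariance([0.1, 0.9], [0.7, 0.3])
```

    Under this assumed form, no ground-truth label enters the computation, which matches the abstract's statement that EI does not rely on image labels.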
    An Asymmetric Contrastive Loss for Handling Imbalanced Datasets. (arXiv:2207.07080v1 [cs.LG])
    Contrastive learning is a representation learning method performed by contrasting a sample to other similar samples so that they are brought closely together, forming clusters in the feature space. The learning process is typically conducted using a two-stage training architecture, and it utilizes the contrastive loss (CL) for its feature learning. Contrastive learning has been shown to be quite successful in handling imbalanced datasets, in which some classes are overrepresented while some others are underrepresented. However, previous studies have not specifically modified CL for imbalanced datasets. In this work, we introduce an asymmetric version of CL, referred to as ACL, in order to directly address the problem of class imbalance. In addition, we propose the asymmetric focal contrastive loss (AFCL) as a further generalization of both ACL and focal contrastive loss (FCL). Results on the FMNIST and ISIC 2018 imbalanced datasets show that AFCL is capable of outperforming CL and FCL in terms of both weighted and unweighted classification accuracies. In the appendix, we provide a full axiomatic treatment on entropy, along with complete proofs.  ( 2 min )
    Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory. (arXiv:2110.11291v4 [stat.ML] UPDATED)
    Schrödinger Bridge (SB) is an entropy-regularized optimal transport problem that has received increasing attention in deep generative modeling for its mathematical flexibility compared to the Score-based Generative Model (SGM). However, it remains unclear whether the optimization principle of SB relates to the modern training of deep generative models, which often rely on constructing log-likelihood objectives. This raises questions on the suitability of SB models as a principled alternative for generative applications. In this work, we present a novel computational framework for likelihood training of SB models grounded on Forward-Backward Stochastic Differential Equations Theory, a mathematical methodology from stochastic optimal control that transforms the optimality condition of SB into a set of SDEs. Crucially, these SDEs can be used to construct the likelihood objectives for SB that, surprisingly, generalize the ones for SGM as special cases. This leads to a new optimization principle that inherits the same SB optimality yet without losing applications of modern generative training techniques, and we show that the resulting training algorithm achieves comparable results on generating realistic images on MNIST, CelebA, and CIFAR10. Our code is available at https://github.com/ghliu/SB-FBSDE.  ( 3 min )
    Work In Progress: Safety and Robustness Verification of Autoencoder-Based Regression Models using the NNV Tool. (arXiv:2207.06759v1 [cs.LG])
    This work-in-progress paper introduces robustness verification for autoencoder-based regression neural network (NN) models, following state-of-the-art approaches for robustness verification of image classification NNs. Despite the ongoing progress in developing verification methods for safety and robustness in various deep neural networks (DNNs), robustness checking of autoencoder models has not yet been considered. We explore this open space of research and check ways to bridge the gap between existing DNN verification methods by extending existing robustness analysis methods for such autoencoder networks. While classification models using autoencoders work more or less similarly to image classification NNs, the functionality of regression models is distinctly different. We introduce two definitions of robustness evaluation metrics for autoencoder-based regression models, specifically the percentage robustness and un-robustness grade. We also modified the existing Imagestar approach, adjusting the variables to take care of the specific input types for regression networks. The approach is implemented as an extension of NNV, then applied and evaluated on a dataset, with a case study experiment shown using the same dataset. As per the authors' understanding, this work-in-progress paper is the first to show possible reachability analysis of autoencoder-based NNs.  ( 3 min )
    Noise-Stable Rigid Graphs for Euclidean Embedding. (arXiv:1907.06441v5 [cs.CG] UPDATED)
    We proposed a new criterion, noise-stability, which revised the classical rigidity theory, for evaluation of MDS algorithms which can truthfully represent the fidelity of global structure reconstruction; then we proved the noise-stability of the cMDS algorithm in generic conditions, which provides a rigorous theoretical guarantee for the precision and theoretical bounds for Euclidean embedding and its application in fields including wireless sensor network localization and satellite positioning. Furthermore, we looked into previous work about minimum-cost globally rigid spanning subgraph, and proposed an algorithm to construct a minimum-cost noise-stable spanning graph in the Euclidean space, which enabled reliable localization on sparse graphs of noisy distance constraints with linear numbers of edges and sublinear costs in total edge lengths. Additionally, this algorithm also suggests a scheme to reconstruct point clouds from pairwise distances at a minimum of $O(n)$ time complexity, down from $O(n^3)$ for cMDS.  ( 2 min )
    Data-Free Neural Architecture Search via Recursive Label Calibration. (arXiv:2112.02086v2 [cs.LG] UPDATED)
    This paper aims to explore the feasibility of neural architecture search (NAS) given only a pre-trained model without using any original training data. This is an important circumstance for privacy protection, bias avoidance, etc., in real-world scenarios. To achieve this, we start by synthesizing usable data through recovering the knowledge from a pre-trained deep neural network. Then we use the synthesized data and their predicted soft-labels to guide neural architecture search. We identify that the NAS task requires the synthesized data (we target the image domain here) to have enough semantics, diversity, and a minimal domain gap from the natural images. For semantics, we propose recursive label calibration to produce more informative outputs. For diversity, we propose a regional update strategy to generate more diverse and semantically-enriched synthetic data. For minimal domain gap, we use input and feature-level regularization to mimic the original data distribution in latent space. We instantiate our proposed framework with three popular NAS algorithms: DARTS, ProxylessNAS and SPOS. Surprisingly, our results demonstrate that the architectures discovered by searching with our synthetic data achieve accuracy that is comparable to, or even higher than, architectures discovered by searching from the original data. This leads, for the first time, to the conclusion that NAS can be done effectively without access to the original (so-called natural) data if the synthesis method is well designed.  ( 3 min )
    Bootstrapped Masked Autoencoders for Vision BERT Pretraining. (arXiv:2207.07116v1 [cs.CV])
    We propose bootstrapped masked autoencoders (BootMAE), a new approach for vision BERT pretraining. BootMAE improves the original masked autoencoders (MAE) with two core designs: 1) momentum encoder that provides online feature as extra BERT prediction targets; 2) target-aware decoder that tries to reduce the pressure on the encoder to memorize target-specific information in BERT pretraining. The first design is motivated by the observation that using a pretrained MAE to extract the features as the BERT prediction target for masked tokens can achieve better pretraining performance. Therefore, we add a momentum encoder in parallel with the original MAE encoder, which bootstraps the pretraining performance by using its own representation as the BERT prediction target. In the second design, we introduce target-specific information (e.g., pixel values of unmasked patches) from the encoder directly to the decoder to reduce the pressure on the encoder of memorizing the target-specific information. Thus, the encoder focuses on semantic modeling, which is the goal of BERT pretraining, and does not need to waste its capacity in memorizing the information of unmasked tokens related to the prediction target. Through extensive experiments, our BootMAE achieves $84.2\%$ Top-1 accuracy on ImageNet-1K with ViT-B backbone, outperforming MAE by $+0.8\%$ under the same pre-training epochs. BootMAE also gets $+1.0$ mIoU improvements on semantic segmentation on ADE20K and $+1.3$ box AP, $+1.4$ mask AP improvement on object detection and segmentation on COCO dataset. Code is released at https://github.com/LightDXY/BootMAE.  ( 3 min )
    Graph Modularity: Towards Understanding the Cross-Layer Transition of Feature Representations in Deep Neural Networks. (arXiv:2111.12485v2 [cs.CV] UPDATED)
    There are good arguments to support the claim that deep neural networks (DNNs) capture better feature representations than the previous hand-crafted feature engineering, which leads to a significant performance improvement. In this paper, we move a tiny step towards understanding the dynamics of feature representations over layers. Specifically, we model the process of class separation of intermediate representations in pre-trained DNNs as the evolution of communities in dynamic graphs. Then, we introduce modularity, a generic metric in graph theory, to quantify the evolution of communities. In the preliminary experiment, we find that modularity roughly tends to increase as the layer goes deeper and the degradation and plateau arise when the model complexity is great relative to the dataset. Through an asymptotic analysis, we prove that modularity can be broadly used for different applications. For example, modularity provides new insights to quantify the difference between feature representations. More crucially, we demonstrate that the degradation and plateau in modularity curves represent redundant layers in DNNs and can be pruned with minimal impact on performance, which provides theoretical guidance for layer pruning. Our code is available at https://github.com/yaolu-zjut/Dynamic-Graphs-Construction.  ( 3 min )
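    For readers unfamiliar with the metric named above, the following is a minimal sketch of the standard Newman modularity of a fixed partition, Q = sum over communities c of (L_c / m) - (d_c / 2m)^2, where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. How the paper builds graphs from layer features is not described in the abstract and is not shown here; the edge-list input and the function name are illustrative assumptions.

```python
def modularity(edges, community):
    """Newman modularity of an undirected, unweighted graph partition.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to its community label
               (in the abstract's setting, a class label).
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph must contain at least one edge")
    internal = {}    # community -> number of edges with both ends inside
    degree_sum = {}  # community -> sum of degrees of its nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Two disconnected triangles, each triangle its own community.
triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q_separated = modularity(triangles, labels)
# Same graph with every node placed in a single community.
q_merged = modularity(triangles, {n: "a" for n in range(6)})
```

    Well-separated communities score higher than a single merged community, which is the sense in which a rising modularity curve across layers would indicate improving class separation.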
    Fully Decentralized Model-based Policy Optimization for Networked Systems. (arXiv:2207.06559v1 [cs.LG])
    Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communications or shifting of resources. This work aims to improve data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamic model to predict future states and broadcasts its predictions by communication, and then the policies are trained under the model rollouts. To alleviate the bias of model-generated data, we restrain the model usage for generating myopic rollouts, thus reducing the compounding error of model generation. To retain the independence of policy update, we introduce an extended value function and theoretically prove that the resulting policy gradient is a close approximation to true policy gradients. We evaluate our algorithm on several benchmarks for intelligent transportation systems, which are connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.  ( 3 min )
    Discovery of New Multi-Level Features for Domain Generalization via Knowledge Corruption. (arXiv:2109.04320v2 [cs.LG] UPDATED)
    Machine learning models that can generalize to unseen domains are essential when applied in real-world scenarios involving strong domain shifts. We address the challenging domain generalization (DG) problem, where a model trained on a set of source domains is expected to generalize well in unseen domains without any exposure to their data. The main challenge of DG is that the features learned from the source domains are not necessarily present in the unseen target domains, leading to performance deterioration. We assume that learning a richer set of features is crucial to improve the transfer to a wider set of unknown domains. For this reason, we propose COLUMBUS, a method that enforces new feature discovery via a targeted corruption of the most relevant input and multi-level representations of the data. We conduct an extensive empirical evaluation to demonstrate the effectiveness of the proposed approach which achieves new state-of-the-art results by outperforming 18 DG algorithms on multiple DG benchmark datasets in the DomainBed framework.
    HyGNN: Drug-Drug Interaction Prediction via Hypergraph Neural Network. (arXiv:2206.12747v2 [q-bio.QM] UPDATED)
    Drug-Drug Interactions (DDIs) may hamper the functionalities of drugs, and in the worst scenario, they may lead to adverse drug reactions (ADRs). Predicting all DDIs is a challenging and critical problem. Most existing computational models integrate drug-centric information from different sources and leverage them as features in machine learning classifiers to predict DDIs. However, these models have a high chance of failure, especially for the new drugs when all the information is not available. This paper proposes a novel Hypergraph Neural Network (HyGNN) model based on only the SMILES string of drugs, available for any drug, for the DDI prediction problem. To capture the drug similarities, we create a hypergraph from drugs' chemical substructures extracted from the SMILES strings. Then, we develop HyGNN consisting of a novel attention-based hypergraph edge encoder to get the representation of drugs as hyperedges and a decoder to predict the interactions between drug pairs. Furthermore, we conduct extensive experiments to evaluate our model and compare it with several state-of-the-art methods. Experimental results demonstrate that our proposed HyGNN model effectively predicts DDIs and impressively outperforms the baselines with a maximum ROC-AUC and PR-AUC of 97.9% and 98.1%, respectively.
    Continuous-time Analysis for Variational Inequalities: An Overview and Desiderata. (arXiv:2207.07105v1 [stat.ML])
    Algorithms that solve zero-sum games, multi-objective agent objectives, or, more generally, variational inequality (VI) problems are notoriously unstable on general problems. Owing to the increasing need for solving such problems in machine learning, this instability has been highlighted in recent years as a significant research challenge. In this paper, we provide an overview of recent progress in the use of continuous-time perspectives in the analysis and design of methods targeting the broad VI problem class. Our presentation draws parallels between single-objective problems and multi-objective problems, highlighting the challenges of the latter. We also formulate various desiderata for algorithms that apply to general VIs and we argue that achieving these desiderata may profit from an understanding of the associated continuous-time dynamics.
    AGIC: Approximate Gradient Inversion Attack on Federated Learning. (arXiv:2204.13784v3 [cs.LG] UPDATED)
    Federated learning is a private-by-design distributed learning paradigm where clients train local models on their own data before a central server aggregates their local updates to compute a global model. Depending on the aggregation method used, the local updates are either the gradients or the weights of local learning models. Recent reconstruction attacks apply a gradient inversion optimization on the gradient update of a single minibatch to reconstruct the private data used by clients during training. As the state-of-the-art reconstruction attacks solely focus on a single update, realistic adversarial scenarios are overlooked, such as observation across multiple updates and updates trained from multiple mini-batches. A few studies consider a more challenging adversarial scenario where only model updates based on multiple mini-batches are observable, and resort to computationally expensive simulation to untangle the underlying samples for each local step. In this paper, we propose AGIC, a novel Approximate Gradient Inversion Attack that efficiently and effectively reconstructs images from either model or gradient updates, and across multiple epochs. In a nutshell, AGIC (i) approximates gradient updates of used training samples from model updates to avoid costly simulation procedures, (ii) leverages gradient/model updates collected from multiple epochs, and (iii) assigns increasing weights to layers with respect to the neural network structure for reconstruction quality. We extensively evaluate AGIC on three datasets, CIFAR-10, CIFAR-100 and ImageNet. Our results show that AGIC increases the peak signal-to-noise ratio (PSNR) by up to 50% compared to two representative state-of-the-art gradient inversion attacks. Furthermore, AGIC is faster than the state-of-the-art simulation-based attack, e.g., it is 5x faster when attacking FedAvg with 8 local steps in between model updates.
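    Reconstruction quality in the abstract above is reported as peak signal-to-noise ratio. As a reminder of what that number measures, here is a small sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE). The flat-list image representation, the default peak value of 1.0, and the function name are assumptions made for illustration; the attack itself is not shown.

```python
import math


def psnr(original, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of pixel intensities (flattened images)."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# A reconstruction that is off by 0.1 at every pixel (MSE = 0.01).
target = [0.0, 0.0, 0.0, 0.0]
noisy = [0.1, 0.1, 0.1, 0.1]
quality_db = psnr(target, noisy)
```

    Because PSNR is logarithmic in the mean squared error, a relative gain in dB corresponds to a much larger reduction in pixel-level error.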
    A comparison of latent semantic analysis and correspondence analysis of document-term matrices. (arXiv:2108.06197v3 [cs.IR] UPDATED)
    Latent semantic analysis (LSA) and correspondence analysis (CA) are two techniques that use a singular value decomposition (SVD) for dimensionality reduction. LSA has been extensively used to obtain low-dimensional representations that capture relationships among documents and terms. In this article, we present a theoretical analysis and comparison of the two techniques in the context of document-term matrices. We show that CA has some attractive properties as compared to LSA, for instance that effects of margins arising from differing document-lengths and term-frequencies are effectively eliminated, so that the CA solution is optimally suited to focus on relationships among documents and terms. A unifying framework is proposed that includes both CA and LSA as special cases. We empirically compare CA to various LSA based methods on text categorization in English and authorship attribution on historical Dutch texts, and find that CA performs significantly better. We also apply CA to a long-standing question regarding the authorship of the Dutch national anthem Wilhelmus and provide further support that it can be attributed to the author Datheen, amongst several contenders.
    A survey on domain adaptation theory: learning bounds and theoretical guarantees. (arXiv:2004.11829v6 [cs.LG] UPDATED)
    All well-known machine learning algorithms that comprise both supervised and semi-supervised learning work well only under a common assumption: the training and test data follow the same distribution. When the distribution changes, most statistical models must be reconstructed from newly collected data, which for some applications can be costly or impossible to obtain. Therefore, it has become necessary to develop approaches that reduce the need and the effort to obtain new labeled samples by exploiting data that are available in related areas, and using these further across similar fields. This has given rise to a new machine learning framework known as transfer learning: a learning setting inspired by the capability of a human being to extrapolate knowledge across tasks to learn more efficiently. Despite the large number of different transfer learning scenarios, the main objective of this survey is to provide an overview of the state-of-the-art theoretical results in a specific, and arguably the most popular, sub-field of transfer learning, called domain adaptation. In this sub-field, the data distribution is assumed to change across the training and the test data, while the learning task remains the same. We provide a first up-to-date description of existing results related to the domain adaptation problem that cover learning bounds based on different statistical learning frameworks.
    Adversarial Graph Contrastive Learning with Information Regularization. (arXiv:2202.06491v4 [cs.LG] UPDATED)
    Contrastive learning is an effective unsupervised method in graph representation learning. Recently, the data augmentation based contrastive learning method has been extended from images to graphs. However, most prior works are directly adapted from the models designed for images. Unlike the data augmentation on images, the data augmentation on graphs is far less intuitive and much harder to provide high-quality contrastive samples, which are the key to the performance of contrastive learning models. This leaves much space for improvement over the existing graph contrastive learning frameworks. In this work, by introducing an adversarial graph view and an information regularizer, we propose a simple but effective method, Adversarial Graph Contrastive Learning (ARIEL), to extract informative contrastive samples within a reasonable constraint. It consistently outperforms the current graph contrastive learning methods in the node classification task over various real-world datasets and further improves the robustness of graph contrastive learning.
    Interpretable Decision Trees Through MaxSAT. (arXiv:2110.13854v2 [cs.AI] UPDATED)
    We present an approach to improve the accuracy-interpretability trade-off of Machine Learning (ML) Decision Trees (DTs). In particular, we apply Maximum Satisfiability technology to compute Minimum Pure DTs (MPDTs). We improve the runtime of previous approaches and show that these MPDTs can outperform the accuracy of DTs generated with the ML framework sklearn.
    Bayesian Inference with Nonlinear Generative Models: Comments on Secure Learning. (arXiv:2201.09986v3 [cs.IT] UPDATED)
    Unlike the classical linear model, nonlinear generative models have been addressed sparsely in the literature of statistical learning. This work aims to bring attention to these models and their secrecy potential. To this end, we invoke the replica method to derive the asymptotic normalized cross entropy in an inverse probability problem whose generative model is described by a Gaussian random field with a generic covariance function. Our derivations further demonstrate the asymptotic statistical decoupling of the Bayesian estimator and specify the decoupled setting for a given nonlinear model. The replica solution shows that strictly nonlinear models establish an all-or-nothing phase transition: There exists a critical load at which the optimal Bayesian inference changes from perfect to uncorrelated learning. Based on this finding, we design a new secure coding scheme which achieves the secrecy capacity of the wiretap channel. This interesting result implies that strictly nonlinear generative models are perfectly secured without any secure coding. We justify this latter statement through the analysis of an illustrative model for perfectly secure and reliable inference.
    Multilinguals at SemEval-2022 Task 11: Complex NER in Semantically Ambiguous Settings for Low Resource Languages. (arXiv:2207.06882v1 [cs.CL])
    We leverage pre-trained language models to solve the task of complex NER for two low-resource languages: Chinese and Spanish. We use the technique of Whole Word Masking (WWM) to boost the performance of the masked language modeling objective on large and unsupervised corpora. We experiment with multiple neural network architectures, incorporating CRF, BiLSTMs, and Linear Classifiers on top of a fine-tuned BERT layer. All our models outperform the baseline by a significant margin and our best performing model obtains a competitive position on the evaluation leaderboard for the blind test set.
    Instance Selection Mechanisms for Human-in-the-Loop Systems in Few-Shot Learning. (arXiv:2207.06835v1 [cs.LG])
    Business analytics and machine learning have become essential success factors for various industries - with the downside of cost-intensive gathering and labeling of data. Few-shot learning addresses this challenge and reduces data gathering and labeling costs by learning novel classes with very few labeled data. In this paper, we design a human-in-the-loop (HITL) system for few-shot learning and analyze an extensive range of mechanisms that can be used to acquire human expert knowledge for instances that have an uncertain prediction outcome. We show that the acquisition of human expert knowledge significantly accelerates the few-shot model performance given a negligible labeling effort. We validate our findings in various experiments on a benchmark dataset in computer vision and real-world datasets. We further demonstrate the cost-effectiveness of HITL systems for few-shot learning. Overall, our work aims at supporting researchers and practitioners in effectively adapting machine learning models to novel classes at reduced costs.
    Confident Adaptive Language Modeling. (arXiv:2207.07061v1 [cs.CL])
    Recent advances in Transformer-based large language models (LLMs) have led to significant performance improvements across many tasks. These gains come with a drastic increase in the models' size, potentially leading to slow and costly use at inference time. In practice, however, the series of generations made by LLMs is composed of varying levels of difficulty. While certain predictions truly benefit from the models' full capacity, other continuations are more trivial and can be solved with reduced compute. In this work, we introduce Confident Adaptive Language Modeling (CALM), a framework for dynamically allocating different amounts of compute per input and generation timestep. Early exit decoding involves several challenges that we address here, such as: (1) what confidence measure to use; (2) connecting sequence-level constraints to local per-token exit decisions; and (3) attending back to missing hidden representations due to early exits in previous tokens. Through theoretical analysis and empirical experiments on three diverse text generation tasks, we demonstrate the efficacy of our framework in reducing compute -- potential speedup of up to $\times 3$ -- while provably maintaining high performance.
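    The per-token early-exit idea described above can be illustrated with a toy decision rule. The sketch below assumes that intermediate layers each yield a logit vector over the vocabulary and uses the gap between the two largest softmax probabilities as the confidence measure. That choice of measure, the fixed threshold, and all names are assumptions for illustration; the abstract lists the confidence measure as a design question and does not specify one, and the sequence-level calibration it mentions is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit(per_layer_logits, threshold):
    """Return (layer_index, token_id) for the first layer whose
    top-1 minus top-2 softmax probability reaches the threshold.
    Falls back to the final layer if no layer is confident enough."""
    if not per_layer_logits:
        raise ValueError("need logits from at least one layer")
    last = len(per_layer_logits) - 1
    for layer, logits in enumerate(per_layer_logits):
        if len(logits) < 2:
            raise ValueError("vocabulary must have at least two entries")
        probs = softmax(logits)
        ranked = sorted(probs, reverse=True)
        confidence = ranked[0] - ranked[1]
        if confidence >= threshold or layer == last:
            return layer, probs.index(ranked[0])


# Hypothetical logits from three successive layers for one token.
layers = [[0.0, 0.1, 0.0], [0.0, 5.0, 0.0], [0.0, 9.0, 0.0]]
exit_layer, token = early_exit(layers, threshold=0.9)
```

    In this toy input the second layer is already confident, so the third layer would be skipped for this token; a stricter threshold trades compute savings for consistency with the full model.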
    Reachability Analysis of a General Class of Neural Ordinary Differential Equations. (arXiv:2207.06531v1 [cs.LG])
    Continuous deep learning models, referred to as Neural Ordinary Differential Equations (Neural ODEs), have received considerable attention over the last several years. Despite their burgeoning impact, there is a lack of formal analysis techniques for these systems. In this paper, we consider a general class of neural ODEs with varying architectures and layers, and introduce a novel reachability framework that allows for the formal analysis of their behavior. The methods developed for the reachability analysis of neural ODEs are implemented in a new tool called NNVODE. Specifically, our work extends an existing neural network verification tool to support neural ODEs. We demonstrate the capabilities and efficacy of our methods through the analysis of a set of benchmarks that include neural ODEs used for classification, and in control and dynamical systems, including an evaluation of the efficacy and capabilities of our approach with respect to existing software tools within the continuous-time systems reachability literature, when it is possible to do so.
    PIAT: Physics Informed Adversarial Training for Solving Partial Differential Equations. (arXiv:2207.06647v1 [cs.LG])
    In this paper, we propose the physics informed adversarial training (PIAT) of neural networks for solving nonlinear differential equations (NDE). It is well-known that the standard training of neural networks results in non-smooth functions. Adversarial training (AT) is an established defense mechanism against adversarial attacks, which could also help in making the solution smooth. AT includes augmenting the training mini-batch with a perturbation that makes the network output mismatch the desired output adversarially. Unlike formal AT, which relies only on the training data, here we encode the governing physical laws in the form of nonlinear differential equations using automatic differentiation in the adversarial network architecture. We compare PIAT with PINN to indicate the effectiveness of our method in solving NDEs for up to 10 dimensions. Moreover, we propose weight decay and Gaussian smoothing to demonstrate the PIAT advantages. The code repository is available at https://github.com/rohban-lab/PIAT.
    Large-scale Knowledge Distillation with Elastic Heterogeneous Computing Resources. (arXiv:2207.06667v1 [cs.DC])
    Although more layers and more parameters generally improve model accuracy, such big models have high computational complexity and large memory requirements, which exceed the capacity of small devices for inference and incur long training times. In addition, the long training and inference times of big models are difficult to afford even on high-performance servers. As an efficient way to compress a large deep model (a teacher model) into a compact model (a student model), knowledge distillation has emerged as a promising approach for dealing with big models. Existing knowledge distillation methods cannot exploit elastically available computing resources and therefore suffer from low efficiency. In this paper, we propose an Elastic Deep Learning framework for knowledge Distillation, i.e., EDL-Dist. The advantages of EDL-Dist are three-fold. First, the inference and training processes are separated. Second, elastically available computing resources can be utilized to improve efficiency. Third, fault tolerance of the training and inference processes is supported. We conduct extensive experiments to show that the throughput of EDL-Dist is up to 3.125 times faster than the baseline method (online knowledge distillation) while the accuracy is similar or higher.
    A Unified Granular-ball Learning Model of Pawlak Rough Set and Neighborhood Rough Set. (arXiv:2201.03349v4 [cs.AI] UPDATED)
    Pawlak rough set and neighborhood rough set are the two most common rough set theoretical models. Pawlak rough sets can use equivalence classes to represent knowledge but cannot process continuous data; neighborhood rough sets can process continuous data but lose the ability to use equivalence classes to represent knowledge. To this end, this paper presents a granular-ball rough set based on granular-ball computing. The granular-ball rough set can simultaneously represent the Pawlak rough set and the neighborhood rough set, realizing a unified representation of the two. As a result, the granular-ball rough set can both deal with continuous data and use equivalence classes for knowledge representation. In addition, we propose an implementation algorithm for granular-ball rough sets. The experimental results on benchmark datasets demonstrate that, due to the combination of the robustness and adaptability of granular-ball computing, the learning accuracy of the granular-ball rough set is greatly improved compared with the Pawlak rough set and the traditional neighborhood rough set. The granular-ball rough set also outperforms nine popular or state-of-the-art feature selection methods.
    Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness. (arXiv:2105.12639v4 [cs.LG] UPDATED)
    Neural network ensembles, such as Bayesian neural networks (BNNs), have shown success in the areas of uncertainty estimation and robustness. However, a crucial challenge prohibits their use in practice. BNNs require a large number of predictions to produce reliable results, leading to a significant increase in computational cost. To alleviate this issue, we propose spatial smoothing, a method that spatially ensembles neighboring feature map points of convolutional neural networks. By simply adding a few blur layers to the models, we empirically show that spatial smoothing improves accuracy, uncertainty estimation, and robustness of BNNs across a whole range of ensemble sizes. In particular, BNNs incorporating spatial smoothing achieve high predictive performance with merely a handful of ensembles. Moreover, this method can also be applied to canonical deterministic neural networks to improve their performance. Several lines of evidence suggest that the improvements can be attributed to the stabilized feature maps and the smoothing of the loss landscape. In addition, we provide a fundamental explanation for prior works - namely, global average pooling, pre-activation, and ReLU6 - by addressing them as special cases of spatial smoothing. These not only enhance accuracy, but also improve uncertainty estimation and robustness by making the loss landscape smoother in the same manner as spatial smoothing. The code is available at https://github.com/xxxnell/spatial-smoothing.
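To make the idea of "spatially ensembling neighboring feature map points" concrete, here is a minimal pure-Python sketch of a blur applied to a single 2D feature map. The box kernel, the `radius` parameter, and the border handling (averaging only over in-bounds neighbors) are assumptions made for illustration; the abstract does not specify the blur layer the authors use.

```python
from typing import List


def box_blur(feature_map: List[List[float]], radius: int = 1) -> List[List[float]]:
    """Average each point with its in-bounds neighbors (illustrative box blur)."""
    height = len(feature_map)
    width = len(feature_map[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Clamp the window to the feature map instead of zero-padding (assumption).
            rows = range(max(0, i - radius), min(height, i + radius + 1))
            cols = range(max(0, j - radius), min(width, j + radius + 1))
            values = [feature_map[r][c] for r in rows for c in cols]
            out[i][j] = sum(values) / len(values)
    return out


spike = [[0.0, 0.0, 0.0],
         [0.0, 9.0, 0.0],
         [0.0, 0.0, 0.0]]
smoothed = box_blur(spike)
```

A single large activation is spread across its neighborhood, which is the stabilizing effect on feature maps that the abstract refers to.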
    CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One. (arXiv:2207.06543v1 [cs.LG])
    Continual learning requires incremental compatibility with a sequence of tasks. However, the design of model architecture remains an open question: In general, learning all tasks with a shared set of parameters suffers from severe interference between tasks; while learning each task with a dedicated parameter subspace is limited by scalability. In this work, we theoretically analyze the generalization errors for learning plasticity and memory stability in continual learning, which can be uniformly upper-bounded by (1) discrepancy between task distributions, (2) flatness of loss landscape and (3) cover of parameter space. Then, inspired by the robust biological learning system that processes sequential experiences with multiple parallel compartments, we propose Cooperation of Small Continual Learners (CoSCL) as a general strategy for continual learning. Specifically, we present an architecture with a fixed number of narrower sub-networks to learn all incremental tasks in parallel, which can naturally reduce the two errors through improving the three components of the upper bound. To strengthen this advantage, we encourage these sub-networks to cooperate by penalizing the difference of predictions made by their feature representations. With a fixed parameter budget, CoSCL can improve a variety of representative continual learning approaches by a large margin (e.g., up to 10.64% on CIFAR-100-SC, 9.33% on CIFAR-100-RS, 11.45% on CUB-200-2011 and 6.72% on Tiny-ImageNet) and achieve the new state-of-the-art performance.
    Recurrent Memory Transformer. (arXiv:2207.06881v1 [cs.CL])
    Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows information from all sequence elements to be combined into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations. Moreover, the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer (Recurrent Memory Transformer). Memory allows the model to store and process local and global information as well as to pass information between segments of the long sequence with the help of recurrence. We implement a memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence. The Transformer is then trained to control both memory operations and sequence representation processing. Results of experiments show that our model performs on par with the Transformer-XL on language modeling for smaller memory sizes and outperforms it for tasks that require longer sequence processing. We show that adding memory tokens to Tr-XL is able to improve its performance. This makes Recurrent Memory Transformer a promising architecture for applications that require learning of long-term dependencies and general purpose in memory processing, such as algorithmic tasks and reasoning.
    Near-Optimal Bounds for Testing Histogram Distributions. (arXiv:2207.06596v1 [cs.DS])
    We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$, are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from an unknown distribution $\mathbf{p}$ on $[n]$, we want to distinguish between the cases that $\mathbf{p}$ is a $k$-histogram versus $\varepsilon$-far from any $k$-histogram, in total variation distance. Our main result is a sample near-optimal and computationally efficient algorithm for this testing problem, and a nearly-matching (within logarithmic factors) sample complexity lower bound. Specifically, we show that the histogram testing problem has sample complexity $\widetilde \Theta (\sqrt{nk} / \varepsilon + k / \varepsilon^2 + \sqrt{n} / \varepsilon^2)$.
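As a small illustration of the definitions above, the sketch below checks whether an explicitly given distribution is a $k$-histogram and evaluates the scale of the stated sample complexity. It is not the paper's testing algorithm, which only sees samples; the function names and the tolerance `tol` are assumptions, and the complexity expression drops constants and logarithmic factors.

```python
import math
from typing import Sequence


def count_constant_pieces(p: Sequence[float], tol: float = 1e-12) -> int:
    """Minimum number of contiguous intervals on which p is constant."""
    if len(p) == 0:
        return 0
    pieces = 1
    for prev, cur in zip(p, p[1:]):
        if abs(cur - prev) > tol:
            pieces += 1  # a change in value starts a new interval
    return pieces


def is_k_histogram(p: Sequence[float], k: int, tol: float = 1e-12) -> bool:
    """True if p is piecewise constant over at most k intervals."""
    return count_constant_pieces(p, tol) <= k


def sample_complexity_scale(n: int, k: int, eps: float) -> float:
    """sqrt(nk)/eps + k/eps^2 + sqrt(n)/eps^2, ignoring constants and log factors."""
    return math.sqrt(n * k) / eps + k / eps**2 + math.sqrt(n) / eps**2
```

For example, `[0.1, 0.1, 0.4, 0.4]` is a 2-histogram over a domain of size 4, while `[0.1, 0.2, 0.3, 0.4]` needs four intervals.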
    Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime. (arXiv:2207.06475v1 [cs.LG])
    Overparameterization is known to permit strong generalization performance in neural networks. In this work, we provide an initial theoretical analysis of its effect on catastrophic forgetting in a continual learning setup. We show experimentally that in permuted MNIST image classification tasks, the generalization performance of multilayer perceptrons trained by vanilla stochastic gradient descent can be improved by overparameterization, and the extent of the performance increase achieved by overparameterization is comparable to that of state-of-the-art continual learning algorithms. We provide a theoretical explanation of this effect by studying a qualitatively similar two-task linear regression problem, where each task is related by a random orthogonal transformation. We show that when a model is trained on the two tasks in sequence without any additional regularization, the risk gain on the first task is small if the model is sufficiently overparameterized.
    Cross-Modal Transformer GAN: A Brain Structure-Function Deep Fusing Framework for Alzheimer's Disease. (arXiv:2206.13393v2 [eess.IV] UPDATED)
    Cross-modal fusion of different types of neuroimaging data has shown great promise for predicting the progression of Alzheimer's Disease (AD). However, most existing methods applied in neuroimaging cannot efficiently fuse the functional and structural information from multi-modal neuroimages. In this work, a novel cross-modal transformer generative adversarial network (CT-GAN) is proposed to fuse functional information contained in resting-state functional magnetic resonance imaging (rs-fMRI) and structural information contained in Diffusion Tensor Imaging (DTI). The developed bi-attention mechanism can match functional information to structural information efficiently and maximize the capability of extracting complementary information from rs-fMRI and DTI. By capturing the deep complementary information between structural features and functional features, the proposed CT-GAN can detect the AD-related brain connectivity, which could be used as a bio-marker of AD. Experimental results show that the proposed model can not only improve classification performance but also detect the AD-related brain connectivity effectively.
    Improved OOD Generalization via Conditional Invariant Regularizer. (arXiv:2207.06687v1 [cs.LG])
    Recently, generalization on out-of-distribution (OOD) data with correlation shift has attracted great attention. The correlation shift is caused by spurious attributes that correlate with the class label, as the correlation between them may vary between training and test data. For such a problem, we show that given the class label, the conditionally independent models of spurious attributes are OOD generalizable. Based on this, a metric, Conditional Spurious Variation (CSV), which controls the OOD generalization error, is proposed to measure such conditional independence. To improve the OOD generalization, we regularize the training process with the proposed CSV. Under mild assumptions, our training objective can be formulated as a nonconvex-concave mini-max problem. An algorithm with provable convergence rate is proposed to solve the problem. Extensive empirical results verify our algorithm's efficacy in improving OOD generalization.
    Multi-Level Branched Regularization for Federated Learning. (arXiv:2207.06936v1 [cs.LG])
    A critical challenge of federated learning is data heterogeneity and imbalance across clients, which leads to inconsistency between local networks and unstable convergence of global models. To alleviate the limitations, we propose a novel architectural regularization technique that constructs multiple auxiliary branches in each local model by grafting local and global subnetworks at several different levels and that learns the representations of the main pathway in the local model congruent to the auxiliary hybrid pathways via online knowledge distillation. The proposed technique is effective to robustify the global model even in the non-iid setting and is applicable to various federated learning frameworks conveniently without incurring extra communication costs. We perform comprehensive empirical studies and demonstrate remarkable performance gains in terms of accuracy and efficiency compared to existing methods. The source code is available at our project page.
    MedFuse: Multi-modal fusion with clinical time-series data and chest X-ray images. (arXiv:2207.07027v1 [eess.IV])
    Multi-modal fusion approaches aim to integrate information from different data sources. Unlike natural datasets, such as in audio-visual applications, where samples consist of "paired" modalities, data in healthcare is often collected asynchronously. Hence, requiring the presence of all modalities for a given sample is not realistic for clinical tasks and significantly limits the size of the dataset during training. In this paper, we propose MedFuse, a conceptually simple yet promising LSTM-based fusion module that can accommodate uni-modal as well as multi-modal input. We evaluate the fusion method and introduce new benchmark results for in-hospital mortality prediction and phenotype classification, using clinical time-series data in the MIMIC-IV dataset and corresponding chest X-ray images in MIMIC-CXR. Compared to more complex multi-modal fusion strategies, MedFuse provides a performance improvement by a large margin on the fully paired test set. It also remains robust across the partially paired test set containing samples with missing chest X-ray images. We release our code for reproducibility and to enable the evaluation of competing models in the future.
    A Query-Optimal Algorithm for Finding Counterfactuals. (arXiv:2207.07072v1 [cs.DS])
    We design an algorithm for finding counterfactuals with strong theoretical guarantees on its performance. For any monotone model $f : X^d \to \{0,1\}$ and instance $x^\star$, our algorithm makes $S(f)^{O(\Delta_f(x^\star))} \cdot \log d$ queries to $f$ and returns an optimal counterfactual for $x^\star$: a nearest instance $x'$ to $x^\star$ for which $f(x')\ne f(x^\star)$. Here $S(f)$ is the sensitivity of $f$, a discrete analogue of the Lipschitz constant, and $\Delta_f(x^\star)$ is the distance from $x^\star$ to its nearest counterfactuals. The previous best known query complexity was $d^{\,O(\Delta_f(x^\star))}$, achievable by brute-force local search. We further prove a lower bound of $S(f)^{\Omega(\Delta_f(x^\star))} + \Omega(\log d)$ on the query complexity of any algorithm, thereby showing that the guarantees of our algorithm are essentially optimal.
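For context, the brute-force local search baseline mentioned in the abstract can be sketched as follows. This is the $d^{O(\Delta_f(x^\star))}$ baseline, not the paper's query-optimal algorithm. It assumes binary features ($X = \{0,1\}$) and Hamming distance; the function name and the example model are hypothetical.

```python
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple


def brute_force_counterfactual(
    f: Callable[[Tuple[int, ...]], int], x: Sequence[int]
) -> Optional[Tuple[Tuple[int, ...], int]]:
    """Search Hamming balls of growing radius around x for a label flip.

    Returns (counterfactual, distance), or None if f is constant.
    Radius r costs up to C(d, r) queries, hence d^{O(distance)} overall.
    """
    x = tuple(x)
    label = f(x)
    d = len(x)
    for radius in range(1, d + 1):
        for idxs in combinations(range(d), radius):
            candidate = list(x)
            for i in idxs:
                candidate[i] = 1 - candidate[i]  # flip the chosen bits
            candidate_t = tuple(candidate)
            if f(candidate_t) != label:
                return candidate_t, radius  # first hit at the smallest radius is nearest
    return None


def at_least_two(z: Tuple[int, ...]) -> int:
    """Hypothetical monotone model: outputs 1 when at least two features are set."""
    return int(sum(z) >= 2)


result = brute_force_counterfactual(at_least_two, (0, 0, 0, 0))
```

Because radii are tried in increasing order, the first instance with a different label is a nearest counterfactual under Hamming distance.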
    Equivariant Hypergraph Diffusion Neural Operators. (arXiv:2207.06680v1 [cs.LG])
    Hypergraph neural networks (HNNs) using neural networks to encode hypergraphs provide a promising way to model higher-order relations in data and further solve relevant prediction tasks built upon such higher-order relations. However, higher-order relations in practice contain complex patterns and are often highly irregular. So, it is often challenging to design an HNN that suffices to express those relations while keeping computational efficiency. Inspired by hypergraph diffusion algorithms, this work proposes a new HNN architecture named ED-HNN, which provably represents any continuous equivariant hypergraph diffusion operators that can model a wide range of higher-order relations. ED-HNN can be implemented efficiently by combining star expansions of hypergraphs with standard message passing neural networks. ED-HNN further shows great superiority in processing heterophilic hypergraphs and constructing deep models. We evaluate ED-HNN for node classification on nine real-world hypergraph datasets. ED-HNN uniformly outperforms the best baselines over these nine datasets and achieves more than 2% higher prediction accuracy on four of them.
    Closing the Loop: A Framework for Trustworthy Machine Learning in Power Systems. (arXiv:2203.07505v2 [eess.SY] UPDATED)
    Deep decarbonization of the energy sector will require massive penetration of stochastic renewable energy resources and an enormous amount of grid asset coordination; this represents a challenging paradigm for the power system operators who are tasked with maintaining grid stability and security in the face of such changes. With its ability to learn from complex datasets and provide predictive solutions on fast timescales, machine learning (ML) is well-positioned to help overcome these challenges as power systems transform in the coming decades. In this work, we outline five key challenges (dataset generation, data pre-processing, model training, model assessment, and model embedding) associated with building trustworthy ML models which learn from physics-based simulation data. We then demonstrate how linking together individual modules, each of which overcomes a respective challenge, at sequential stages in the machine learning pipeline can help enhance the overall performance of the training process. In particular, we implement methods that connect different elements of the learning pipeline through feedback, thus "closing the loop" between model training, performance assessments, and re-training. We demonstrate the effectiveness of this framework, its constituent modules, and its feedback connections by learning the N-1 small-signal stability margin associated with a detailed model of a proposed North Sea Wind Power Hub system.
    Evaluating Multimodal Interactive Agents. (arXiv:2205.13274v2 [cs.LG] UPDATED)
    Creating agents that can interact naturally with humans is a common goal in artificial intelligence (AI) research. However, evaluating these interactions is challenging: collecting online human-agent interactions is slow and expensive, yet faster proxy metrics often do not correlate well with interactive evaluation. In this paper, we assess the merits of these existing evaluation metrics and present a novel approach to evaluation called the Standardised Test Suite (STS). The STS uses behavioural scenarios mined from real human interaction data. Agents see replayed scenario context, receive an instruction, and are then given control to complete the interaction offline. These agent continuations are recorded and sent to human annotators to mark as success or failure, and agents are ranked according to the proportion of continuations in which they succeed. The resulting STS is fast, controlled, interpretable, and representative of naturalistic interactions. Altogether, the STS consolidates much of what is desirable across many of our standard evaluation metrics, allowing us to accelerate research progress towards producing agents that can interact naturally with humans. A video may be found at https://youtu.be/YR1TngGORGQ.
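The ranking step described above (agents ordered by the proportion of continuations marked as successful) can be sketched in a few lines. The data layout, a mapping from agent name to a list of boolean annotator verdicts, and the function name are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def rank_agents(annotations: Dict[str, List[bool]]) -> List[Tuple[str, float]]:
    """Rank agents by their fraction of continuations annotated as success."""
    rates = {
        agent: sum(marks) / len(marks)
        for agent, marks in annotations.items()
        if marks  # skip agents with no annotated continuations
    }
    # Highest success rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


ranking = rank_agents({
    "agent_a": [True, False],
    "agent_b": [True, True],
})
```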
    HyperTransformer: Model Generation for Supervised and Semi-Supervised Few-Shot Learning. (arXiv:2201.04182v3 [cs.LG] UPDATED)
    In this work we propose a HyperTransformer, a Transformer-based model for supervised and semi-supervised few-shot learning that generates weights of a convolutional neural network (CNN) directly from support samples. Since the dependence of a small generated CNN model on a specific task is encoded by a high-capacity Transformer model, we effectively decouple the complexity of the large task space from the complexity of individual tasks. Our method is particularly effective for small target CNN architectures where learning a fixed universal task-independent embedding is not optimal and better performance is attained when the information about the task can modulate all model parameters. For larger models we discover that generating the last layer alone allows us to produce competitive or better results than those obtained with state-of-the-art methods while being end-to-end differentiable.
    Learning Representations for CSI Adaptive Quantization and Feedback. (arXiv:2207.06924v1 [eess.SP])
    In this work, we propose an efficient method for channel state information (CSI) adaptive quantization and feedback in frequency division duplexing (FDD) systems. Existing works mainly focus on the implementation of autoencoder (AE) neural networks (NNs) for CSI compression, and consider straightforward quantization methods, e.g., uniform quantization, which are generally not optimal. With this strategy, it is hard to achieve a low reconstruction error, especially when the number of bits reserved for the latent space quantization is small. To address this issue, we recommend two different methods: one based on post-training quantization and a second in which the codebook is found during the training of the AE. Both strategies achieve better reconstruction accuracy compared to standard quantization techniques.
    Contextual Inverse Optimization: Offline and Online Learning. (arXiv:2106.14015v2 [cs.LG] UPDATED)
    We study the problems of offline and online contextual optimization with feedback information, where instead of observing the loss, we observe, after-the-fact, the optimal action an oracle with full knowledge of the objective function would have taken. We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle. In the offline setting, the decision-maker has information available from past periods and needs to make one decision, while in the online setting, the decision-maker optimizes decisions dynamically over time based on a new set of feasible actions and contextual functions in each period. For the offline setting, we characterize the optimal minimax policy, establishing the performance that can be achieved as a function of the underlying geometry of the information induced by the data. In the online setting, we leverage this geometric characterization to optimize the cumulative regret. We develop an algorithm that yields the first regret bound for this problem that is logarithmic in the time horizon.
    Identifying Orientation-specific Lipid-protein Fingerprints using Deep Learning. (arXiv:2207.06630v1 [q-bio.BM])
    Improved understanding of the relation between the behavior of RAS and RAF proteins and the local lipid environment in the cell membrane is critical for getting insights into the mechanisms underlying cancer formation. In this work, we employ deep learning (DL) to learn this relationship by predicting protein orientational states of RAS and RAS-RAF protein complexes with respect to the lipid membrane based on the lipid densities around the protein domains from coarse-grained (CG) molecular dynamics (MD) simulations. Our DL model can predict six protein states with an overall accuracy of over 80%. The findings of this work offer new insights into how the proteins modulate the lipid environment, which in turn may assist designing novel therapies to regulate such interactions in the mechanisms associated with cancer development.
    Regotron: Regularizing the Tacotron2 architecture via monotonic alignment loss. (arXiv:2204.13437v2 [cs.SD] UPDATED)
    Recent deep learning Text-to-Speech (TTS) systems have achieved impressive performance by generating speech close to human parity. However, they suffer from training stability issues as well as incorrect alignment of the intermediate acoustic representation with the input text sequence. In this work, we introduce Regotron, a regularized version of Tacotron2 which aims to alleviate the training issues and at the same time produce monotonic alignments. Our method augments the vanilla Tacotron2 objective function with an additional term, which penalizes non-monotonic alignments in the location-sensitive attention mechanism. By properly adjusting this regularization term we show that the loss curves become smoother, and at the same time Regotron consistently produces monotonic alignments in unseen examples even at an early stage (13% of the total number of epochs) of its training process, whereas the fully converged Tacotron2 fails to do so. Moreover, our proposed regularization method has no additional computational overhead, while reducing common TTS mistakes and achieving slightly improved speech naturalness according to subjective mean opinion scores (MOS) collected from 50 evaluators.
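One way to picture a penalty on non-monotonic alignments is sketched below. This is an illustrative assumption rather than the loss defined in the paper: it computes the attention-weighted mean encoder position for each decoder step and applies a hinge penalty whenever that position moves backwards. The function names and the `delta` margin are hypothetical.

```python
from typing import List


def attention_centroids(attention: List[List[float]]) -> List[float]:
    """Expected encoder position for each decoder step (each row sums to 1)."""
    return [sum(j * w for j, w in enumerate(row)) for row in attention]


def monotonic_penalty(attention: List[List[float]], delta: float = 0.0) -> float:
    """Hinge penalty on decoder steps whose centroid fails to advance by delta."""
    centroids = attention_centroids(attention)
    return sum(
        max(0.0, prev - cur + delta)
        for prev, cur in zip(centroids, centroids[1:])
    )


# Diagonal alignment: each decoder step attends to the next encoder position.
diagonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Reversed alignment: attention moves backwards through the input.
backwards = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
```

The diagonal alignment incurs no penalty, while the backwards one is penalized at every step; in training, such a term would be added to the main objective with a tunable weight.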
    A Data-Efficient Deep Learning Framework for Segmentation and Classification of Histopathology Images. (arXiv:2207.06489v1 [eess.IV])
    The current study of cell architecture of inflammation in histopathology images commonly performed for diagnosis and research purposes excludes a lot of information available on the biopsy slide. In autoimmune diseases, major outstanding research questions remain regarding which cell types participate in inflammation at the tissue level, and how they interact with each other. While these questions can be partially answered using traditional methods, artificial intelligence approaches for segmentation and classification provide a much more efficient method to understand the architecture of inflammation in autoimmune disease, holding great promise for novel insights. In this paper, we empirically develop deep learning approaches that use dermatomyositis biopsies of human tissue to detect and identify inflammatory cells. Our approach improves classification performance by 26% and segmentation performance by 5%. We also propose a novel post-processing autoencoder architecture that improves segmentation performance by an additional 3%. We have open-sourced our approach and architecture at https://github.com/pranavsinghps1/DEDL
    FOCUS: Familiar Objects in Common and Uncommon Settings. (arXiv:2110.03804v2 [cs.CV] UPDATED)
    Standard training datasets for deep learning often contain objects in common settings (e.g., "a horse on grass" or "a ship in water") since they are usually collected by randomly scraping the web. Uncommon and rare settings (e.g., "a plane on water", "a car in snowy weather") are thus severely under-represented in the training data. This can lead to an undesirable bias in model predictions towards common settings and create a false sense of accuracy. In this paper, we introduce FOCUS (Familiar Objects in Common and Uncommon Settings), a dataset for stress-testing the generalization power of deep image classifiers. By leveraging the power of modern search engines, we deliberately gather data containing objects in common and uncommon settings in a wide range of locations, weather conditions, and time of day. We present a detailed analysis of the performance of various popular image classifiers on our dataset and demonstrate a clear drop in performance when classifying images in uncommon settings. By analyzing deep features of these models, we show that such errors can be due to the use of spurious features in model predictions. We believe that our dataset will aid researchers in understanding the inability of deep models to generalize well to uncommon settings and drive future work on improving their distributional robustness.
    Have we been Naive to Select Machine Learning Models? Noisy Data are here to Stay!. (arXiv:2207.06651v1 [cs.LG])
    The model selection procedure is usually a single-criterion decision making in which we select the model that maximizes a specific metric in a specific set, such as the Validation set performance. We claim this is very naive and can lead to poor selections of over-fitted models due to the over-searching phenomenon, which over-estimates the performance on that specific set. Furthermore, real world data contains noise that should not be ignored by the model selection procedure and must be taken into account when performing model selection. Also, we have defined four theoretical optimality conditions that we can pursue to better select the models and analyze them by using a multi-criteria decision-making algorithm (TOPSIS) that considers proxies to the optimality conditions to select reasonable models.
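For readers unfamiliar with TOPSIS, the following is a minimal sketch of the standard procedure applied to model selection. The two criteria used in the example (validation score, and train/validation gap) are hypothetical stand-ins, not the paper's four optimality conditions or its proxies.

```python
import math
from typing import List, Sequence


def topsis(
    matrix: Sequence[Sequence[float]],
    weights: Sequence[float],
    benefit: Sequence[bool],
) -> List[float]:
    """Return a closeness score in [0, 1] per row (candidate); higher is better.

    matrix[i][j] is the value of criterion j for candidate i;
    benefit[j] is True when larger values of criterion j are better.
    """
    n_cols = len(weights)
    # 1. Vector-normalize each column and apply the criterion weight.
    norms = [
        math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0 for j in range(n_cols)
    ]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_cols)] for row in matrix]
    # 2. Ideal and anti-ideal points, criterion by criterion.
    cols = [[row[j] for row in v] for j in range(n_cols)]
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    # 3. Relative closeness to the ideal point.
    scores = []
    for row in v:
        d_pos = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, ideal)))
        d_neg = math.sqrt(sum((a - b) ** 2 for a, b in zip(row, worst)))
        total = d_pos + d_neg
        scores.append(d_neg / total if total > 0 else 0.0)
    return scores


# Rows: candidate models. Columns: validation score, train/validation gap.
candidates = [[0.9, 0.1], [0.8, 0.3]]
scores = topsis(candidates, weights=[0.5, 0.5], benefit=[True, False])
best = max(range(len(scores)), key=scores.__getitem__)
```

Here the first model has both the higher validation score and the smaller gap, so it coincides with the ideal point and receives the top score.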
    Open High-Resolution Satellite Imagery: The WorldStrat Dataset -- With Application to Super-Resolution. (arXiv:2207.06418v1 [eess.IV])
    Analyzing the planet at scale with satellite imagery and machine learning is a dream that has been constantly hindered by the cost of difficult-to-access highly-representative high-resolution imagery. To remedy this, we introduce the WorldStrat dataset, the largest and most varied such publicly available dataset, at Airbus SPOT 6/7 satellites' high resolution of up to 1.5 m/pixel. Empowered by the European Space Agency's Phi-Lab as part of the ESA-funded QueryPlanet project, we curate nearly 10,000 sqkm of unique locations to ensure stratified representation of all types of land-use across the world: from agriculture to ice caps, from forests to multiple urbanization densities. We also enrich those with locations typically under-represented in ML datasets: sites of humanitarian interest, illegal mining sites, and settlements of persons at risk. We temporally match each high-resolution image with multiple low-resolution images from the freely accessible lower-resolution Sentinel-2 satellites at 10 m/pixel. We accompany this dataset with an open-source Python package to: rebuild or extend the WorldStrat dataset, train and infer baseline algorithms, and learn with abundant tutorials, all compatible with the popular EO-learn toolbox. We hereby hope to foster broad-spectrum applications of ML to satellite imagery, and possibly develop from free public low-resolution Sentinel-2 imagery the same power of analysis allowed by costly private high-resolution imagery. We illustrate this specific point by training and releasing several highly compute-efficient baselines on the task of Multi-Frame Super-Resolution. High-resolution Airbus imagery is CC BY-NC, while the labels and Sentinel-2 imagery are CC BY, and the source code and pre-trained models are under BSD. The dataset is available at https://zenodo.org/record/6810792 and the software package at https://github.com/worldstrat/worldstrat .
    Low-skilled Occupations Face the Highest Re-skilling Pressure. (arXiv:2101.11505v2 [cs.CY] UPDATED)
    Substantial scholarship has estimated the susceptibility of jobs to automation, but little has examined how job contents evolve in the information age as new technologies substitute for tasks, shifting required skills rather than eliminating entire jobs. Here we explore the patterns and consequences of changes in occupational skill contents and characterize occupations and workers subject to the greatest re-skilling pressure. Recent research suggests that high-skilled STEM and technology-intensive occupations have experienced the highest rates of skill content change. Analyzing 727 occupations across 167 million job posts covering the near-universe of the U.S. online labor market between 2010 and 2018, we find that when skill distance is accounted for, re-skilling pressure is much higher for low-skilled occupations, no matter how "low-skill" is defined, either by skill number, pay level, or education degree. We investigate the implications of uneven occupational skill change on workers and find that those from large labor markets and large employers experienced less change, while non-white males in low-skill jobs are the most demographically vulnerable. We conclude by discussing the broad potential of our skill embedding model, which learns skill proximity from skill co-presence across job posts and represents it as distance in the high-dimensional space of complex human capital that corresponds with skilling costs for workers. This model offers a fine-grained measure of the extent to which jobs evolve, and also indicates in what direction jobs are evolving, as illustrated by the decline in demand for human-interface skills and the rise for those at the machine-interface.
    Learning to Detect Slip with Barometric Tactile Sensors and a Temporal Convolutional Neural Network. (arXiv:2202.09549v2 [cs.RO] UPDATED)
    The ability to perceive object slip via tactile feedback enables humans to accomplish complex manipulation tasks including maintaining a stable grasp. Despite the utility of tactile information for many applications, tactile sensors have yet to be widely deployed in industrial robotics settings; part of the challenge lies in identifying slip and other events from the tactile data stream. In this paper, we present a learning-based method to detect slip using barometric tactile sensors. These sensors have many desirable properties including high durability and reliability, and are built from inexpensive, off-the-shelf components. We train a temporal convolutional neural network to detect slip, achieving high detection accuracies while displaying robustness to the speed and direction of the slip motion. Further, we test our detector on two manipulation tasks involving a variety of common objects and demonstrate successful generalization to real-world scenarios not seen during training. We argue that barometric tactile sensing technology, combined with data-driven learning, is suitable for many manipulation tasks such as slip compensation.
    Auto-weighted Robust Federated Learning with Corrupted Data Sources. (arXiv:2101.05880v3 [cs.LG] UPDATED)
    Federated learning provides a communication-efficient and privacy-preserving training process by enabling learning statistical models with massive participants while keeping their data in local clients. However, standard federated learning techniques that naively minimize an average loss function are vulnerable to data corruptions from outliers, systematic mislabeling, or even adversaries. In addition, it is often prohibited for service providers to verify the quality of data samples due to the increasing concern of user data privacy. In this paper, we address this challenge by proposing Auto-weighted Robust Federated Learning (ARFL), a novel approach that jointly learns the global model and the weights of local updates to provide robustness against corrupted data sources. We prove a learning bound on the expected risk with respect to the predictor and the weights of clients, which guides the definition of the objective for robust federated learning. The weights are allocated by comparing the empirical loss of a client with the average loss of the best p clients (p-average), so that we can downweight the clients with significantly high losses, thereby lowering their contributions to the global model. We show that this approach achieves robustness when the data of corrupted clients is distributed differently from benign ones. To optimize the objective function, we propose a communication-efficient algorithm based on the blockwise minimization paradigm. We conduct experiments on multiple benchmark datasets, including CIFAR-10, FEMNIST and Shakespeare, considering different deep neural network models. The results show that our solution is robust against different scenarios including label shuffling, label flipping and noisy features, and outperforms the state-of-the-art methods in most scenarios.
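    The weighting idea in the abstract above (compare each client's empirical loss with the p-average and downweight clients whose loss is much higher) can be sketched as follows. This is a minimal illustration, not the paper's actual update rule: the function names, the `margin` parameter, and the linear falloff are all assumptions made for the example.

```python
def p_average(losses, p):
    """Mean of the p smallest client losses (the 'best p clients')."""
    return sum(sorted(losses)[:p]) / p


def auto_weights(losses, p, margin):
    """Hypothetical weighting rule (an assumption, not the paper's formula).

    A client keeps a positive weight only while its loss stays below
    p-average + margin; weights are then normalised to sum to 1.
    """
    threshold = p_average(losses, p) + margin
    raw = [max(0.0, threshold - loss) for loss in losses]
    total = sum(raw)
    if total == 0.0:
        # Degenerate case: fall back to uniform weights.
        return [1.0 / len(losses)] * len(losses)
    return [r / total for r in raw]


# Illustrative losses: three benign clients and one with a much higher loss.
losses = [0.30, 0.35, 0.32, 2.50]
weights = auto_weights(losses, p=3, margin=0.5)
print(weights)
```

    In this toy run the fourth client receives weight zero, and the remaining weight is shared among the other three in order of increasing loss.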
    Fixing Inventory Inaccuracies At Scale. (arXiv:2006.13126v3 [stat.ML] UPDATED)
    Inaccurate records of inventory occur frequently, and by some measures cost retailers approximately 4% in annual sales. Detecting inventory inaccuracies manually is cost-prohibitive, and existing algorithmic solutions rely almost exclusively on learning from longitudinal data, which is insufficient in the dynamic environment induced by modern retail operations. Instead, we propose a solution based on cross-sectional data over stores and SKUs, observing that detecting inventory inaccuracies can be viewed as a problem of identifying anomalies in a (low-rank) Poisson matrix. State-of-the-art approaches to anomaly detection in low-rank matrices apparently fall short. Specifically, from a theoretical perspective, recovery guarantees for these approaches require that non-anomalous entries be observed with vanishingly small noise (which is not the case in our problem, and indeed in many applications). So motivated, we propose a conceptually simple entry-wise approach to anomaly detection in low-rank Poisson matrices. Our approach accommodates a general class of probabilistic anomaly models. We show that the cost incurred by our algorithm approaches that of an optimal algorithm at a min-max optimal rate. Using synthetic data and real data from a consumer goods retailer, we show that our approach provides up to a 10x cost reduction over incumbent approaches to anomaly detection. Along the way, we build on recent work that seeks entry-wise error guarantees for matrix completion, establishing such guarantees for sub-exponential matrices, a result of independent interest.
    Low-Precision Arithmetic for Fast Gaussian Processes. (arXiv:2207.06856v1 [cs.LG])
    Low-precision arithmetic has had a transformative effect on the training of neural networks, reducing computation, memory and energy requirements. However, despite its promise, low-precision arithmetic has received little attention for Gaussian processes (GPs), largely because GPs require sophisticated linear algebra routines that are unstable in low-precision. We study the different failure modes that can occur when training GPs in half precision. To circumvent these failure modes, we propose a multi-faceted approach involving conjugate gradients with re-orthogonalization, mixed precision, and preconditioning. Our approach significantly improves the numerical stability and practical performance of conjugate gradients in low-precision over a wide range of settings, enabling GPs to train on $1.8$ million data points in $10$ hours on a single GPU, without any sparse approximations.
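    As a rough illustration of one ingredient named above, conjugate gradients with re-orthogonalization, the sketch below solves a small symmetric positive-definite system and re-orthogonalizes each new residual against all earlier residuals. It runs in ordinary double precision and omits the mixed-precision and preconditioning parts entirely; the function name `cg_reorth` and its defaults are assumptions for this example, not the authors' code.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def cg_reorth(A, b, tol=1e-10, max_iter=None):
    """Conjugate gradients for SPD A, with Gram-Schmidt re-orthogonalization
    of each new residual against the stored, normalised earlier residuals."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction
    rs = dot(r, r)
    basis = []           # normalised previous residuals
    for _ in range(max_iter):
        norm_r = rs ** 0.5
        if norm_r < tol:
            break
        basis.append([ri / norm_r for ri in r])
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        # Re-orthogonalization: remove components along earlier residuals,
        # which rounding errors would otherwise let creep back in.
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x


# Small SPD system whose exact solution is [1/11, 7/11].
x = cg_reorth([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(x)
```

    Storing every residual costs memory proportional to the number of iterations, which is the usual price of full re-orthogonalization.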
    A Meta-learning Formulation of the Autoencoder Problem. (arXiv:2207.06676v1 [cs.LG])
    A rapidly growing area of research is the use of machine learning approaches such as autoencoders for dimensionality reduction of data and models in scientific applications. We show that the canonical formulation of autoencoders suffers from several deficiencies that can hinder their performance. Using a meta-learning approach, we reformulate the autoencoder problem as a bi-level optimization procedure that explicitly solves the dimensionality reduction task. We prove that the new formulation corrects the identified deficiencies with canonical autoencoders, provide a practical way to solve it, and showcase the strength of this formulation with a simple numerical illustration.
    DropNet: Reducing Neural Network Complexity via Iterative Pruning. (arXiv:2207.06646v1 [cs.LG])
    Modern deep neural networks require a significant amount of computing time and power to train and deploy, which limits their usage on edge devices. Inspired by the iterative weight pruning in the Lottery Ticket Hypothesis, we propose DropNet, an iterative pruning method which prunes nodes/filters to reduce network complexity. DropNet iteratively removes nodes/filters with the lowest average post-activation value across all training samples. Empirically, we show that DropNet is robust across diverse scenarios, including MLPs and CNNs using the MNIST, CIFAR-10 and Tiny ImageNet datasets. We show that up to 90% of the nodes/filters can be removed without any significant loss of accuracy. The final pruned network performs well even with reinitialization of the weights and biases. DropNet also has similar accuracy to an oracle which greedily removes nodes/filters one at a time to minimise training loss, highlighting its effectiveness.
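    The pruning criterion described above (remove the nodes/filters with the lowest average post-activation value across training samples) can be sketched in a few lines. This is a simplified illustration under stated assumptions: activations are non-negative (e.g. after ReLU), the helper names and the `fraction` parameter are hypothetical, and the activations are held fixed between rounds, whereas an iterative method would retrain and recompute them.

```python
def average_post_activation(activations):
    """activations[s][j] is the post-activation value of node j on sample s."""
    n_samples = len(activations)
    n_nodes = len(activations[0])
    return [sum(row[j] for row in activations) / n_samples
            for j in range(n_nodes)]


def nodes_to_drop(activations, kept, fraction):
    """Pick the given fraction of still-kept nodes with the lowest average
    post-activation value (at least one node per round)."""
    scores = average_post_activation(activations)
    ranked = sorted(kept, key=lambda j: scores[j])
    n_drop = max(1, int(len(kept) * fraction))
    return ranked[:n_drop]


# Illustrative activations: 3 training samples, 4 nodes.
activations = [
    [0.9, 0.0, 0.4, 0.1],
    [0.8, 0.1, 0.5, 0.0],
    [1.0, 0.0, 0.3, 0.2],
]

kept = [0, 1, 2, 3]
history = []
for _ in range(2):  # two pruning rounds; retraining between rounds is omitted
    dropped = nodes_to_drop(activations, kept, fraction=0.25)
    history.append(dropped)
    kept = [j for j in kept if j not in dropped]

print(history, kept)
```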
    Anomal-E: A Self-Supervised Network Intrusion Detection System based on Graph Neural Networks. (arXiv:2207.06819v1 [cs.LG])
    This paper investigates the application of Graph Neural Networks (GNNs) to self-supervised network intrusion and anomaly detection. GNNs are a deep learning approach for graph-based data that incorporate graph structures into learning to generalise graph representations and output embeddings. As network flows are naturally graph-based, GNNs are a suitable fit for analysing and learning network behaviour. The majority of current implementations of GNN-based Network Intrusion Detection Systems (NIDSs) rely heavily on labelled network traffic, which can restrict not only the amount and structure of input traffic, but also the NIDSs' potential to adapt to unseen attacks. To overcome these restrictions, we present Anomal-E, a GNN approach to intrusion and anomaly detection that leverages edge features and graph topological structure in a self-supervised process. This approach is, to the best of our knowledge, the first successful and practical approach to network intrusion detection that utilises network flows in a self-supervised, edge-leveraging GNN. Experimental results on two modern benchmark NIDS datasets clearly display not only the improvement of using Anomal-E embeddings rather than raw features, but also the potential Anomal-E has for detection on wild network traffic.
    Towards Adaptive Unknown Authentication for Universal Domain Adaptation by Classifier Paradox. (arXiv:2207.04494v1 [cs.CV] CROSS LISTED)
    Universal domain adaptation (UniDA) is a general unsupervised domain adaptation setting, which addresses both domain and label shifts in adaptation. Its main challenge lies in how to identify target samples in unshared or unknown classes. Previous methods commonly strive to depict sample "confidence" along with a threshold for rejecting unknowns, and align feature distributions of shared classes across domains. However, it is still hard to pre-specify a "confidence" criterion and threshold which are adaptive to various real tasks, and a mis-prediction of unknowns further incurs misalignment of features in shared classes. In this paper, we propose a new UniDA method with adaptive Unknown Authentication by Classifier Paradox (UACP), considering that samples with paradoxical predictions are probably unknowns belonging to none of the source classes. In UACP, a composite classifier is jointly designed with two types of predictors. That is, a multi-class (MC) predictor classifies samples to one of the multiple source classes, while a binary one-vs-all (OVA) predictor further verifies the prediction by the MC predictor. Samples with verification failure or paradox are identified as unknowns. Further, instead of feature alignment for shared classes, implicit domain alignment is conducted in output space such that samples across domains share the same decision boundary, though with feature discrepancy. Empirical results validate UACP under both open-set and universal UDA settings.
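    The "classifier paradox" check described above can be illustrated with a small sketch: the MC predictor picks a source class, and the OVA predictor for that same class either confirms or rejects it. The function name, the `UNKNOWN` sentinel, and the 0.5 decision point for the binary predictor are assumptions for this example and are not taken from the paper.

```python
UNKNOWN = -1  # hypothetical label for samples rejected as unknown


def authenticate(mc_probs, ova_probs, decision=0.5):
    """mc_probs[c]: multi-class probability of source class c.
    ova_probs[c]: one-vs-all probability that the sample belongs to class c.

    The MC prediction is kept only if the matching OVA predictor agrees;
    a disagreement (paradox) marks the sample as unknown.
    """
    predicted = max(range(len(mc_probs)), key=lambda c: mc_probs[c])
    if ova_probs[predicted] < decision:
        return UNKNOWN
    return predicted


# The OVA predictor for class 1 confirms the MC prediction.
confirmed = authenticate([0.1, 0.7, 0.2], [0.2, 0.9, 0.1])
# The OVA predictor for class 1 rejects the MC prediction.
rejected = authenticate([0.1, 0.7, 0.2], [0.2, 0.3, 0.1])
print(confirmed, rejected)
```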
    Deep Dictionary Learning with An Intra-class Constraint. (arXiv:2207.06841v1 [cs.LG])
    In recent years, deep dictionary learning (DDL) has attracted a great amount of attention due to its effectiveness for representation learning and visual recognition. However, most existing methods focus on unsupervised deep dictionary learning, failing to further explore the category information. To make full use of the category information of different samples, we propose a novel deep dictionary learning model with an intra-class constraint (DDLIC) for visual classification. Specifically, we design the intra-class compactness constraint on the intermediate representation at different levels to encourage the intra-class representations to be closer to each other, and eventually the learned representation becomes more discriminative. Unlike the traditional DDL methods, during the classification stage, our DDLIC performs a layer-wise greedy optimization in a similar way to the training stage. Experimental results on four image datasets show that our method is superior to the state-of-the-art methods.
    Collaborative Machine Learning-Driven Internet of Medical Things -- A Systematic Literature Review. (arXiv:2207.06416v1 [cs.LG])
    The growing adoption of IoT devices for healthcare has enabled researchers to build intelligence using all the data produced by these devices. Monitoring and diagnosing health have been the two most common scenarios where such devices have proven beneficial. Achieving high prediction accuracy was a top priority initially, but the focus has slowly shifted to efficiency and higher throughput, and processing the data from these devices in a distributed manner has proven to help achieve both. Since the field of machine learning is vast with numerous state-of-the-art algorithms in play, it has been a challenge to identify the algorithms that perform best in different scenarios. In this literature review, we explored the distributed machine learning algorithms tested by the authors of the selected studies and identified the ones that achieved the best prediction accuracy in each healthcare scenario. While no algorithm performed consistently, Random Forest performed the best in a few studies. This could serve as a good starting point for future studies on collaborative machine learning on IoMT data.
    Differentiable Logics for Neural Network Training and Verification. (arXiv:2207.06741v1 [cs.AI])
    The rising popularity of neural networks (NNs) in recent years and their increasing prevalence in real-world applications have drawn attention to the importance of their verification. While verification is known to be computationally difficult theoretically, many techniques have been proposed for solving it in practice. It has been observed in the literature that by default neural networks rarely satisfy logical constraints that we want to verify. A good course of action is to train the given NN to satisfy said constraints prior to verifying them. This idea is sometimes referred to as continuous verification, referring to the loop between training and verification. Usually training with constraints is implemented by specifying a translation for a given formal logic language into loss functions. These loss functions are then used to train neural networks. Because for training purposes these functions need to be differentiable, these translations are called differentiable logics (DL). This raises several research questions. What kind of differentiable logics are possible? What difference does a specific choice of DL make in the context of continuous verification? What are the desirable criteria for a DL viewed from the point of view of the resulting loss function? In this extended abstract we will discuss and answer these questions.
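    To make the idea of translating a logical constraint into a loss concrete, here is a minimal sketch of one possible translation. It is an assumption chosen for illustration, not the specific logic studied in the abstract: an inequality `a <= b` maps to `max(0, a - b)`, a conjunction maps to a sum, and a disjunction maps to a product, so the loss is zero exactly when the constraint holds.

```python
def leq(a, b):
    """Loss for the atom a <= b: zero when satisfied, else the violation."""
    return max(0.0, a - b)


def conj(*losses):
    """Loss for a conjunction: zero only if every conjunct has zero loss."""
    return sum(losses)


def disj(*losses):
    """Loss for a disjunction: zero if at least one disjunct has zero loss."""
    product = 1.0
    for loss in losses:
        product *= loss
    return product


def in_unit_interval(y):
    """Loss for the constraint 0 <= y and y <= 1 on a network output y."""
    return conj(leq(0.0, y), leq(y, 1.0))


satisfied = in_unit_interval(0.5)   # constraint holds
violated = in_unit_interval(1.3)    # output overshoots by about 0.3
print(satisfied, violated)
```

    These functions are piecewise linear or polynomial in their arguments, so they are differentiable almost everywhere and can be added to a training objective.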
    How do tuna schools associate to dFADs? A study using echo-sounder buoys to identify global patterns. (arXiv:2207.07049v1 [stat.ML])
    Based on the data gathered by echo-sounder buoys attached to drifting Fish Aggregating Devices (dFADs) across tropical oceans, the current study applies a Machine Learning protocol to examine the temporal trends of tuna schools' association to drifting objects. Using a binary output, metrics typically used in the literature were adapted to account for the fact that the entire tuna aggregation under the dFAD was considered. The median time it took tuna to colonize the dFADs for the first time varied between 25 and 43 days, depending on the ocean, and the longest soak and colonization times were registered in the Pacific Ocean. The tuna schools' Continuous Residence Times were generally shorter than Continuous Absence Times (median values between 5 and 7 days, and 9 and 11 days, respectively), in line with the results found by previous studies. Using a regression output, two novel metrics, namely aggregation time and disaggregation time, were estimated to obtain further insight into the symmetry of the aggregation process. Across all oceans, the time it took for the tuna aggregation to depart from the dFADs was not significantly longer than the time it took for the aggregation to form. The value of these results in the context of the "ecological trap" hypothesis is discussed, and further analyses to enrich and make use of this data source are proposed.
    RSD-GAN: Regularized Sobolev Defense GAN Against Speech-to-Text Adversarial Attacks. (arXiv:2207.06858v1 [cs.SD])
    This paper introduces a new synthesis-based defense algorithm for counteracting a variety of adversarial attacks developed to challenge the performance of cutting-edge speech-to-text transcription systems. Our algorithm implements a Sobolev-based GAN and proposes a novel regularizer for effectively controlling the functionality of the entire generative model, particularly the discriminator network during training. The results of numerous experiments on the victim DeepSpeech, Kaldi, and Lingvo speech transcription systems corroborate the remarkable performance of our defense approach against a comprehensive range of targeted and non-targeted adversarial attacks.
    Rethinking Multidimensional Discriminator Output for Generative Adversarial Networks. (arXiv:2109.03378v3 [stat.ML] UPDATED)
    Multidimensional discriminator (critic) output for Generative Adversarial Networks has been underexplored in the literature. In this paper, we generalize the Wasserstein GAN framework to take advantage of multidimensional critic output and explore its properties. We also introduce a square-root velocity transformation (SRVT) block which favors training in the multidimensional setting. Proofs of properties are based on our proposed maximal p-centrality discrepancy, which is bounded above by p-Wasserstein distance and fits the Wasserstein GAN framework with n-dimensional critic output. In particular, when n = 1 and p = 1, the proposed discrepancy equals 1-Wasserstein distance. Theoretical analysis and empirical evidence show that high-dimensional critic output is advantageous for distinguishing real from fake distributions, and benefits convergence speed and the diversity of results.
    Virtual stain transfer in histology via cascaded deep neural networks. (arXiv:2207.06578v1 [physics.med-ph])
    Pathological diagnosis relies on the visual inspection of histologically stained thin tissue specimens, where different types of stains are applied to bring contrast to and highlight various desired histological features. However, the destructive histochemical staining procedures are usually irreversible, making it very difficult to obtain multiple stains on the same tissue section. Here, we demonstrate a virtual stain transfer framework via a cascaded deep neural network (C-DNN) to digitally transform hematoxylin and eosin (H&E) stained tissue images into other types of histological stains. Unlike a single neural network structure which only takes one stain type as input to digitally output images of another stain type, C-DNN first uses virtual staining to transform autofluorescence microscopy images into H&E and then performs stain transfer from H&E to the domain of the other stain in a cascaded manner. This cascaded structure in the training phase allows the model to directly exploit histochemically stained image data on both H&E and the target special stain of interest. This advantage alleviates the challenge of paired data acquisition and improves the image quality and color accuracy of the virtual stain transfer from H&E to another stain. We validated the superior performance of this C-DNN approach using kidney needle core biopsy tissue sections and successfully transferred the H&E-stained tissue images into virtual PAS (periodic acid-Schiff) stain. This method provides high-quality virtual images of special stains using existing, histochemically stained slides and creates new opportunities in digital pathology by performing highly accurate stain-to-stain transformations.
    Graph Neural Network Bandits. (arXiv:2207.06456v1 [cs.LG])
    We consider the bandit optimization problem with the reward function defined over graph-structured data. This problem has important applications in molecule design and drug discovery, where the reward is naturally invariant to graph permutations. The key challenges in this setting are scaling to large domains, and to graphs with many nodes. We resolve these challenges by embedding the permutation invariance into our model. In particular, we show that graph neural networks (GNNs) can be used to estimate the reward function, assuming it resides in the Reproducing Kernel Hilbert Space of a permutation-invariant additive kernel. By establishing a novel connection between such kernels and the graph neural tangent kernel (GNTK), we introduce the first GNN confidence bound and use it to design a phased-elimination algorithm with sublinear regret. Our regret bound depends on the GNTK's maximum information gain, which we also provide a bound for. While the reward function depends on all $N$ node features, our guarantees are independent of the number of graph nodes $N$. Empirically, our approach exhibits competitive performance and scales well on graph-structured domains.
    Leakage and the Reproducibility Crisis in ML-based Science. (arXiv:2207.07048v1 [cs.LG])
    The use of machine learning (ML) methods for prediction and forecasting has become widespread across the quantitative sciences. However, there are many known methodological pitfalls, including data leakage, in ML-based science. In this paper, we systematically investigate reproducibility issues in ML-based science. We show that data leakage is indeed a widespread problem and has led to severe reproducibility failures. Specifically, through a survey of literature in research communities that adopted ML methods, we find 17 fields where errors have been found, collectively affecting 329 papers and in some cases leading to wildly overoptimistic conclusions. Based on our survey, we present a fine-grained taxonomy of 8 types of leakage that range from textbook errors to open research problems. We argue for fundamental methodological changes to ML-based science so that cases of leakage can be caught before publication. To that end, we propose model info sheets for reporting scientific claims based on ML models that would address all types of leakage identified in our survey. To investigate the impact of reproducibility errors and the efficacy of model info sheets, we undertake a reproducibility study in a field where complex ML models are believed to vastly outperform older statistical models such as Logistic Regression (LR): civil war prediction. We find that all papers claiming the superior performance of complex ML models compared to LR models fail to reproduce due to data leakage, and complex ML models don't perform substantively better than decades-old LR models. While none of these errors could have been caught by reading the papers, model info sheets would enable the detection of leakage in each case.
    Temporal Action Detection with Global Segmentation Mask Learning. (arXiv:2207.06580v1 [cs.CV])
    Existing temporal action detection (TAD) methods rely on generating an overwhelmingly large number of proposals per video. This leads to complex model designs due to proposal generation and/or per-proposal action instance evaluation and the resultant high computational cost. In this work, for the first time, we propose a proposal-free Temporal Action detection model with Global Segmentation mask (TAGS). Our core idea is to learn a global segmentation mask of each action instance jointly at the full video length. The TAGS model differs significantly from the conventional proposal-based methods by focusing on global temporal representation learning to directly detect local start and end points of action instances without proposals. Further, by modeling TAD holistically rather than locally at the individual proposal level, TAGS needs a much simpler model architecture with lower computational cost. Extensive experiments show that despite its simpler design, TAGS outperforms existing TAD methods, achieving new state-of-the-art performance on two benchmarks. Importantly, it is ~20x faster to train and ~1.6x more efficient for inference. Our PyTorch implementation of TAGS is available at https://github.com/sauradip/TAGS .
    Antibody-Antigen Docking and Design via Hierarchical Equivariant Refinement. (arXiv:2207.06616v1 [q-bio.BM])
    Computational antibody design seeks to automatically create an antibody that binds to an antigen. The binding affinity is governed by the 3D binding interface where antibody residues (paratope) closely interact with antigen residues (epitope). Thus, predicting the 3D paratope-epitope complex (docking) is the key to finding the best paratope. In this paper, we propose a new model called Hierarchical Equivariant Refinement Network (HERN) for paratope docking and design. During docking, HERN employs a hierarchical message passing network to predict atomic forces and uses them to refine a binding complex in an iterative, equivariant manner. During generation, its autoregressive decoder progressively docks generated paratopes and builds a geometric representation of the binding interface to guide the next residue choice. Our results show that HERN significantly outperforms prior state-of-the-art on paratope docking and design benchmarks.
    Subgraph Frequency Distribution Estimation using Graph Neural Networks. (arXiv:2207.06684v1 [cs.LG])
    Small subgraphs (graphlets) are important features to describe fundamental units of a large network. The calculation of subgraph frequency distributions has wide application in multiple domains including biology and engineering. Unfortunately, due to the inherent complexity of this task, most of the existing methods are computationally intensive and inefficient. In this work, we propose GNNS, a novel representational learning framework that utilizes graph neural networks to sample subgraphs efficiently for estimating their frequency distribution. Our framework includes an inference model and a generative model that learns hierarchical embeddings of nodes, subgraphs, and graph types. With the learned model and embeddings, subgraphs are sampled in a highly scalable and parallel way and the frequency distribution estimation is then performed based on these sampled subgraphs. Overall, our methods achieve comparable accuracy and a significant speedup by three orders of magnitude compared to existing methods.
    In-memory Realization of In-situ Few-shot Continual Learning with a Dynamically Evolving Explicit Memory. (arXiv:2207.06810v1 [cs.LG])
    Continually learning new classes from a few training examples without forgetting previous old classes demands a flexible architecture with an inevitably growing portion of storage, in which new examples and classes can be incrementally stored and efficiently retrieved. One viable architectural solution is to tightly couple a stationary deep neural network to a dynamically evolving explicit memory (EM). As the centerpiece of this architecture, we propose an EM unit that leverages energy-efficient in-memory compute (IMC) cores during the course of continual learning operations. We demonstrate for the first time how the EM unit can physically superpose multiple training examples, expand to accommodate unseen classes, and perform similarity search during inference, using operations on an IMC core based on phase-change memory (PCM). Specifically, the physical superposition of a few encoded training examples is realized via in-situ progressive crystallization of PCM devices. The classification accuracy achieved on the IMC core remains within a range of 1.28%--2.5% compared to that of the state-of-the-art full-precision baseline software model on both the CIFAR-100 and miniImageNet datasets when continually learning 40 novel classes (from only five examples per class) on top of 60 old classes.
    Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach. (arXiv:2207.06949v1 [stat.ML])
    Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together, with the aim of constructing well-established clusters whose elements are grouped according to their similarity. The goal of this process is to provide a useful aid to researchers that helps them identify patterns in the data. When dealing with large databases, such patterns may not be easily detectable without the help of a clustering algorithm. This article provides an in-depth description of the most widely used clustering methodologies, accompanied by useful presentations concerning suitable parameter selection and initializations. At the same time, this article is not only a review highlighting the major elements of the examined clustering techniques; it also emphasizes the comparison of these algorithms' clustering efficiency on 3 datasets, revealing their weaknesses and capabilities in terms of accuracy and complexity when confronted with discrete and continuous observations. The produced results help us draw valuable conclusions about the appropriateness of the examined clustering techniques with respect to the dataset's size.
    Proceedings of the ICML 2022 Expressive Vocalizations Workshop and Competition: Recognizing, Generating, and Personalizing Vocal Bursts. (arXiv:2207.06958v1 [cs.SD])
    This is the Proceedings of the ICML Expressive Vocalization (ExVo) Competition. The ExVo competition focuses on understanding and generating vocal bursts: laughs, gasps, cries, and other non-verbal vocalizations that are central to emotional expression and communication. ExVo 2022 included three competition tracks using a large-scale dataset of 59,201 vocalizations from 1,702 speakers. The first, ExVo-MultiTask, requires participants to train a multi-task model to recognize expressed emotions and demographic traits from vocal bursts. The second, ExVo-Generate, requires participants to train a generative model that produces vocal bursts conveying ten different emotions. The third, ExVo-FewShot, requires participants to leverage few-shot learning incorporating speaker identity to train a model for the recognition of ten emotions conveyed by vocal bursts.
    A Novel Implementation of Machine Learning for the Efficient, Explainable Diagnosis of COVID-19 from Chest CT. (arXiv:2207.07117v1 [eess.IV])
    In a worldwide health crisis as exigent as COVID-19, there is a pressing need for rapid, reliable diagnostics. Currently, popular testing methods such as reverse transcription polymerase chain reaction (RT-PCR) can have high false negative rates. Consequently, COVID-19 patients are neither accurately identified nor treated quickly enough to prevent transmission of the virus. However, the recent rise of medical CT data has presented promising avenues, since CT manifestations contain key characteristics indicative of COVID-19. This study aimed to take a novel approach to the machine learning-based detection of COVID-19 from chest CT scans. First, the dataset utilized in this study was derived from three major sources, comprising a total of 17,698 chest CT slices across 923 patient cases. Image preprocessing algorithms were then developed to reduce noise by excluding irrelevant features. Transfer learning was also implemented with the EfficientNetB7 pre-trained model to provide a backbone architecture and save computational resources. Lastly, several explainability techniques were leveraged to qualitatively validate model performance by localizing infected regions and highlighting fine-grained pixel details. The proposed model attained an overall accuracy of 0.927 and a sensitivity of 0.958. Explainability measures showed that the model correctly distinguished between relevant, critical features pertaining to COVID-19 chest CT images and normal controls. Deep learning frameworks provide efficient, human-interpretable COVID-19 diagnostics that could complement radiologist decisions or serve as an alternative screening tool. Future endeavors may provide insight into infection severity, patient risk stratification, and prognosis.
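    For readers less familiar with the two metrics reported above, the sketch below shows how accuracy and sensitivity are computed from a binary confusion matrix. The counts are made up purely for illustration and are not the study's data; sensitivity is the fraction of true positive cases that the model detects, so a high value corresponds to few false negatives.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of actual positive cases that are detected (recall)."""
    return tp / (tp + fn)


# Illustrative counts only (not taken from the paper).
tp, fn = 90, 10   # positive cases: detected vs. missed
tn, fp = 80, 20   # negative cases: correctly cleared vs. false alarms

acc = accuracy(tp, tn, fp, fn)
sens = sensitivity(tp, fn)
print(acc, sens)
```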
    Robot Program Parameter Inference via Differentiable Shadow Program Inversion. (arXiv:2103.14452v2 [cs.RO] UPDATED)
    Challenging manipulation tasks can be solved effectively by combining individual robot skills, which must be parameterized for the concrete physical environment and task at hand. This is time-consuming and difficult for human programmers, particularly for force-controlled skills. To this end, we present Shadow Program Inversion (SPI), a novel approach to infer optimal skill parameters directly from data. SPI leverages unsupervised learning to train an auxiliary differentiable program representation ("shadow program") and realizes parameter inference via gradient-based model inversion. Our method enables the use of efficient first-order optimizers to infer optimal parameters for originally non-differentiable skills, including many skill variants currently used in production. SPI zero-shot generalizes across task objectives, meaning that shadow programs do not need to be retrained to infer parameters for different task variants. We evaluate our methods on three different robots and skill frameworks in industrial and household scenarios. Code and examples are available at https://innolab.artiminds.com/icra2021.
    AutoML-Based Drought Forecast with Meteorological Variables. (arXiv:2207.07012v1 [cs.LG])
    A precise forecast for droughts is of considerable value to scientific research, agriculture, and water resource management. With emerging developments of data-driven approaches for hydro-climate modeling, this paper investigates an AutoML-based framework to forecast droughts in the U.S. Compared with commonly-used temporal deep learning models, the AutoML model can achieve comparable performance with less training data and time. As deep learning models are becoming popular for Earth system modeling, this paper aims to bring more efforts to AutoML-based methods, and the use of them as benchmark baselines for more complex deep learning models.
    The Free Energy Principle for Perception and Action: A Deep Learning Perspective. (arXiv:2207.06415v1 [cs.LG])
    The free energy principle, and its corollary active inference, constitute a bio-inspired theory that assumes biological agents act to remain in a restricted set of preferred states of the world, i.e., they minimize their free energy. Under this principle, biological agents learn a generative model of the world and plan actions in the future that will maintain the agent in an homeostatic state that satisfies its preferences. This framework lends itself to being realized in silico, as it comprehends important aspects that make it computationally affordable, such as variational inference and amortized planning. In this work, we investigate the tool of deep learning to design and realize artificial agents based on active inference, presenting a deep-learning oriented presentation of the free energy principle, surveying works that are relevant in both machine learning and active inference areas, and discussing the design choices that are involved in the implementation process. This manuscript probes newer perspectives for the active inference framework, grounding its theoretical aspects into more pragmatic affairs, offering a practical guide to active inference newcomers and a starting point for deep learning practitioners that would like to investigate implementations of the free energy principle.
    Pose-based Tremor Classification for Parkinson's Disease Diagnosis from Video. (arXiv:2207.06828v1 [cs.CV])
    Parkinson's disease (PD) is a progressive neurodegenerative disorder that results in a variety of motor dysfunction symptoms, including tremors, bradykinesia, rigidity and postural instability. The diagnosis of PD mainly relies on clinical experience rather than a definite medical test, and the diagnostic accuracy is only about 73-84% since it is challenged by the subjective opinions or experiences of different medical experts. Therefore, an efficient and interpretable automatic PD diagnosis system is valuable for supporting clinicians with more robust diagnostic decision-making. To this end, we propose to classify Parkinson's tremor since it is one of the most predominant symptoms of PD with strong generalizability. Unlike other computer-aided Parkinson's Tremor (PT) classification systems, which are time- and resource-consuming and rely on wearable sensors, we propose SPAPNet, which only requires consumer-grade non-intrusive video recording of camera-facing human movements as input to provide undiagnosed patients with low-cost PT classification results as a PD warning sign. For the first time, we propose to use a novel attention module with a lightweight pyramidal channel-squeezing-fusion architecture to extract relevant PT information and filter the noise efficiently. This design aids in improving both classification performance and system interpretability. Experimental results show that our system outperforms state-of-the-art methods by achieving a balanced accuracy of 90.9% and an F1-score of 90.6% in classifying PT against the non-PT class.
    Spatiotemporal Propagation Learning for Network-Wide Flight Delay Prediction. (arXiv:2207.06959v1 [cs.LG])
    Demystifying the delay propagation mechanisms among multiple airports is fundamental to precise and interpretable delay prediction, which is crucial during decision-making for all aviation industry stakeholders. The principal challenge lies in effectively leveraging the spatiotemporal dependencies and exogenous factors related to the delay propagation. However, previous works only consider limited spatiotemporal patterns with few factors. To promote more comprehensive propagation modeling for delay prediction, we propose SpatioTemporal Propagation Network (STPN), a space-time separable graph convolutional network, which is novel in spatiotemporal dependency capturing. From the aspect of spatial relation modeling, we propose a multi-graph convolution model considering both geographic proximity and airline schedule. From the aspect of temporal dependency capturing, we propose a multi-head self-attentional mechanism that can be learned end-to-end and explicitly reason multiple kinds of temporal dependency of delay time series. We show that the joint spatial and temporal learning models yield a sum of the Kronecker product, which factors the spatiotemporal dependence into the sum of several spatial and temporal adjacency matrices. By this means, STPN allows cross-talk of spatial and temporal factors for modeling delay propagation. Furthermore, a squeeze and excitation module is added to each layer of STPN to boost meaningful spatiotemporal features. To this end, we apply STPN to multi-step ahead arrival and departure delay prediction in large-scale airport networks. To validate the effectiveness of our model, we experiment with two real-world delay datasets, including U.S and China flight delays; and we show that STPN outperforms state-of-the-art methods. In addition, counterfactuals produced by STPN show that it learns explainable delay propagation patterns.
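    The "sum of Kronecker products" structure mentioned in the abstract above can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the STPN implementation: the matrices `S_GEO`, `S_SCHED`, `T_SAME`, and `T_LAG`, the helper names `kron` and `kron_sum`, and the index convention (node at time step t and airport i sits at index t*N + i) are all hypothetical choices made for illustration.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def kron_sum(pairs):
    """Sum of Kronecker products T_k (x) S_k over (temporal, spatial) pairs."""
    total = None
    for temporal, spatial in pairs:
        block = kron(temporal, spatial)
        if total is None:
            total = block
        else:
            total = [
                [x + y for x, y in zip(row_a, row_b)]
                for row_a, row_b in zip(total, block)
            ]
    return total


# Toy values (assumptions): 2 airports, 2 time steps.
S_GEO = [[0, 1], [1, 0]]    # hypothetical geographic-proximity adjacency
S_SCHED = [[0, 1], [0, 0]]  # hypothetical schedule-based adjacency
T_SAME = [[1, 0], [0, 1]]   # couples nodes within the same time step
T_LAG = [[0, 0], [1, 0]]    # couples time step 1 with time step 0

# Entry [t*N + i][u*N + j] equals sum_k T_k[t][u] * S_k[i][j].
JOINT = kron_sum([(T_SAME, S_GEO), (T_LAG, S_SCHED)])
for row in JOINT:
    print(row)
```

    Each term pairs one temporal matrix with one spatial matrix, so the joint 4x4 matrix never has to be learned as a single dense object; only the small factors are needed.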
    Language Modelling with Pixels. (arXiv:2207.06991v1 [cs.CL])
    Language models are defined over a finite set of inputs, which creates a vocabulary bottleneck when we attempt to scale the number of supported languages. Tackling this bottleneck results in a trade-off between what can be represented in the embedding matrix and computational issues in the output layer. This paper introduces PIXEL, the Pixel-based Encoder of Language, which suffers from neither of these issues. PIXEL is a pretrained language model that renders text as images, making it possible to transfer representations across languages based on orthographic similarity or the co-activation of pixels. PIXEL is trained to reconstruct the pixels of masked patches, instead of predicting a distribution over tokens. We pretrain the 86M parameter PIXEL model on the same English data as BERT and evaluate on syntactic and semantic tasks in typologically diverse languages, including various non-Latin scripts. We find that PIXEL substantially outperforms BERT on syntactic and semantic processing tasks on scripts that are not found in the pretraining data, but PIXEL is slightly weaker than BERT when working with Latin scripts. Furthermore, we find that PIXEL is more robust to noisy text inputs than BERT, further confirming the benefits of modelling language with pixels.
    DRIBO: Robust Deep Reinforcement Learning via Multi-View Information Bottleneck. (arXiv:2102.13268v4 [cs.AI] UPDATED)
    Deep reinforcement learning (DRL) agents are often sensitive to visual changes that were unseen in their training environments. To address this problem, we leverage the sequential nature of RL to learn robust representations that encode only task-relevant information from observations based on the unsupervised multi-view setting. Specifically, we introduce a novel contrastive version of the Multi-View Information Bottleneck (MIB) objective for temporal data. We train RL agents from pixels with this auxiliary objective to learn robust representations that can compress away task-irrelevant information and are predictive of task-relevant dynamics. This approach enables us to train high-performance policies that are robust to visual distractions and can generalize well to unseen environments. We demonstrate that our approach can achieve SOTA performance on a diverse set of visual control tasks in the DeepMind Control Suite when the background is replaced with natural videos. In addition, we show that our approach outperforms well-established baselines for generalization to unseen environments on the Procgen benchmark. Our code is open-sourced and available at https://github.com/BU-DEPEND-Lab/DRIBO.
    RobustAnalog: Fast Variation-Aware Analog Circuit Design Via Multi-task RL. (arXiv:2207.06412v1 [cs.ET])
    Analog/mixed-signal circuit design is one of the most complex and time-consuming stages in the whole chip design process. Due to various process, voltage, and temperature (PVT) variations from chip manufacturing, analog circuits inevitably suffer from performance degradation. Although there has been plenty of work on automating analog circuit design under the typical condition, limited research has been done on exploring robust designs under real and unpredictable silicon variations. Automatic analog design against variations requires prohibitive computation and time costs. To address the challenge, we present RobustAnalog, a robust circuit design framework that involves the variation information in the optimization process. Specifically, circuit optimizations under different variations are considered as a set of tasks. Similarities among tasks are leveraged and competitions are alleviated to realize a sample-efficient multi-task training. Moreover, RobustAnalog prunes the task space according to the current performance in each iteration, leading to a further simulation cost reduction. In this way, RobustAnalog can rapidly produce a set of circuit parameters that satisfies diverse constraints (e.g. gain, bandwidth, noise...) across variations. We compare RobustAnalog with Bayesian optimization, Evolutionary algorithm, and Deep Deterministic Policy Gradient (DDPG) and demonstrate that RobustAnalog can significantly reduce required optimization time by 14-30 times. Therefore, our study provides a feasible method to handle various real silicon conditions.
    A Robustly Optimized Long Text to Math Models for Numerical Reasoning On FinQA. (arXiv:2207.06490v1 [cs.CL])
    Numerical reasoning is required when solving most problems in our lives, but it has been neglected in previous artificial intelligence research. The FinQA challenge has been organized to strengthen the study of numerical reasoning; participants are asked to predict the numerical reasoning program that solves a financial question. FinQA results are evaluated by both execution accuracy and program accuracy. In this paper, we present our approach to the task, which develops models with different specialized capabilities and fuses their strengths. Overall, our approach achieves 1st place in the FinQA challenge, with 71.93% execution accuracy and 67.03% program accuracy.
    Deep Learning Discovery of Demographic Biomarkers in Echocardiography. (arXiv:2207.06421v1 [cs.LG])
    Deep learning has been shown to accurately assess 'hidden' phenotypes and predict biomarkers from medical imaging beyond traditional clinician interpretation of medical imaging. Given the black box nature of artificial intelligence (AI) models, caution should be exercised in applying models to healthcare as prediction tasks might be short-cut by differences in demographics across disease and patient populations. Using large echocardiography datasets from two healthcare systems, we test whether it is possible to predict age, race, and sex from cardiac ultrasound images using deep learning algorithms and assess the impact of varying confounding variables. We trained video-based convolutional neural networks to predict age, sex, and race. We found that deep learning models were able to identify age and sex, while unable to reliably predict race. Without considering confounding differences between categories, the AI model predicted sex with an AUC of 0.85 (95% CI 0.84 - 0.86), age with a mean absolute error of 9.12 years (95% CI 9.00 - 9.25), and race with AUCs ranging from 0.63 - 0.71. When predicting race, we show that tuning the proportion of a confounding variable (sex) in the training data significantly impacts model AUC (ranging from 0.57 to 0.84), while in training a sex prediction model, tuning a confounder (race) did not substantially change AUC (0.81 - 0.83). This suggests a significant proportion of the model's performance on predicting race could come from confounding features being detected by AI. Further work remains to identify the particular imaging features that associate with demographic information and to better understand the risks of demographic identification in medical AI as it pertains to potentially perpetuating bias and disparities.
    In Defense of Core-set: A Density-aware Core-set Selection for Active Learning. (arXiv:2206.04838v3 [cs.LG] UPDATED)
    Active learning enables the efficient construction of a labeled dataset by labeling informative samples from an unlabeled dataset. In a real-world active learning scenario, considering the diversity of the selected samples is crucial because many redundant or highly similar samples exist. The core-set approach is a promising diversity-based method that selects diverse samples based on the distance between samples. However, the approach performs poorly compared to uncertainty-based approaches, which select the most difficult samples, i.e., those on which neural models show low confidence. In this work, we analyze the feature space through the lens of density and, interestingly, observe that locally sparse regions tend to have more informative samples than dense regions. Motivated by our analysis, we empower the core-set approach with density-awareness and propose a density-aware core-set (DACS). The strategy is to estimate the density of the unlabeled samples and select diverse samples mainly from sparse regions. To reduce the computational bottlenecks in estimating the density, we also introduce a new density approximation based on locality-sensitive hashing. Experimental results clearly demonstrate the efficacy of DACS in both classification and regression tasks and specifically show that DACS can produce state-of-the-art performance in a practical scenario. Since DACS is weakly dependent on neural architectures, we present a simple yet effective combination method to show that the existing methods can be beneficially combined with DACS.
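    The strategy described above (estimate density, then pick diverse samples mainly from sparse regions) can be sketched as follows. This is not the authors' DACS algorithm: it swaps the locality-sensitive-hashing density approximation for a brute-force neighbor count, and it uses a plain greedy farthest-point (k-center) rule for diversity. The function names, the `radius` and `max_neighbors` thresholds, and the toy points are all assumptions made for illustration.

```python
import math


def neighbor_counts(points, radius):
    """Brute-force density proxy: number of other points within `radius`."""
    return [
        sum(1 for q in points if math.dist(p, q) <= radius) - 1
        for p in points
    ]


def sparse_candidates(points, radius, max_neighbors):
    """Indices of points whose neighborhood is at most `max_neighbors` dense."""
    counts = neighbor_counts(points, radius)
    return [i for i, c in enumerate(counts) if c <= max_neighbors]


def k_center_greedy(points, candidates, labeled, budget):
    """Greedy farthest-point selection; `labeled` must be non-empty."""
    centers = list(labeled)
    selected = []
    remaining = [i for i in candidates if i not in centers]
    while remaining and len(selected) < budget:
        # Pick the candidate farthest from its nearest already-chosen center.
        best = max(
            remaining,
            key=lambda i: min(math.dist(points[i], points[c]) for c in centers),
        )
        centers.append(best)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumption): a dense cluster near the origin plus three outliers.
POINTS = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
    (5.0, 5.0), (9.0, 0.0), (0.0, 8.0),
]
SPARSE = sparse_candidates(POINTS, radius=0.5, max_neighbors=1)
PICKED = k_center_greedy(POINTS, SPARSE, labeled=[0], budget=2)
print(SPARSE, PICKED)
```

    Filtering by density first keeps the greedy step from spending its budget inside the dense cluster; the brute-force count is quadratic in the number of points, which is the bottleneck the abstract says the hashing-based approximation is meant to reduce.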
    Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes. (arXiv:2207.06544v1 [cs.LG])
    A broad class of stochastic volatility models are defined by systems of stochastic differential equations. While these models have seen widespread success in domains such as finance and statistical climatology, they typically lack an ability to condition on historical data to produce a true posterior distribution. To address this fundamental limitation, we show how to re-cast a class of stochastic volatility models as a hierarchical Gaussian process (GP) model with specialized covariance functions. This GP model retains the inductive biases of the stochastic volatility model while providing the posterior predictive distribution given by GP inference. Within this framework, we take inspiration from well studied domains to introduce a new class of models, Volt and Magpie, that significantly outperform baselines in stock and wind speed forecasting, and naturally extend to the multitask setting.
    Online Bayesian Meta-Learning for Cognitive Tracking Radar. (arXiv:2207.06917v1 [cs.IT])
    A key component of cognitive radar is the ability to generalize, or achieve consistent performance across a broad class of sensing environments, since aspects of the physical scene may vary over time. This presents a challenge for learning-based waveform selection approaches, since transmission policies which are effective in one scene may be highly suboptimal in another. One way to address this problem is to bias a learning algorithm strategically by exploiting high-level structure across tracking instances, referred to as meta-learning. In this work, we develop an online meta-learning approach for waveform-agile tracking. This approach uses information gained from previous target tracks to speed up and enhance learning in new tracking instances. This results in sample-efficient learning across a class of finite state target channels by exploiting inherent similarity across tracking scenes, attributed to common physical elements such as target type or clutter. We formulate the online waveform selection problem in the framework of Bayesian learning, and provide prior-dependent performance bounds for the meta-learning problem using PAC-Bayes theory. We present a computationally feasible posterior sampling algorithm and study the performance in a simulation study consisting of diverse scenes. Finally, we examine the potential performance benefits and practical challenges associated with online meta-learning for waveform-agile tracking.
    Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation. (arXiv:2203.11670v2 [cs.CL] UPDATED)
    Building models of natural language processing (NLP) is challenging in low-resource scenarios where only limited data are available. Optimization-based meta-learning algorithms achieve promising results in low-resource scenarios by adapting a well-generalized model initialization to handle new tasks. Nonetheless, these approaches suffer from the memorization overfitting issue, where the model tends to memorize the meta-training tasks while ignoring support sets when adapting to new tasks. To address this issue, we propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation. Specifically, we introduce a task-specific memory module to store support set information and construct an imitation module to force query sets to imitate the behaviors of some representative support-set samples stored in the memory. A theoretical analysis is provided to prove the effectiveness of our method, and empirical results also demonstrate that our method outperforms competitive baselines on both text classification and generation tasks.
    Bias Mitigation for Machine Learning Classifiers: A Comprehensive Survey. (arXiv:2207.07068v1 [cs.LG])
    This paper provides a comprehensive survey of bias mitigation methods for achieving fairness in Machine Learning (ML) models. We collect a total of 234 publications concerning bias mitigation for ML classifiers. These methods can be distinguished based on their intervention procedure (i.e., pre-processing, in-processing, post-processing) and the technology they apply. We investigate how existing bias mitigation methods are evaluated in the literature. In particular, we consider datasets, metrics and benchmarking. Based on the gathered insights (e.g., what is the most popular fairness metric? How many datasets are used for evaluating bias mitigation methods?), we hope to support practitioners in making informed choices when developing and evaluating new bias mitigation methods.
    Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games. (arXiv:2207.06541v1 [cs.GT])
    In competitive two-agent environments, deep reinforcement learning (RL) methods based on the Double Oracle (DO) algorithm, such as Policy Space Response Oracles (PSRO) and Anytime PSRO (APSRO), iteratively add RL best response policies to a population. Eventually, an optimal mixture of these population policies will approximate a Nash equilibrium. However, these methods might need to add all deterministic policies before converging. In this work, we introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal stochastic policy to the population in each iteration. Instead of adding only deterministic best responses to the opponent's least exploitable population mixture, SP-PSRO also learns an approximately optimal stochastic policy and adds it to the population as well. As a result, SP-PSRO empirically tends to converge much faster than APSRO and in many games converges in just a few iterations.
    Sub 8-Bit Quantization of Streaming Keyword Spotting Models for Embedded Chipsets. (arXiv:2207.06920v1 [cs.SD])
    We propose a novel 2-stage sub 8-bit quantization aware training algorithm for all components of a 250K parameter feedforward, streaming, state-free keyword spotting model. For the 1st-stage, we adapt a recently proposed quantization technique using a non-linear transformation with tanh(.) on dense layer weights. In the 2nd-stage, we use linear quantization methods on the rest of the network, including other parameters (bias, gain, batchnorm), inputs, and activations. We conduct large scale experiments, training on 26,000 hours of de-identified production, far-field and near-field audio data (evaluating on 4,000 hours of data). We organize our results in two embedded chipset settings: a) with commodity ARM NEON instruction set and 8-bit containers, we present accuracy, CPU, and memory results using sub 8-bit weights (4, 5, 8-bit) and 8-bit quantization of rest of the network; b) with off-the-shelf neural network accelerators, for a range of weight bit widths (1 and 5-bit), while presenting accuracy results, we project reduction in memory utilization. In both configurations, our results show that the proposed algorithm can achieve: a) parity with a full floating point model's operating point on a detection error tradeoff (DET) curve in terms of false detection rate (FDR) at false rejection rate (FRR); b) significant reduction in compute and memory, yielding up to 3 times improvement in CPU consumption and more than 4 times improvement in memory consumption.
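    The abstract above does not spell out the tanh-based weight transformation, so the following is only a hedged sketch of one widely used scheme of that kind (in the style of DoReFa-Net weight quantization), not necessarily the one the authors adapt. Weights are squashed with tanh, rescaled to [0, 1], rounded to one of 2^bits uniform levels, and mapped back to [-1, 1]. The function name `quantize_weights_tanh` and the example weights are assumptions.

```python
import math


def quantize_weights_tanh(weights, bits):
    """Quantize weights to 2**bits uniform levels in [-1, 1] after a tanh squash.

    Forward pass only; training-time gradient handling (e.g. a
    straight-through estimator) is outside the scope of this sketch.
    """
    levels = 2 ** bits - 1
    squashed = [math.tanh(w) for w in weights]
    # Normalize by the largest magnitude; fall back to 1.0 if all weights are 0.
    scale = max(abs(s) for s in squashed) or 1.0
    quantized = []
    for s in squashed:
        unit = s / (2.0 * scale) + 0.5        # map to [0, 1]
        step = round(unit * levels) / levels  # snap to a uniform grid
        quantized.append(2.0 * step - 1.0)    # map back to [-1, 1]
    return quantized


# Example weights (assumption): a handful of dense-layer values.
WEIGHTS = [-2.0, -0.7, -0.1, 0.0, 0.3, 1.1, 2.0]
for bits in (4, 5, 8):
    print(bits, quantize_weights_tanh(WEIGHTS, bits))
```

    The tanh squash compresses large-magnitude weights before the uniform grid is applied, so more of the available levels fall where small weights are concentrated.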
    Hypergraphon Mean Field Games. (arXiv:2203.16223v2 [cs.GT] UPDATED)
    We propose an approach to modelling large-scale multi-agent dynamical systems allowing interactions among more than just pairs of agents using the theory of mean-field games and the notion of hypergraphons, which are obtained as limits of large hypergraphs. To the best of our knowledge, ours is the first work on mean field games on hypergraphs. Together with an extension to a multi-layer setup, we obtain limiting descriptions for large systems of non-linear, weakly-interacting dynamical agents. On the theoretical side, we prove the well-foundedness of the resulting hypergraphon mean field game, showing both existence and approximate Nash properties. On the applied side, we extend numerical and learning algorithms to compute the hypergraphon mean field equilibria. To verify our approach empirically, we consider an epidemic control problem and a social rumor spreading model, where we give agents intrinsic motivation to spread rumors to unaware agents.
    Dynamically handling task disruptions by composing together behavior modules. (arXiv:2207.06482v1 [cs.LG])
    Biological neural networks operate in the presence of task disruptions as they guide organisms toward goals. A familiar stream of stimulus-response causations can be disrupted by subtask streams imposed by the environment. For example, taking a familiar path to a foraging area might be disrupted by the presence of a predator, necessitating a "detour" to the area. The detour can be a known alternative path that must be dynamically composed with the original path to accomplish the overall task. In this project, overarching base paths are disrupted by independently learned path modules in the form of insertion, substitution, and deletion modifications to the base paths such that the resulting modified paths are novel to the network. The network's performance is then tested on these paths that have been learned in piecemeal fashion. In sum, the network must compose a new task on the fly. Several network architectures are tested: Time delay neural network (TDNN), Long short-term memory (LSTM), Temporal convolutional network (TCN), and Morphognosis, a hierarchical neural network. LSTM and Morphognosis perform significantly better for this task.
    Speech-enhanced and Noise-aware Networks for Robust Speech Recognition. (arXiv:2203.13696v2 [cs.SD] UPDATED)
    Compensation for channel mismatch and noise interference is essential for robust automatic speech recognition. Enhanced speech has been introduced into the multi-condition training of acoustic models to improve their generalization ability. In this paper, a noise-aware training framework based on two cascaded neural structures is proposed to jointly optimize speech enhancement and speech recognition. The feature enhancement module is composed of a multi-task autoencoder, where noisy speech is decomposed into clean speech and noise. By concatenating its enhanced, noise-aware, and noisy features for each frame, the acoustic-modeling module maps each feature-augmented frame into a triphone state by optimizing the lattice-free maximum mutual information and cross entropy between the predicted and actual state sequences. On top of the factorized time delay neural network (TDNN-F) and its convolutional variant (CNN-TDNNF), both with SpecAug, the two proposed systems achieve word error rate (WER) of 3.90% and 3.55%, respectively, on the Aurora-4 task. Compared with the best existing systems that use bigram and trigram language models for decoding, the proposed CNN-TDNNF-based system achieves a relative WER reduction of 15.20% and 33.53%, respectively. In addition, the proposed CNN-TDNNF-based system also outperforms the baseline CNN-TDNNF system on the AMI task.
    Deep Unlearning via Randomized Conditionally Independent Hessians. (arXiv:2204.07655v2 [cs.CV] UPDATED)
    Recent legislation has led to interest in machine unlearning, i.e., removing specific training samples from a predictive model as if they never existed in the training dataset. Unlearning may also be required due to corrupted/adversarial data or simply a user's updated privacy requirement. For models which require no training (k-NN), simply deleting the closest original sample can be effective. But this idea is inapplicable to models which learn richer representations. Recent ideas leveraging optimization-based updates scale poorly with the model dimension d, due to inverting the Hessian of the loss function. We use a variant of a new conditional independence coefficient, L-CODEC, to identify a subset of the model parameters with the most semantic overlap on an individual sample level. Our approach completely avoids the need to invert a (possibly) huge matrix. By utilizing a Markov blanket selection, we premise that L-CODEC is also suitable for deep unlearning, as well as other applications in vision. Compared to alternatives, L-CODEC makes approximate unlearning possible in settings that would otherwise be infeasible, including vision models used for face recognition, person re-identification and NLP models that may require unlearning samples identified for exclusion. Code can be found at https://github.com/vsingh-group/LCODEC-deep-unlearning/
    Protein 3D structure-based neural networks highly improve the accuracy in compound-protein binding affinity prediction. (arXiv:2204.12586v2 [q-bio.BM] UPDATED)
    Theoretically, the accuracy of computational models in compound-protein binding affinities (CPAs) could be improved by the introduction of protein 3D structure information. However, most of these models still suffer from low accuracy due to the lack of an efficient approach to encode informative protein features. The major challenge is how to combine the multi-modal information such as the residue sequence of the protein, residue atom coordinates and the torsion angles. To tackle this problem, we develop Fast Evolutional Attention and Thoroughgoing-graph Neural Networks (FeatNN) to facilitate the application of protein 3D structure information for predicting CPAs. Specifically, we established a novel end-to-end architecture to jointly embed torsion matrix, discrete distance matrix, and sequence information of protein and extract compound features with deep graph convolution layers. In addition, a new pairwise mapping attention mechanism is introduced to comprehensively learn potential interaction information between proteins and compounds. FeatNN considerably outperforms various state-of-the-art baselines in CPA prediction with the R2 coefficient elevated by about 21.33%. Thus, FeatNN provides an outstanding method for highly accurate CPA prediction and facilitates high-throughput virtual screening of drug candidates.
    Improved Binary Forward Exploration: Learning Rate Scheduling Method for Stochastic Optimization. (arXiv:2207.04198v2 [cs.LG] UPDATED)
    A new gradient-based optimization approach that automatically schedules the learning rate, called Binary Forward Exploration (BFE), has been proposed recently. An adaptive version of BFE has also been discussed since then. In this paper, improved algorithms based on both are investigated in order to optimize the efficiency and robustness of the new methodology. The improved approach provides a new perspective on scheduling learning rate updates and is compared with stochastic gradient descent (SGD) with momentum or Nesterov momentum and with the most successful adaptive learning rate algorithms, e.g., Adam. The goal of this method is not to beat the others but to provide a different viewpoint on optimizing the gradient descent process. This approach combines the advantages of first-order and second-order optimization in terms of speed and efficiency.
    Interference-Limited Ultra-Reliable and Low-Latency Communications: Graph Neural Networks or Stochastic Geometry?. (arXiv:2207.06918v1 [eess.SP])
    In this paper, we aim to improve the Quality-of-Service (QoS) of Ultra-Reliability and Low-Latency Communications (URLLC) in interference-limited wireless networks. To obtain time diversity within the channel coherence time, we first put forward a random repetition scheme that randomizes the interference power. Then, we optimize the number of reserved slots and the number of repetitions for each packet to minimize the QoS violation probability, defined as the percentage of users that cannot achieve URLLC. We build a cascaded Random Edge Graph Neural Network (REGNN) to represent the repetition scheme and develop a model-free unsupervised learning method to train it. We analyze the QoS violation probability using stochastic geometry in a symmetric scenario and apply a model-based Exhaustive Search (ES) method to find the optimal solution. Simulation results show that in the symmetric scenario, the QoS violation probabilities achieved by the model-free learning method and the model-based ES method are nearly the same. In more general scenarios, the cascaded REGNN generalizes very well in wireless networks with different scales, network topologies, cell densities, and frequency reuse factors. It outperforms the model-based ES method in the presence of the model mismatch.
    Improving self-supervised pretraining models for epileptic seizure detection from EEG data. (arXiv:2207.06911v1 [eess.SP])
    There is abundant medical data on the internet, most of which is unlabeled. Traditional supervised learning algorithms are often limited by the amount of labeled data, especially in the medical domain, where labeling is costly in terms of human processing and the specialized experts needed to do it. Labels are also prone to human error and bias, since only a select few expert annotators produce them. These issues are mitigated by self-supervision, where we generate pseudo-labels from unlabeled data by looking at the data itself. This paper presents various self-supervision strategies to enhance the performance of a time-series based Diffusion Convolutional Recurrent Neural Network (DCRNN) model. The learned weights in the self-supervision pretraining phase can be transferred to the supervised training phase to boost the model's prediction capability. Our techniques are tested on an extension of a DCRNN model, an RNN with graph diffusion convolutions, which models the spatiotemporal dependencies present in EEG signals. When the learned weights from the pretraining stage are transferred to a DCRNN model to determine whether an EEG time window has a characteristic seizure signal associated with it, our method yields an AUROC score $1.56\%$ higher than the current state-of-the-art models on the TUH EEG seizure corpus.
    Soil Erosion in the United States. Present and Future (2020-2050). (arXiv:2207.06579v1 [physics.ao-ph])
    Soil erosion is a significant threat to the environment and long-term land management around the world. Accelerated soil erosion by human activities inflicts extreme changes in terrestrial and aquatic ecosystems, which is not fully surveyed/predicted for the present and probable future at field-scales (30-m). Here, we estimate/predict soil erosion rates by water erosion (sheet and rill erosion) using three alternative (2.6, 4.5, and 8.5) Shared Socioeconomic Pathway and Representative Concentration Pathway (SSP-RCP) scenarios across the contiguous United States. Field Scale Soil Erosion Model (FSSLM) estimations rely on a high resolution (30-m) G2 erosion model integrated by satellite- and imagery-based estimations of land use and land cover (LULC), gauge observations of long-term precipitation, and scenarios of the Coupled Model Intercomparison Project Phase 6 (CMIP6). The baseline model (2020) estimates soil erosion rates of 2.32 Mg ha$^{-1}$ yr$^{-1}$ with current agricultural conservation practices (CPs). Future scenarios with current CPs indicate an increase of between 8% and 21% under different combinations of SSP-RCP scenarios of climate and LULC changes. The soil erosion forecast for 2050 suggests that all the climate and LULC scenarios indicate either an increase in extreme events or a change in the spatial location of extremes largely from the southern to the eastern and northeastern regions of the United States.
    Strongly Augmented Contrastive Clustering. (arXiv:2206.00380v2 [cs.LG] UPDATED)
    Deep clustering has attracted increasing attention in recent years due to its capability of joint representation learning and clustering via deep neural networks. In its latest developments, contrastive learning has emerged as an effective technique to substantially enhance deep clustering performance. However, the existing contrastive learning based deep clustering algorithms mostly focus on some carefully-designed augmentations (often with limited transformations to preserve the structure), referred to as weak augmentations, but cannot go beyond the weak augmentations to explore the further opportunities in stronger augmentations (with more aggressive transformations or even severe distortions). In this paper, we present an end-to-end deep clustering approach termed Strongly Augmented Contrastive Clustering (SACC), which extends the conventional two-augmentation-view paradigm to multiple views and jointly leverages strong and weak augmentations for strengthened deep clustering. Particularly, we utilize a backbone network with triply-shared weights, where a strongly augmented view and two weakly augmented views are incorporated. Based on the representations produced by the backbone, the weak-weak view pair and the strong-weak view pairs are simultaneously exploited for the instance-level contrastive learning (via an instance projector) and the cluster-level contrastive learning (via a cluster projector), which, together with the backbone, can be jointly optimized in a purely unsupervised manner. Experimental results on five challenging image datasets have shown the superiority of our SACC approach over the state-of-the-art. The code is available at https://github.com/dengxiaozhi/SACC.
    T-RECX: Tiny-Resource Efficient Convolutional Neural Networks with Early-Exit. (arXiv:2207.06613v1 [cs.LG])
    Deploying machine learning (ML) on milliwatt-scale edge devices (tinyML) is gaining popularity due to recent breakthroughs in ML and IoT. However, the capabilities of tinyML are restricted by strict power and compute constraints. The majority of the contemporary research in tinyML focuses on model compression techniques such as model pruning and quantization to fit ML models on low-end devices. Nevertheless, the improvements in energy consumption and inference time obtained by existing techniques are limited because aggressive compression quickly shrinks model capacity and accuracy. Another approach to improve inference time and/or reduce power while preserving model capacity is through early-exit networks. These networks place intermediate classifiers along a baseline neural network that facilitate early exit from neural network computation if an intermediate classifier exhibits sufficient confidence in its prediction. Previous work on early-exit networks has focused on large networks, beyond what would typically be used for tinyML applications. In this paper, we discuss the challenges of adding early-exits to state-of-the-art tiny-CNNs and devise an early-exit architecture, T-RECX, that addresses these challenges. In addition, we develop a method to alleviate the effect of network overthinking at the final exit by leveraging the high-level representations learned by the early-exit. We evaluate T-RECX on three CNNs from the MLPerf tiny benchmark suite for image classification, keyword spotting and visual wake word detection tasks. Our results demonstrate that T-RECX improves the accuracy of the baseline network and significantly reduces the average inference time of tiny-CNNs. T-RECX achieves a 32.58% average reduction in FLOPS in exchange for 1% accuracy across all evaluated models. Also, our techniques increase the accuracy of the baseline network in two out of the three models we evaluate.
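    The early-exit mechanism described in this abstract can be sketched as follows. This is a minimal illustration of confidence-thresholded early exit in general, not the T-RECX architecture itself: the names `early_exit_predict`, `stages`, `heads`, and the `threshold` value are hypothetical, and using the maximum softmax probability as the confidence measure is an assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def early_exit_predict(stages, heads, x, threshold=0.9):
    """Run backbone stages in order and stop at the first confident head.

    stages: callables mapping features -> features (hypothetical backbone blocks)
    heads:  callables mapping features -> logits, one per stage
    Returns (predicted_class, index_of_exit_taken).
    """
    feature = x
    last = len(stages) - 1
    for i, (stage, head) in enumerate(zip(stages, heads)):
        feature = stage(feature)
        probs = softmax(head(feature))
        confidence = max(probs)
        # Exit early if confident enough; the final head always answers.
        if confidence >= threshold or i == last:
            return probs.index(confidence), i
```

    Inputs that exit at an early head skip the remaining stages entirely, which is where the average reduction in computation comes from.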
    Attention mechanisms for physiological signal deep learning: which attention should we take?. (arXiv:2207.06904v1 [eess.SP])
    Attention mechanisms are widely used to dramatically improve deep learning model performance in various fields. However, their general ability to improve the performance of physiological signal deep learning models has not been well established. In this study, we experimentally analyze four attention mechanisms (squeeze-and-excitation, non-local, convolutional block attention module, and multi-head self-attention) and three convolutional neural network (CNN) architectures (VGG, ResNet, and Inception) for two representative physiological signal prediction tasks: the classification for predicting hypotension and the regression for predicting cardiac output (CO). We evaluated multiple combinations for performance and convergence of physiological signal deep learning models. Accordingly, the CNN models with the spatial attention mechanism showed the best performance in the classification problem, whereas the channel attention mechanism achieved the lowest error in the regression problem. Moreover, the performance and convergence of the CNN models with attention mechanisms were better than stand-alone self-attention models in both problems. Hence, we verified that convolutional operation and attention mechanisms are complementary and provide faster convergence time, despite the stand-alone self-attention models requiring fewer parameters.
    A Bayesian Lasso based Sparse Learning Model. (arXiv:1908.07220v3 [stat.ML] UPDATED)
    The Bayesian Lasso is constructed in the linear regression framework and applies Gibbs sampling to estimate the regression parameters. This paper develops a new sparse learning model, named the Bayesian Lasso Sparse (BLS) model, that takes the hierarchical model formulation of the Bayesian Lasso. The main difference from the original Bayesian Lasso lies in the estimation procedure; the BLS method uses a learning algorithm based on the type-II maximum likelihood procedure. As opposed to the Bayesian Lasso, the BLS provides sparse estimates of the regression parameters. The BLS method is also derived for nonlinear supervised learning problems by introducing kernel functions. We compare the BLS model to the well known Relevance Vector Machine, the Fast Laplace method, the Bayesian Lasso, and the Lasso, on both simulated and real data. The numerical results show that the BLS is sparse and precise, especially when dealing with noisy and irregular datasets.
    Insurgency as Complex Network: Image Co-Appearance and Hierarchy in the PKK. (arXiv:2207.06946v1 [cs.SI])
    Despite a growing recognition of the importance of insurgent group structure on conflict outcomes, there is very little empirical research thereon. Though this problem is rooted in the inaccessibility of data on militant group structure, insurgents frequently publish large volumes of image data on the internet. In this paper, I develop a new methodology that leverages this abundant but underutilized source of data by automating the creation of a social network graph based on co-appearance in photographs using deep learning. Using a trove of 19,115 obituary images published online by the PKK, a Kurdish militant group in Turkey, I demonstrate that an individual's centrality in the resulting co-appearance network is closely correlated with their rank in the insurgent group.
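    As a rough illustration of the co-appearance idea, the sketch below builds a weighted network from per-photograph lists of identified individuals and scores each node. It assumes the face detection and identification step has already produced those lists; the function names are hypothetical, and degree centrality is used only as a stand-in, since the abstract does not say which centrality measure the author uses.

```python
from collections import Counter
from itertools import combinations


def co_appearance_edges(photos):
    """Count how often each pair of individuals appears in the same photo.

    photos: iterable of lists of person identifiers (assumed to come from an
    upstream face-recognition step that is not shown here).
    """
    edges = Counter()
    for people in photos:
        # sorted(set(...)) deduplicates and gives each pair a canonical order.
        for a, b in combinations(sorted(set(people)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}
```

    The resulting centrality scores could then be compared against known ranks, for example with a rank correlation.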
    Detecting People Interested in Non-Suicidal Self-Injury on Social Media. (arXiv:2207.07014v1 [cs.SI])
    We propose a supervised learning approach to detect people interested in Non-Suicidal Self-Injury (NSSI). We treat the task as a binary classification problem, and build classifiers based upon features extracted from people's self-declared interests. Experimental evaluation on a real-world dataset from the LiveJournal social blogging platform demonstrates the effectiveness of our proposed model.
    Combating Distribution Shift for Accurate Time Series Forecasting via Hypernetworks. (arXiv:2202.10808v2 [cs.LG] UPDATED)
    Time series forecasting has widespread applications in urban life ranging from air quality monitoring to traffic analysis. However, accurate time series forecasting is challenging because real-world time series suffer from the distribution shift problem, where their statistical properties change over time. Despite extensive solutions to distribution shifts in domain adaptation or generalization, they fail to function effectively in unknown, constantly-changing distribution shifts, which are common in time series. In this paper, we propose Hyper Time-Series Forecasting (HTSF), a hypernetwork-based framework for accurate time series forecasting under distribution shift. HTSF jointly learns the time-varying distributions and the corresponding forecasting models in an end-to-end fashion. Specifically, HTSF exploits the hyper layers to learn the best characterization of the distribution shifts, generating the model parameters for the main layers to make accurate predictions. We implement HTSF as an extensible framework that can incorporate diverse time series forecasting models such as RNNs and Transformers. Extensive experiments on 9 benchmarks demonstrate that HTSF achieves state-of-the-art performance.
    Neural Networks for Encoding Dynamic Security-Constrained Optimal Power Flow. (arXiv:2003.07939v5 [eess.SY] UPDATED)
    This paper introduces a framework to capture previously intractable optimization constraints and transform them to a mixed-integer linear program, through the use of neural networks. We encode the feasible space of optimization problems characterized by both tractable and intractable constraints, e.g. differential equations, to a neural network. Leveraging an exact mixed-integer reformulation of neural networks, we solve mixed-integer linear programs that accurately approximate solutions to the originally intractable non-linear optimization problem. We apply our methods to the AC optimal power flow problem (AC-OPF), where directly including dynamic security constraints renders the AC-OPF intractable. Our proposed approach has the potential to be significantly more scalable than traditional approaches. We demonstrate our approach for power system operation considering N-1 security and small-signal stability, showing how it can efficiently obtain cost-optimal solutions which at the same time satisfy both static and dynamic security constraints.
    Ranking and Tuning Pre-trained Models: A New Paradigm for Exploiting Model Hubs. (arXiv:2110.10545v4 [cs.LG] UPDATED)
    Model hubs with many pre-trained models (PTMs) have become a cornerstone of deep learning. Although built at a high cost, they remain \emph{under-exploited} -- practitioners usually pick one PTM from the provided model hub by popularity and then fine-tune the PTM to solve the target task. This naïve but common practice poses two obstacles to full exploitation of pre-trained model hubs: first, the PTM selection by popularity has no optimality guarantee, and second, only one PTM is used while the remaining PTMs are ignored. An alternative might be to consider all possible combinations of PTMs and extensively fine-tune each combination, but this would not only be prohibitive computationally but may also lead to statistical over-fitting. In this paper, we propose a new paradigm for exploiting model hubs that is intermediate between these extremes. The paradigm is characterized by two aspects: (1) We use an evidence maximization procedure to estimate the maximum value of label evidence given features extracted by pre-trained models. This procedure can rank all the PTMs in a model hub for various types of PTMs and tasks \emph{before fine-tuning}. (2) The best ranked PTM can either be fine-tuned and deployed if we have no preference for the model's architecture or the target PTM can be tuned by the top $K$ ranked PTMs via a Bayesian procedure that we propose. This procedure, which we refer to as \emph{B-Tuning}, not only improves upon specialized methods designed for tuning homogeneous PTMs, but also applies to the challenging problem of tuning heterogeneous PTMs where it yields a new level of benchmark performance.
    Learning to Parallelize in a Shared-Memory Environment with Transformers. (arXiv:2204.12835v4 [cs.DC] UPDATED)
    In past years, the world has switched to many-core and multi-core shared memory architectures. As a result, there is a growing need to utilize these architectures by introducing shared memory parallelization schemes to software applications. OpenMP is the most comprehensive API that implements such schemes, characterized by a readable interface. Nevertheless, introducing OpenMP into code is challenging due to pervasive pitfalls in management of parallel shared memory. To facilitate the performance of this task, many source-to-source (S2S) compilers have been created over the years, tasked with inserting OpenMP directives into code automatically. In addition to having limited robustness to their input format, these compilers still do not achieve satisfactory coverage and precision in locating parallelizable code and generating appropriate directives. In this work, we propose leveraging recent advances in ML techniques, specifically in natural language processing (NLP), to replace S2S compilers altogether. We create a database (corpus), Open-OMP, specifically for this goal. Open-OMP contains over 28,000 code snippets, half of which contain OpenMP directives while the other half do not need parallelization at all with high probability. We use the corpus to train systems to automatically classify code segments in need of parallelization, as well as suggest individual OpenMP clauses. We train several transformer models, named PragFormer, for these tasks, and show that they outperform statistically-trained baselines and automatic S2S parallelization compilers in both classifying the overall need for an OpenMP directive and the introduction of private and reduction clauses. Our source code and database are available at: https://github.com/Scientific-Computing-Lab-NRCN/PragFormer.
    Feature robustness and sex differences in medical imaging: a case study in MRI-based Alzheimer's disease detection. (arXiv:2204.01737v3 [eess.IV] UPDATED)
    Convolutional neural networks have enabled significant improvements in medical image-based diagnosis. It is, however, increasingly clear that these models are susceptible to performance degradation when facing spurious correlations and dataset shift, leading, e.g., to underperformance on underrepresented patient groups. In this paper, we compare two classification schemes on the ADNI MRI dataset: a simple logistic regression model using manually selected volumetric features, and a convolutional neural network trained on 3D MRI data. We assess the robustness of the trained models in the face of varying dataset splits, training set sex composition, and stage of disease. In contrast to earlier work in other imaging modalities, we do not observe a clear pattern of improved model performance for the majority group in the training dataset. Instead, while logistic regression is fully robust to dataset composition, we find that CNN performance is generally improved for both male and female subjects when including more female subjects in the training dataset. We hypothesize that this might be due to inherent differences in the pathology of the two sexes. Moreover, in our analysis, the logistic regression model outperforms the 3D CNN, emphasizing the utility of manual feature specification based on prior knowledge, and the need for more robust automatic feature selection.
    One Model to Unite Them All: Personalized Federated Learning of Multi-Contrast MRI Synthesis. (arXiv:2207.06509v1 [eess.IV])
    Learning-based MRI translation involves a synthesis model that maps a source-contrast onto a target-contrast image. Multi-institutional collaborations are key to training synthesis models across broad datasets, yet centralized training involves privacy risks. Federated learning (FL) is a collaboration framework that instead adopts decentralized training to avoid sharing imaging data and mitigate privacy concerns. However, FL-trained models can be impaired by the inherent heterogeneity in the distribution of imaging data. On the one hand, implicit shifts in image distribution are evident across sites, even for a common translation task with fixed source-target configuration. Conversely, explicit shifts arise within and across sites when diverse translation tasks with varying source-target configurations are prescribed. To improve reliability against domain shifts, here we introduce the first personalized FL method for MRI Synthesis (pFLSynth). pFLSynth is based on an adversarial model equipped with a mapper that produces latents specific to individual sites and source-target contrasts. It leverages novel personalization blocks that adaptively tune the statistics and weighting of feature maps across the generator based on these latents. To further promote site-specificity, partial model aggregation is employed over downstream layers of the generator while upstream layers are retained locally. As such, pFLSynth enables training of a unified synthesis model that can reliably generalize across multiple sites and translation tasks. Comprehensive experiments on multi-site datasets clearly demonstrate the enhanced performance of pFLSynth against prior federated methods in multi-contrast MRI synthesis.
    Learning to Prove Trigonometric Identities. (arXiv:2207.06679v1 [cs.LG])
    Automatic theorem proving with deep learning methods has attracted attention recently. In this paper, we construct an automatic proof system for trigonometric identities. We define the normalized form of trigonometric identities, design a set of rules for the proof and put forward a method which can generate theoretically infinite trigonometric identities. Our goal is not only to complete the proof, but to complete the proof in as few steps as possible. For this reason, we design a model to learn proof data generated by random BFS (rBFS), and it is proved theoretically and experimentally that the model can outperform rBFS after a simple imitation learning. After further improvement through reinforcement learning, we get AutoTrig, which can give proof steps for identities in almost as few steps as BFS (the theoretically shortest method), with a time cost of only one-thousandth. In addition, AutoTrig also beats SymPy, MATLAB and humans on the synthetic dataset, and performs well in many generalization tasks.
    Every Preference Changes Differently: Neural Multi-Interest Preference Model with Temporal Dynamics for Recommendation. (arXiv:2207.06652v1 [cs.IR])
    User embeddings (vectorized representations of a user) are essential in recommendation systems. Numerous approaches have been proposed to construct a representation for the user in order to find similar items for retrieval tasks, and they have been proven effective in industrial recommendation systems as well. Recently people have discovered the power of using multiple embeddings to represent a user, with the hope that each embedding represents the user's interest in a certain topic. With multi-interest representation, it's important to model the user's preference over the different topics and how the preference changes with time. However, existing approaches either fail to estimate the user's affinity to each interest or unreasonably assume every interest of every user fades at an equal rate with time, thus hurting the recall of candidate retrieval. In this paper, we propose the Multi-Interest Preference (MIP) model, an approach that not only produces multi-interest representations for users by using the user's sequential engagement more effectively but also automatically learns a set of weights to represent the preference over each embedding so that the candidates can be retrieved from each interest proportionally. Extensive experiments have been done on various industrial-scale datasets to demonstrate the effectiveness of our approach.
    Adaptive Attitude Estimation Using a Hybrid Model-Learning Approach. (arXiv:2207.06903v1 [eess.SP])
    Attitude determination using a smartphone's inertial sensors poses a major challenge due to the sensors' low performance grade and the varying nature of pedestrian walking. In this paper, data-driven techniques are employed to address that challenge. To that end, a hybrid deep learning and model based solution for attitude estimation is proposed. Here, classical model based equations are applied to form an adaptive complementary filter structure. Instead of using constant or model based adaptive weights, the accelerometer weights in each axis are determined by a unique neural network. The performance of the proposed hybrid approach is evaluated relative to popular model based approaches using experimental data.
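    A complementary filter with pluggable weights might look like the sketch below. It is a generic textbook-style filter under stated assumptions, not the paper's method: the function names and the `weight_fn` hook are hypothetical (in the paper the weights come from a neural network, which is not reproduced here), the sign convention for roll and pitch is one common choice, and integrating gyroscope rates directly into the angles is a small-angle simplification.

```python
import math


def accel_roll_pitch(ax, ay, az):
    """Roll and pitch (radians) implied by the measured gravity direction."""
    roll = math.atan2(ay, az)
    pitch = math.atan2(-ax, math.hypot(ay, az))
    return roll, pitch


def complementary_step(roll, pitch, gyro, accel, dt, weight_fn):
    """One filter update blending gyro integration with accelerometer angles.

    gyro:      (gx, gy, gz) angular rates in rad/s
    accel:     (ax, ay, az) specific force in m/s^2
    weight_fn: callable (gyro, accel) -> (w_roll, w_pitch), each in [0, 1];
               a stand-in for a learned weighting model.
    """
    gx, gy, _gz = gyro
    # Propagate the previous estimate with the gyroscope (small-angle assumption).
    roll_gyro = roll + gx * dt
    pitch_gyro = pitch + gy * dt
    # Absolute but noisy reference from the accelerometer.
    roll_acc, pitch_acc = accel_roll_pitch(*accel)
    w_roll, w_pitch = weight_fn(gyro, accel)
    new_roll = (1.0 - w_roll) * roll_gyro + w_roll * roll_acc
    new_pitch = (1.0 - w_pitch) * pitch_gyro + w_pitch * pitch_acc
    return new_roll, new_pitch


def constant_weights(gyro, accel):
    """Baseline with fixed accelerometer weights (the value is arbitrary)."""
    return 0.02, 0.02
```

    Swapping `constant_weights` for a function that inspects the current sensor readings is the point at which an adaptive or learned weighting would enter.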
    Semi-Supervised Temporal Action Detection with Proposal-Free Masking. (arXiv:2207.07059v1 [cs.CV])
    Existing temporal action detection (TAD) methods rely on a large amount of training data with segment-level annotations. Collecting and annotating such a training set is thus highly expensive and unscalable. Semi-supervised TAD (SS-TAD) alleviates this problem by leveraging unlabeled videos freely available at scale. However, SS-TAD is also a much more challenging problem than supervised TAD, and consequently much under-studied. Prior SS-TAD methods directly combine an existing proposal-based TAD method and an SSL method. Due to their sequential localization (e.g., proposal generation) and classification design, they are prone to proposal error propagation. To overcome this limitation, in this work we propose a novel Semi-supervised Temporal action detection model based on PropOsal-free Temporal mask (SPOT) with a parallel localization (mask generation) and classification architecture. Such a novel design effectively eliminates the dependence between localization and classification by cutting off the route for error propagation in-between. We further introduce an interaction mechanism between classification and localization for prediction refinement, and a new pretext task for self-supervised model pre-training. Extensive experiments on two standard benchmarks show that our SPOT outperforms state-of-the-art alternatives, often by a large margin. The PyTorch implementation of SPOT is available at https://github.com/sauradip/SPOT
    MorphoActivation: Generalizing ReLU activation function by mathematical morphology. (arXiv:2207.06413v1 [cs.LG])
    This paper analyses both nonlinear activation functions and spatial max-pooling for Deep Convolutional Neural Networks (DCNNs) by means of the algebraic basis of mathematical morphology. Additionally, a general family of activation functions is proposed by considering both max-pooling and nonlinear operators in the context of morphological representations. The experimental section validates the effectiveness of our approach on classical benchmarks for supervised learning by DCNNs.
    Frequency-Encoded Deep Learning with Speed-of-Light Dominated Latency. (arXiv:2207.06883v1 [cs.ET])
    The ability of deep neural networks to perform complex tasks more accurately than manually-crafted solutions has created a substantial demand for more complex models processing larger amounts of data. However, the traditional computing architecture has reached a bottleneck in processing performance due to data movement from memory to computing. Considerable efforts have been made towards custom hardware acceleration, among which are optical neural networks (ONNs). These excel at energy efficient linear operations but struggle with scalability and the integration of linear and nonlinear functions. Here, we introduce our multiplicative analog frequency transform optical neural network (MAFT-ONN) that encodes the data in the frequency domain to compute matrix-vector products in a single-shot using a single photoelectric multiplication, and then implements the nonlinear activation for all neurons using a single electro-optic modulator. We experimentally demonstrate a 3-layer DNN with our architecture using a simple hardware setup assembled with commercial components. Additionally, this is the first DNN hardware accelerator suitable for analog inference of temporal waveforms like voice or radio signals, achieving bandwidth-limited throughput and speed-of-light limited latency. Our results demonstrate a highly scalable ONN with a straightforward path to surpassing the current computing bottleneck, in addition to enabling new possibilities for high-performance analog deep learning of temporal waveforms.
    Mirror Learning: A Unifying Framework of Policy Optimisation. (arXiv:2201.02373v10 [cs.LG] UPDATED)
    Modern deep reinforcement learning (RL) algorithms are motivated by either the generalised policy iteration (GPI) or trust-region learning (TRL) frameworks. However, algorithms that strictly respect these theoretical frameworks have proven unscalable. Surprisingly, the only known scalable algorithms violate the GPI/TRL assumptions, e.g. due to required regularisation or other heuristics. The current explanation of their empirical success is essentially "by analogy": they are deemed approximate adaptations of theoretically sound methods. Unfortunately, studies have shown that in practice these algorithms differ greatly from their conceptual ancestors. In contrast, in this paper we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO. While the latter two exploit the flexibility of our framework, GPI and TRL fit in merely as pathologically restrictive corner cases thereof. This suggests that the empirical performance of state-of-the-art methods is a direct consequence of their theoretical properties, rather than of aforementioned approximate analogies. Mirror learning sets us free to boldly explore novel, theoretically sound RL algorithms, a thus far uncharted wonderland.
    Strain-Minimizing Hyperbolic Network Embeddings with Landmarks. (arXiv:2207.06775v1 [stat.CO])
    We introduce L-hydra (landmarked hyperbolic distance recovery and approximation), a method for embedding network- or distance-based data into hyperbolic space, which requires only the distance measurements to a few 'landmark nodes'. This landmark heuristic makes L-hydra applicable to large-scale graphs and improves upon previously introduced methods. As a mathematical justification, we show that a point configuration in d-dimensional hyperbolic space can be perfectly recovered (up to isometry) from distance measurements to just d+1 landmarks. We also show that L-hydra solves a two-stage strain-minimization problem, similar to our previous (unlandmarked) method 'hydra'. Testing on real network data, we show that L-hydra is an order of magnitude faster than existing hyperbolic embedding methods and scales linearly in the number of nodes. While the embedding error of L-hydra is higher than the error of existing methods, we introduce an extension, L-hydra+, which outperforms existing methods in both runtime and embedding quality.
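    The landmark idea relies only on hyperbolic distances from each point to a small set of landmark nodes. The sketch below shows how such a points-by-landmarks distance table can be computed in the hyperboloid model; it does not implement the strain-minimizing embedding itself. The function names are hypothetical, and the choice of the hyperboloid model with curvature -1 is an assumption not stated in the abstract.

```python
import math


def lift(point):
    """Lift a point of R^d onto the unit hyperboloid in R^(d+1)."""
    spatial = [float(c) for c in point]
    x0 = math.sqrt(1.0 + sum(c * c for c in spatial))
    return [x0] + spatial


def lorentz_product(x, y):
    """Lorentzian inner product: -x0*y0 + x1*y1 + ... + xd*yd."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def hyperbolic_distance(x, y):
    """Geodesic distance between two hyperboloid points (curvature -1)."""
    # Clamp to 1.0 so rounding error cannot push acosh outside its domain.
    return math.acosh(max(1.0, -lorentz_product(x, y)))


def distances_to_landmarks(points, landmarks):
    """Rows index points, columns index landmarks."""
    return [[hyperbolic_distance(p, l) for l in landmarks] for p in points]
```

    For n points and k landmarks this table has n times k entries rather than n squared, which is consistent with the linear scaling in the number of nodes that the abstract reports.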
    Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting. (arXiv:2207.06569v1 [cs.LG])
    The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent work has studied $\textit{benign overfitting}$, a phenomenon where some interpolating methods approach Bayes optimality, even in the presence of noise. In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks $\textit{do not fit benignly}$: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime. We call this intermediate regime $\textit{tempered overfitting}$, and we initiate its systematic study. We first explore this phenomenon in the context of kernel (ridge) regression (KR) by obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors. We find that kernels with powerlaw spectra, including Laplace kernels and ReLU neural tangent kernels, exhibit tempered overfitting. We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning.
    Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Primal-Dual Approach. (arXiv:2109.06332v3 [cs.LG] UPDATED)
    Reinforcement learning is widely used in applications where one needs to perform sequential decisions while interacting with the environment. The problem becomes more challenging when the decision requirement includes satisfying some safety constraints. The problem is mathematically formulated as a constrained Markov decision process (CMDP). In the literature, various algorithms are available to solve CMDP problems in a model-free manner to achieve $\epsilon$-optimal cumulative reward with $\epsilon$-feasible policies. An $\epsilon$-feasible policy, however, still suffers from constraint violation. An important question here is whether we can achieve $\epsilon$-optimal cumulative reward with zero constraint violations or not. To achieve that, we advocate the use of a randomized primal-dual approach to solve CMDP problems and propose a conservative stochastic primal-dual algorithm (CSPDA) which is shown to exhibit $\tilde{\mathcal{O}}\left(1/\epsilon^2\right)$ sample complexity to achieve $\epsilon$-optimal cumulative reward with zero constraint violations. In prior works, the best available sample complexity for the $\epsilon$-optimal policy with zero constraint violation is $\tilde{\mathcal{O}}\left(1/\epsilon^5\right)$. Hence, the proposed algorithm provides a significant improvement as compared to the state of the art.
    Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers. (arXiv:2207.07087v1 [cs.CL])
    Prompt tuning attempts to update few task-specific parameters in pre-trained models. It has achieved comparable performance to fine-tuning of the full parameter set on both language understanding and generation tasks. In this work, we study the problem of prompt tuning for neural text retrievers. We introduce parameter-efficient prompt tuning for text retrieval across in-domain, cross-domain, and cross-topic settings. Through an extensive analysis, we show that the strategy can mitigate the two issues -- parameter-inefficiency and weak generalizability -- faced by fine-tuning based retrieval methods. Notably, it can significantly improve the out-of-domain zero-shot generalization of the retrieval models. By updating only 0.1% of the model parameters, the prompt tuning strategy can help retrieval models achieve better generalization performance than traditional methods in which all parameters are updated. Finally, to facilitate research on retrievers' cross-topic generalizability, we curate and release an academic retrieval dataset with 18K query-results pairs in 87 topics, making it the largest topic-specific one to date.
    Forming Trees with Treeformers. (arXiv:2207.06960v1 [cs.CL])
    Popular models such as Transformers and LSTMs use tokens as their unit of information. That is, each token is encoded into a vector representation, and those vectors are used directly in a computation. However, humans frequently consider spans of tokens (i.e., phrases) instead of their constituent tokens. In this paper we introduce Treeformer, an architecture inspired by the CKY algorithm and Transformer which learns a composition operator and pooling function in order to construct hierarchical encodings for phrases and sentences. Our extensive experiments demonstrate the benefits of incorporating a hierarchical structure into the Transformer, and show significant improvements compared to a baseline Transformer in machine translation, abstractive summarization, and various natural language understanding tasks.
    A Generalized Framework for Microstructural Optimization using Neural Networks. (arXiv:2207.06512v1 [cond-mat.mtrl-sci])
    Microstructures, i.e., architected materials, are designed today, typically, by maximizing an objective, such as bulk modulus, subject to a volume constraint. However, in many applications, it is often more appropriate to impose constraints on other physical quantities of interest. In this paper, we consider such generalized microstructural optimization problems where any of the microstructural quantities, namely, bulk, shear, Poisson ratio, or volume, can serve as the objective, while the remaining can serve as constraints. In particular, we propose here a neural-network (NN) framework to solve such problems. The framework relies on the classic density formulation of microstructural optimization, but the density field is represented through the NN's weights and biases. The main characteristics of the proposed NN framework are: (1) it supports automatic differentiation, eliminating the need for manual sensitivity derivations, (2) smoothing filters are not required due to implicit filtering, (3) the framework can be easily extended to multiple-materials, and (4) a high-resolution microstructural topology can be recovered through a simple post-processing step. The framework is illustrated through a variety of microstructural optimization problems.
    MDEAW: A Multimodal Dataset for Emotion Analysis through EDA and PPG signals from wireless wearable low-cost off-the-shelf Devices. (arXiv:2207.06410v1 [cs.HC])
    We present MDEAW, a multimodal database consisting of Electrodermal Activity (EDA) and Photoplethysmography (PPG) signals recorded during the exams for the course taught by the teacher at Eurecat Academy, Sabadell, Barcelona in order to elicit the emotional reactions to the students in a classroom scenario. Signals from 10 students were recorded along with the students' self-assessment of their affective state after each stimulus, in terms of 6 basic emotion states. All the signals were captured using portable, wearable, wireless, low-cost, and off-the-shelf equipment that has the potential to allow the use of affective computing methods in everyday applications. A baseline for student-wise affect recognition using EDA and PPG-based features, as well as their fusion, was established through ReMECS, Fed-ReMECS, and Fed-ReMECS-U. These results indicate the prospects of using low-cost devices for affective state recognition applications. The proposed database will be made publicly available in order to allow researchers to achieve a more thorough evaluation of the suitability of these capturing devices for emotion state recognition applications.  ( 2 min )
    Modeling Long-term Dependencies and Short-term Correlations in Patient Journey Data with Temporal Attention Networks for Health Prediction. (arXiv:2207.06414v1 [cs.LG])
    Building models for health prediction based on Electronic Health Records (EHR) has become an active research area. EHR patient journey data consists of patient time-ordered clinical events/visits from patients. Most existing studies focus on modeling long-term dependencies between visits, without explicitly taking short-term correlations between consecutive visits into account, where irregular time intervals, incorporated as auxiliary information, are fed into health prediction models to capture latent progressive patterns of patient journeys. We present a novel deep neural network with four modules to take into account the contributions of various variables for health prediction: i) the Stacked Attention module strengthens the deep semantics in clinical events within each patient journey and generates visit embeddings, ii) the Short-Term Temporal Attention module models short-term correlations between consecutive visit embeddings while capturing the impact of time intervals within those visit embeddings, iii) the Long-Term Temporal Attention module models long-term dependencies between visit embeddings while capturing the impact of time intervals within those visit embeddings, iv) and finally, the Coupled Attention module adaptively aggregates the outputs of Short-Term Temporal Attention and Long-Term Temporal Attention modules to make health predictions. Experimental results on MIMIC-III demonstrate superior predictive accuracy of our model compared to existing state-of-the-art methods, as well as the interpretability and robustness of this approach. Furthermore, we found that modeling short-term correlations contributes to local priors generation, leading to improved predictive modeling of patient journeys.  ( 3 min )
    Estimating Instance-dependent Bayes-label Transition Matrix using a Deep Neural Network. (arXiv:2105.13001v3 [cs.LG] UPDATED)
    In label-noise learning, estimating the transition matrix is a hot topic as the matrix plays an important role in building statistically consistent classifiers. Traditionally, the transition from clean labels to noisy labels (i.e., clean-label transition matrix (CLTM)) has been widely exploited to learn a clean label classifier by employing the noisy data. Motivated by the fact that classifiers mostly output Bayes optimal labels for prediction, in this paper we directly model the transition from Bayes optimal labels to noisy labels (i.e., Bayes-label transition matrix (BLTM)) and learn a classifier to predict Bayes optimal labels. Note that given only noisy data, it is ill-posed to estimate either the CLTM or the BLTM. But favorably, Bayes optimal labels have less uncertainty compared with the clean labels, i.e., the class posteriors of Bayes optimal labels are one-hot vectors while those of clean labels are not. This yields two advantages for estimating the BLTM: (a) a set of examples with theoretically guaranteed Bayes optimal labels can be collected out of noisy data; (b) the feasible solution space is much smaller. By exploiting these advantages, we estimate the BLTM parametrically by employing a deep neural network, leading to better generalization and superior classification performance.  ( 3 min )
    Changepoint Detection for Real-Time Spectrum Sharing Radar. (arXiv:2207.06409v1 [eess.SY])
    Radar must adapt to changing environments, and we propose changepoint detection as a method to do so. In the world of increasingly congested radio frequencies, radars must adapt to avoid interference. Many radar systems employ the prediction action cycle to proactively determine transmission mode while spectrum sharing. This method constructs and implements a model of the environment to predict unused frequencies, and then transmits in this predicted availability. For these selection strategies, performance is directly reliant on the quality of the underlying environmental models. In order to keep up with a changing environment, these models can employ changepoint detection. Changepoint detection is the identification of sudden changes, or changepoints, in the distribution from which data is drawn. This information allows the models to discard "garbage" data from a previous distribution, which has no relation to the current state of the environment. In this work, Bayesian online changepoint detection (BOCD) is applied to the sense and predict algorithm to increase the accuracy of its models and improve its performance. In the context of spectrum sharing, these changepoints represent interferers leaving and entering the spectral environment. The addition of changepoint detection allows for dynamic and robust spectrum sharing even as interference patterns change dramatically. BOCD is especially advantageous because it enables online changepoint detection, allowing models to be updated continuously as data are collected. This strategy can also be applied to many other predictive algorithms that create models in a changing environment.  ( 3 min )
    ECG beat classification using machine learning and pre-trained convolutional neural networks. (arXiv:2207.06408v1 [eess.SP])
    The electrocardiogram (ECG) is routinely used in hospitals to analyze cardiovascular status and health of an individual. Abnormal heart rhythms can be a precursor to more serious conditions including sudden cardiac death. Classifying abnormal rhythms is a laborious process prone to error. Therefore, tools that perform automated classification with high accuracy are highly desirable. The work presented classifies five different types of ECG arrhythmia based on AAMI EC57 standard and using the MIT-BIH data set. These include non-ectopic (normal), supraventricular, ventricular, fusion, and unknown beat. By appropriately transforming pre-processed ECG waveforms into a rich feature space along with appropriate post-processing and utilizing deep convolutional neural networks post fine-tuning and hyperparameter selection, it is shown that highly accurate classification for the five waveform types can be obtained. Performance on the test set indicated higher overall accuracy (98.62%), as well as better performance in classifying each of the five waveforms than hitherto reported in literature.  ( 2 min )
    GrabQC: Graph based Query Contextualization for automated ICD coding. (arXiv:2207.06802v1 [cs.LG])
    Automated medical coding is a process of codifying clinical notes to appropriate diagnosis and procedure codes automatically from the standard taxonomies such as ICD (International Classification of Diseases) and CPT (Current Procedural Terminology). The manual coding process involves the identification of entities from the clinical notes followed by querying a commercial or non-commercial medical codes Information Retrieval (IR) system that follows the Centers for Medicare and Medicaid Services (CMS) guidelines. We propose to automate this manual process by automatically constructing a query for the IR system using the entities auto-extracted from the clinical notes. We propose GrabQC, a Graph based Query Contextualization method that automatically extracts queries from the clinical text, contextualizes the queries using a Graph Neural Network (GNN) model and obtains the ICD Codes using an external IR system. We also propose a method for labelling the dataset for training the model. We perform experiments on two datasets of clinical text in three different setups to assert the effectiveness of our approach. The experimental results show that our proposed method is better than the compared baselines in all three settings.  ( 2 min )
    Semi-supervised cross-lingual speech emotion recognition. (arXiv:2207.06767v1 [cs.SD])
    Speech emotion recognition (SER) on a single language has achieved remarkable results through deep learning approaches over the last decade. However, cross-lingual SER remains a challenge in real-world applications due to (i) a large difference between the source and target domain distributions, (ii) the availability of few labeled and many unlabeled utterances for the new language. Taking into account previous aspects, we propose a Semi-Supervised Learning (SSL) method for cross-lingual emotion recognition when a few labels from the new language are available. Based on a Convolutional Neural Network (CNN), our method adapts to a new language by exploiting a pseudo-labeling strategy for the unlabeled utterances. In particular, the use of a hard and soft pseudo-labels approach is investigated. We thoroughly evaluate the performance of the method in a speaker-independent setup on both the source and the new language and show its robustness across five languages belonging to different linguistic strains.  ( 2 min )
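The hard and soft pseudo-labeling mentioned in the abstract above can be illustrated with a minimal sketch. The confidence threshold and the temperature-based sharpening are assumptions chosen for illustration; the abstract does not specify how either variant is implemented, and the function names are hypothetical.

```python
def hard_pseudo_labels(probs, threshold=0.9):
    """Return (utterance index, class index) pairs for confident predictions.

    `threshold` is an assumed confidence cut-off, not a value from the paper.
    """
    selected = []
    for i, p in enumerate(probs):
        k = max(range(len(p)), key=p.__getitem__)  # index of the top class
        if p[k] >= threshold:
            selected.append((i, k))
    return selected


def soft_pseudo_labels(probs, temperature=0.5):
    """Sharpen each predicted distribution and use it as a soft target.

    Raises each probability to the power 1/temperature and renormalises.
    """
    targets = []
    for p in probs:
        powered = [x ** (1.0 / temperature) for x in p]
        z = sum(powered)
        targets.append([x / z for x in powered])
    return targets


# Hypothetical model outputs for two unlabeled utterances and three emotions.
predictions = [[0.95, 0.03, 0.02], [0.40, 0.35, 0.25]]
hard = hard_pseudo_labels(predictions)
soft = soft_pseudo_labels(predictions)
```

In this toy setup the hard variant keeps only the first utterance, while the soft variant keeps both and passes a sharpened distribution to the loss.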
    From Shapley back to Pearson: Hypothesis Testing via the Shapley Value. (arXiv:2207.07038v1 [cs.LG])
    Machine learning models, in particular artificial neural networks, are increasingly used to inform decision making in high-stakes scenarios across a variety of fields--from financial services, to public safety, and healthcare. While neural networks have achieved remarkable performance in many settings, their complex nature raises concerns on their reliability, trustworthiness, and fairness in real-world scenarios. As a result, several a-posteriori explanation methods have been proposed to highlight the features that influence a model's prediction. Notably, the Shapley value--a game theoretic quantity that satisfies several desirable properties--has gained popularity in the machine learning explainability literature. More traditionally, however, feature importance in statistical learning has been formalized by conditional independence, and a standard way to test for it is via Conditional Randomization Tests (CRTs). So far, these two perspectives on interpretability and feature importance have been considered distinct and separate. In this work, we show that Shapley-based explanation methods and conditional independence testing for feature importance are closely related. More precisely, we prove that evaluating a Shapley coefficient amounts to performing a specific set of conditional independence tests, as implemented by a procedure similar to the CRT but for a different null hypothesis. Furthermore, the obtained game-theoretic values upper bound the $p$-values of such tests. As a result, we grant large Shapley coefficients with a precise statistical sense of importance with controlled type I error.  ( 3 min )
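For readers unfamiliar with the Shapley value discussed above, the following minimal sketch computes it exactly for a small cooperative game by averaging each player's marginal contribution over all orderings. The toy value function is hypothetical and stands in for a model's output restricted to a subset of features; it is not the procedure or the test statistic from the paper.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical game: 'a' is worth 1, 'b' is worth 2, together they add 1."""
    v = 0.0
    if "a" in coalition:
        v += 1.0
    if "b" in coalition:
        v += 2.0
    if {"a", "b"} <= coalition:
        v += 1.0
    return v


phi = shapley_values(["a", "b", "c"], toy_value)
```

The values sum to the worth of the full coalition (efficiency), and the irrelevant player "c" receives zero. Exact enumeration costs factorial time, so it is only practical for a handful of features.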
    Musical Instrument Classification via Low-Dimensional Feature Vectors. (arXiv:1909.08444v2 [cs.SD] UPDATED)
    Music is a mysterious language that conveys feeling and thoughts via different tones and timbre. For better understanding of timbre in music, we chose music data of 6 representative instruments, analysed their timbre features and classified them. Instead of following the current trend of using neural networks for black-box classification, our project is based on a combination of MFCC and LPC, augmented with a 6-dimensional feature vector that we designed ourselves through observation and experimentation. In our white-box model, we observed significant patterns of sound that distinguish different timbres, and discovered some connection between objective data and subjective senses. With a 32-dimensional feature vector in total and a naive all-pairs SVM, we achieved improved classification accuracy compared to a single tool. We also attempted to analyze music pieces downloaded from the Internet, found that performance differed across instruments, explored the reasons and suggested possible ways to improve the performance.  ( 2 min )
    Estimating Classification Confidence Using Kernel Densities. (arXiv:2207.06529v1 [stat.ML])
    This paper investigates the post-hoc calibration of confidence for "exploratory" machine learning classification problems. The difficulty in these problems stems from the continuing desire to push the boundaries of which categories have enough examples to generalize from when curating datasets, and confusion regarding the validity of those categories. We argue that for such problems the "one-versus-all" approach (top-label calibration) must be used rather than the "calibrate-the-full-response-matrix" approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation. Chief among these methods is the use of kernel density ratios for confidence calibration including a novel, bulletproof algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, should be based only on the test dataset, and should be sanity-checked visually.  ( 2 min )
    Wakeword Detection under Distribution Shifts. (arXiv:2207.06423v1 [cs.SD])
    We propose a novel approach for semi-supervised learning (SSL) designed to overcome distribution shifts between training and real-world data arising in the keyword spotting (KWS) task. Shifts from training data distribution are a key challenge for real-world KWS tasks: when a new model is deployed on device, the gating of the accepted data undergoes a shift in distribution, making the problem of timely updates via subsequent deployments hard. Despite the shift, we assume that the marginal distributions on labels do not change. We utilize a modified teacher/student training framework, where labeled training data is augmented with unlabeled data. Note that the teacher does not have access to the new distribution as well. To train effectively with a mix of human and teacher labeled data, we develop a teacher labeling strategy based on confidence heuristics to reduce entropy on the label distribution from the teacher model; the data is then sampled to match the marginal distribution on the labels. Large scale experimental results show that a convolutional neural network (CNN) trained on far-field audio, and evaluated on far-field audio drawn from a different distribution, obtains a 14.3% relative improvement in false discovery rate (FDR) at equal false reject rate (FRR), while yielding a 5% improvement in FDR under no distribution shift. Under a more severe distribution shift from far-field to near-field audio with a smaller fully connected network (FCN) our approach achieves a 52% relative improvement in FDR at equal FRR, while yielding a 20% relative improvement in FDR on the original distribution.  ( 3 min )
    An Investigation on Non-Invasive Brain-Computer Interfaces: Emotiv Epoc+ Neuroheadset and Its Effectiveness. (arXiv:2207.06914v1 [eess.SP])
    In this study, we illustrate the progress of BCI research and present scores of unveiled contemporary approaches. First, we explore a decoding natural speech approach that is designed to decode human speech directly from the human brain onto a digital screen introduced by Facebook Reality Lab and University of California San Francisco. Then, we study a recently presented visionary project to control the human brain using Brain-Machine Interfaces (BMI) approach. We also investigate well-known electroencephalography (EEG) based Emotiv Epoc+ Neuroheadset to identify six emotional parameters including engagement, excitement, focus, stress, relaxation, and interest using brain signals by experimenting the neuroheadset among three human subjects where we utilize two supervised learning classifiers, Naive Bayes and Linear Regression to show the accuracy and competency of the Epoc+ device and its associated applications in neurotechnological research. We present experimental studies and the demonstration indicates 69% and 62% improved accuracy for the aforementioned classifiers respectively in reading the performance matrices of the participants. We envision that non-invasive, insertable, and low-cost BCI approaches shall be the focal point for not only an alternative for patients with physical paralysis but also understanding the brain that would pave us to access and control the memories and brain somewhere very near.  ( 3 min )
    Pediatric Sleep Scoring In-the-wild from Millions of Multi-channel EEG Signals. (arXiv:2207.06921v1 [eess.SP])
    Sleep is critical to the health and development of infants, children, and adolescents, but pediatric sleep is severely under-researched compared to adult sleep in the context of machine learning for health and well-being. Here, we present the first automated pediatric sleep scoring results on a recent large-scale sleep study dataset that was collected during standard clinical care. We develop a transformer-based deep neural network model that learns to classify five sleep stages from millions of multi-channel electroencephalogram (EEG) signals with 78% overall accuracy. Further, we conduct an in-depth analysis of the model performance based on patient demographics and EEG channels.  ( 2 min )
    Early Detection of Ovarian Cancer by Wavelet Analysis of Protein Mass Spectra. (arXiv:2207.07028v1 [cs.LG])
    Accurate and efficient detection of ovarian cancer at early stages is critical to ensure proper treatments for patients. Among the first-line modalities investigated in studies of early diagnosis are features distilled from protein mass spectra. This method, however, considers only a specific subset of spectral responses and ignores the interplay among protein expression levels, which can also contain diagnostic information. We propose a new modality that automatically searches protein mass spectra for discriminatory features by considering the self-similar nature of the spectra. Self-similarity is assessed by taking a wavelet decomposition of protein mass spectra and estimating the rate of level-wise decay in the energies of the resulting wavelet coefficients. Level-wise energies are estimated in a robust manner using distance variance, and rates are estimated locally via a rolling window approach. This results in a collection of rates that can be used to characterize the interplay among proteins, which can be indicative of cancer presence. Discriminatory descriptors are then selected from these evolutionary rates and used as classifying features. The proposed wavelet-based features are used in conjunction with features proposed in the existing literature for early stage diagnosis of ovarian cancer using two datasets published by the American National Cancer Institute. Including the wavelet-based features from the new modality results in improvements in diagnostic performance for early-stage ovarian cancer detection. This demonstrates the ability of the proposed modality to characterize new ovarian cancer diagnostic information.  ( 3 min )
    PASHA: Efficient HPO with Progressive Resource Allocation. (arXiv:2207.06940v1 [cs.LG])
    Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach to tackle the challenge of tuning machine learning models trained on large datasets with limited computational resources. Our approach, named PASHA, is able to dynamically allocate maximum resources for the tuning procedure depending on the need. The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than solutions like ASHA.  ( 2 min )
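As background for the multi-fidelity methods mentioned above, here is a minimal sketch of plain synchronous successive halving, the idea that ASHA builds on. It is not PASHA and not ASHA itself: the fixed maximum budget, the reduction factor, and the toy loss are all assumptions for illustration.

```python
def successive_halving(configs, evaluate, min_budget=1, max_budget=27, eta=3):
    """Keep the best 1/eta of the configurations at each rung.

    `evaluate(config, budget)` returns a loss (lower is better).
    Returns the winning configuration and the budget it was last run at.
    """
    survivors = list(configs)
    budget = min_budget
    while True:
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget))
        if budget >= max_budget or len(ranked) == 1:
            return ranked[0], budget
        keep = max(1, len(ranked) // eta)
        survivors = ranked[:keep]
        budget = min(budget * eta, max_budget)


def toy_loss(learning_rate, budget):
    """Hypothetical loss: best at learning rate 0.1, improves with budget."""
    return (learning_rate - 0.1) ** 2 + 1.0 / budget


candidates = [0.001, 0.01, 0.05, 0.1, 0.3, 0.5, 1.0, 0.2, 0.02]
best, final_budget = successive_halving(candidates, toy_loss)
```

Here only one configuration ever reaches the larger budgets, which is where the savings come from. According to the abstract, PASHA goes further by adjusting the maximum resources dynamically depending on the need.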
    Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning. (arXiv:2204.07596v2 [stat.ML] UPDATED)
    An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them, but the precise mechanism is poorly understood. We argue that creating spread alone is insufficient for better representations, since spread is invariant to permutations within classes. Instead, both the correct degree of spread and a mechanism for breaking this invariance are necessary. We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread. Next, we study three mechanisms to break permutation invariance: using a constrained encoder, adding a class-conditional autoencoder, and using data augmentation. We show that the latter two encourage clustering of latent subclasses under more realistic conditions than the former. Using these insights, we show that adding a properly-weighted class-conditional InfoNCE loss and a class-conditional autoencoder to SupCon achieves 11.1 points of lift on coarse-to-fine transfer across 5 standard datasets and 4.7 points on worst-group robustness on 3 datasets, setting state-of-the-art on CelebA by 11.5 points.  ( 3 min )
    Comparing the latent space of generative models. (arXiv:2207.06812v1 [cs.LG])
    Different encodings of datapoints in the latent space of latent-vector generative models may result in more or less effective and disentangled characterizations of the different explanatory factors of variation behind the data. Many works have been recently devoted to the exploration of the latent space of specific models, mostly focused on the study of how features are disentangled and of how trajectories producing desired alterations of data in the visible space can be found. In this work we address the more general problem of comparing the latent spaces of different models, looking for transformations between them. We confined the investigation to the familiar and largely investigated case of generative models for the data manifold of human faces. The surprising, preliminary result reported in this article is that (provided models have not been taught or explicitly conceived to act differently) a simple linear mapping is enough to pass from a latent space to another while preserving most of the information.  ( 2 min )
    Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank. (arXiv:2207.06944v1 [cs.CR])
    Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: the difference of only one edge may cause a big change in the PPR vector, potentially leaking private user data. In this work, we propose an algorithm which outputs an approximate PPR and has provably bounded sensitivity to input edges. In addition, we prove that our algorithm achieves similar accuracy to non-private algorithms when the input graph has large degrees. Our sensitivity-bounded PPR directly implies private algorithms for several tools of graph learning, such as, differentially private (DP) PPR ranking, DP node classification, and DP node embedding. To complement our theoretical analysis, we also empirically verify the practical performances of our algorithms.  ( 2 min )
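The edge sensitivity described above can be seen with a minimal, non-private power-iteration sketch of standard PPR. The restart probability, the iteration count, and the toy graph are assumptions; this is the ordinary algorithm, not the sensitivity-bounded variant the paper proposes.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=100):
    """Power iteration for PPR with restart probability `alpha`.

    `adj` maps every node to the list of its out-neighbours.
    """
    p = {v: 0.0 for v in adj}
    p[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[source] += alpha  # restart mass always returns to the source
        for u, out in adj.items():
            if not out:
                # Dangling node: send its mass back to the source.
                nxt[source] += (1 - alpha) * p[u]
                continue
            share = (1 - alpha) * p[u] / len(out)
            for v in out:
                nxt[v] += share
        p = nxt
    return p


# Hypothetical undirected toy graph, before and after adding the edge 0-3.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
graph_plus_edge = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [2, 0]}

ppr = personalized_pagerank(graph, source=0)
ppr_plus = personalized_pagerank(graph_plus_edge, source=0)
l1_change = sum(abs(ppr[v] - ppr_plus[v]) for v in graph)
```

The quantity `l1_change` measures how far the PPR vector moves when a single edge is added, which is the kind of sensitivity a differentially private mechanism has to bound.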
    Deep Learning Methods for Protein Family Classification on PDB Sequencing Data. (arXiv:2207.06678v1 [q-bio.QM])
    Composed of amino acid chains that influence how they fold and thus dictating their function and features, proteins are a class of macromolecules that play a central role in major biological processes and are required for the structure, function, and regulation of the body's tissues. Understanding protein functions is vital to the development of therapeutics and precision medicine, and hence the ability to classify proteins and their functions based on measurable features is crucial; indeed, the automatic inference of a protein's properties from its sequence of amino acids, known as its primary structure, remains an important open problem within the field of bioinformatics, especially given the recent advancements in sequencing technologies and the extensive number of known but uncategorized proteins with unknown properties. In this work, we demonstrate and compare the performance of several deep learning frameworks, including novel bi-directional LSTM and convolutional models, on widely available sequencing data from the Protein Data Bank (PDB) of the Research Collaboratory for Structural Bioinformatics (RCSB), as well as benchmark this performance against classical machine learning approaches, including k-nearest neighbors and multinomial regression classifiers, trained on experimental data. Our results show that our deep learning models deliver superior performance to classical machine learning methods, with the convolutional architecture providing the most impressive inference performance.  ( 2 min )
    Iterative training of robust k-space interpolation networks for improved image reconstruction with limited scan specific training samples. (arXiv:2201.03560v2 [eess.IV] UPDATED)
    Purpose: To evaluate an iterative learning approach for enhanced performance of Robust Artificial-neural-networks for K-space Interpolation (RAKI), when only a limited amount of training data (auto-calibration signals, ACS) is available for accelerated standard 2D imaging. Methods: In a first step, the RAKI model was optimized for the case of a strongly limited amount of training data. In the iterative learning approach (termed iterative RAKI), the optimized RAKI model is initially trained using original and augmented ACS obtained from a linear parallel imaging reconstruction. Subsequently, the RAKI convolution filters are refined iteratively using original and augmented ACS extracted from the previous RAKI reconstruction. Evaluation was carried out on 200 retrospectively undersampled in-vivo datasets from the fastMRI neuro database with different contrast settings. Results: For limited training data (18 and 22 ACS lines for R=4 and R=5, respectively), iterative RAKI outperforms standard RAKI by reducing residual artefacts and yields strong noise suppression when compared to standard parallel imaging, underlined by quantitative reconstruction quality metrics. In combination with a phase constraint, further reconstruction improvements can be achieved. Additionally, iterative RAKI shows better performance than both GRAPPA and RAKI in case of pre-scan calibration with varying contrast between training- and undersampled data. Conclusion: The iterative learning approach with RAKI benefits from standard RAKI's well-known noise suppression feature but requires less original training data for the accurate reconstruction of standard 2D images, thereby improving net acceleration.  ( 3 min )
  • Open

    Contextual Inverse Optimization: Offline and Online Learning. (arXiv:2106.14015v2 [cs.LG] UPDATED)
    We study the problems of offline and online contextual optimization with feedback information, where instead of observing the loss, we observe, after the fact, the optimal action an oracle with full knowledge of the objective function would have taken. We aim to minimize regret, which is defined as the difference between our losses and the ones incurred by an all-knowing oracle. In the offline setting, the decision-maker has information available from past periods and needs to make one decision, while in the online setting, the decision-maker optimizes decisions dynamically over time based on a new set of feasible actions and contextual functions in each period. For the offline setting, we characterize the optimal minimax policy, establishing the performance that can be achieved as a function of the underlying geometry of the information induced by the data. In the online setting, we leverage this geometric characterization to optimize the cumulative regret. We develop an algorithm that yields the first regret bound for this problem that is logarithmic in the time horizon.
    Perfectly Balanced: Improving Transfer and Robustness of Supervised Contrastive Learning. (arXiv:2204.07596v2 [stat.ML] UPDATED)
    An ideal learned representation should display transferability and robustness. Supervised contrastive learning (SupCon) is a promising method for training accurate models, but produces representations that do not capture these properties due to class collapse -- when all points in a class map to the same representation. Recent work suggests that "spreading out" these representations improves them, but the precise mechanism is poorly understood. We argue that creating spread alone is insufficient for better representations, since spread is invariant to permutations within classes. Instead, both the correct degree of spread and a mechanism for breaking this invariance are necessary. We first prove that adding a weighted class-conditional InfoNCE loss to SupCon controls the degree of spread. Next, we study three mechanisms to break permutation invariance: using a constrained encoder, adding a class-conditional autoencoder, and using data augmentation. We show that the latter two encourage clustering of latent subclasses under more realistic conditions than the former. Using these insights, we show that adding a properly-weighted class-conditional InfoNCE loss and a class-conditional autoencoder to SupCon achieves 11.1 points of lift on coarse-to-fine transfer across 5 standard datasets and 4.7 points on worst-group robustness on 3 datasets, setting state-of-the-art on CelebA by 11.5 points.
    Adversarial Sign-Corrupted Isotonic Regression. (arXiv:2207.07075v1 [math.ST])
    Classical univariate isotonic regression involves nonparametric estimation under a monotonicity constraint of the true signal. We consider a variation of this generating process, which we term adversarial sign-corrupted isotonic (ASCI) regression. Under this ASCI setting, the adversary has full access to the true isotonic responses, and is free to sign-corrupt them. Estimating the true monotonic signal given these sign-corrupted responses is a highly challenging task. Notably, the sign-corruptions are designed to violate monotonicity, and possibly induce heavy dependence between the corrupted response terms. In this sense, ASCI regression may be viewed as an adversarial stress test for isotonic regression. Our motivation is driven by understanding whether efficient robust estimation of the monotone signal is feasible under this adversarial setting. We develop ASCIFIT, a three-step estimation procedure under the ASCI setting. The ASCIFIT procedure is conceptually simple, easy to implement with existing software, and consists of applying the PAVA with crucial pre- and post-processing corrections. We formalize this procedure, and demonstrate its theoretical guarantees in the form of sharp high probability upper bounds and minimax lower bounds. We illustrate our findings with detailed simulations.
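    For readers unfamiliar with the PAVA (pool-adjacent-violators algorithm) mentioned above, the following is a minimal sketch of the plain algorithm for a non-decreasing least-squares fit. It does not include the pre- and post-processing corrections of the paper's three-step procedure, which the abstract does not specify; the function name `pava` and the toy data are illustrative assumptions.

```python
def pava(y, w=None):
    """Least-squares non-decreasing fit via pool-adjacent-violators.

    Illustrative sketch only: this is plain PAVA, not the sign-corruption-aware
    procedure described in the abstract.
    """
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand each block back to one fitted value per input point.
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit


# Toy (hypothetical) responses with two monotonicity violations.
fitted = pava([1.0, 3.0, 2.0, 4.0, 3.5, 5.0])
print(fitted)
```

    Each violating pair is pooled into its average, so the fitted sequence is non-decreasing by construction.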
    Bayesian Inference with Nonlinear Generative Models: Comments on Secure Learning. (arXiv:2201.09986v3 [cs.IT] UPDATED)
    Unlike the classical linear model, nonlinear generative models have been addressed sparsely in the literature of statistical learning. This work aims to bring attention to these models and their secrecy potential. To this end, we invoke the replica method to derive the asymptotic normalized cross entropy in an inverse probability problem whose generative model is described by a Gaussian random field with a generic covariance function. Our derivations further demonstrate the asymptotic statistical decoupling of the Bayesian estimator and specify the decoupled setting for a given nonlinear model. The replica solution shows that strictly nonlinear models establish an all-or-nothing phase transition: There exists a critical load at which the optimal Bayesian inference changes from perfect to uncorrelated learning. Based on this finding, we design a new secure coding scheme which achieves the secrecy capacity of the wiretap channel. This interesting result implies that strictly nonlinear generative models are perfectly secured without any secure coding. We justify this latter statement through the analysis of an illustrative model for perfectly secure and reliable inference.
    Fixing Inventory Inaccuracies At Scale. (arXiv:2006.13126v3 [stat.ML] UPDATED)
    Inaccurate records of inventory occur frequently, and by some measures cost retailers approximately 4% in annual sales. Detecting inventory inaccuracies manually is cost-prohibitive, and existing algorithmic solutions rely almost exclusively on learning from longitudinal data, which is insufficient in the dynamic environment induced by modern retail operations. Instead, we propose a solution based on cross-sectional data over stores and SKUs, observing that detecting inventory inaccuracies can be viewed as a problem of identifying anomalies in a (low-rank) Poisson matrix. State-of-the-art approaches to anomaly detection in low-rank matrices apparently fall short. Specifically, from a theoretical perspective, recovery guarantees for these approaches require that non-anomalous entries be observed with vanishingly small noise (which is not the case in our problem, and indeed in many applications). So motivated, we propose a conceptually simple entry-wise approach to anomaly detection in low-rank Poisson matrices. Our approach accommodates a general class of probabilistic anomaly models. We show that the cost incurred by our algorithm approaches that of an optimal algorithm at a min-max optimal rate. Using synthetic data and real data from a consumer goods retailer, we show that our approach provides up to a 10x cost reduction over incumbent approaches to anomaly detection. Along the way, we build on recent work that seeks entry-wise error guarantees for matrix completion, establishing such guarantees for sub-exponential matrices, a result of independent interest.
    Discovery of New Multi-Level Features for Domain Generalization via Knowledge Corruption. (arXiv:2109.04320v2 [cs.LG] UPDATED)
    Machine learning models that can generalize to unseen domains are essential when applied in real-world scenarios involving strong domain shifts. We address the challenging domain generalization (DG) problem, where a model trained on a set of source domains is expected to generalize well in unseen domains without any exposure to their data. The main challenge of DG is that the features learned from the source domains are not necessarily present in the unseen target domains, leading to performance deterioration. We assume that learning a richer set of features is crucial to improve the transfer to a wider set of unknown domains. For this reason, we propose COLUMBUS, a method that enforces new feature discovery via a targeted corruption of the most relevant input and multi-level representations of the data. We conduct an extensive empirical evaluation to demonstrate the effectiveness of the proposed approach which achieves new state-of-the-art results by outperforming 18 DG algorithms on multiple DG benchmark datasets in the DomainBed framework.
    Subgraph Frequency Distribution Estimation using Graph Neural Networks. (arXiv:2207.06684v1 [cs.LG])
    Small subgraphs (graphlets) are important features to describe fundamental units of a large network. The calculation of subgraph frequency distributions has wide application in multiple domains, including biology and engineering. Unfortunately, due to the inherent complexity of this task, most of the existing methods are computationally intensive and inefficient. In this work, we propose GNNS, a novel representational learning framework that utilizes graph neural networks to sample subgraphs efficiently for estimating their frequency distribution. Our framework includes an inference model and a generative model that learns hierarchical embeddings of nodes, subgraphs, and graph types. With the learned model and embeddings, subgraphs are sampled in a highly scalable and parallel way and the frequency distribution estimation is then performed based on these sampled subgraphs. Our methods achieve comparable accuracy and a significant speedup of three orders of magnitude compared to existing methods.
    Estimating Classification Confidence Using Kernel Densities. (arXiv:2207.06529v1 [stat.ML])
    This paper investigates the post-hoc calibration of confidence for "exploratory" machine learning classification problems. The difficulty in these problems stems from the continuing desire to push the boundaries of which categories have enough examples to generalize from when curating datasets, and confusion regarding the validity of those categories. We argue that for such problems the "one-versus-all" approach (top-label calibration) must be used rather than the "calibrate-the-full-response-matrix" approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation. Chief among these methods is the use of kernel density ratios for confidence calibration including a novel, bulletproof algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, should be based only on the test dataset, and should be sanity-checked visually.
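    The idea of calibrating confidence with a kernel density ratio can be sketched as follows: estimate the density of the classifier's top-label score separately for correct and incorrect predictions, then combine the two with Bayes' rule. This is only an illustration under stated assumptions; the fixed `bandwidth`, the function names, and the toy scores are hypothetical, and the paper's bandwidth-selection algorithm is not reproduced here.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function x -> Gaussian kernel density estimate at x."""
    norm = len(samples) * bandwidth * math.sqrt(2.0 * math.pi)

    def density(x):
        return sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        ) / norm

    return density


def calibrated_confidence(score, correct_scores, wrong_scores, bandwidth=0.1):
    """Estimate P(prediction is correct | score) from a density ratio.

    bandwidth is a fixed, hand-picked value here (an assumption).
    """
    n_c, n_w = len(correct_scores), len(wrong_scores)
    prior_c = n_c / (n_c + n_w)
    f_c = gaussian_kde(correct_scores, bandwidth)(score)
    f_w = gaussian_kde(wrong_scores, bandwidth)(score)
    num = prior_c * f_c
    den = num + (1.0 - prior_c) * f_w
    # Fall back to the prior when both densities underflow to zero.
    return num / den if den > 0 else prior_c


# Hypothetical top-label scores from a held-out set.
correct_scores = [0.90, 0.95, 0.85, 0.80, 0.99]
wrong_scores = [0.50, 0.60, 0.55, 0.70]

conf_high = calibrated_confidence(0.95, correct_scores, wrong_scores)
conf_low = calibrated_confidence(0.55, correct_scores, wrong_scores)
print(conf_high, conf_low)
```

    In this toy setup a raw score near the correct-prediction cluster maps to a calibrated confidence close to 1, while a score near the incorrect cluster maps to one close to 0.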
    PASHA: Efficient HPO with Progressive Resource Allocation. (arXiv:2207.06940v1 [cs.LG])
    Hyperparameter optimization (HPO) and neural architecture search (NAS) are methods of choice to obtain the best-in-class machine learning models, but in practice they can be costly to run. When models are trained on large datasets, tuning them with HPO or NAS rapidly becomes prohibitively expensive for practitioners, even when efficient multi-fidelity methods are employed. We propose an approach to tackle the challenge of tuning machine learning models trained on large datasets with limited computational resources. Our approach, named PASHA, is able to dynamically allocate maximum resources for the tuning procedure depending on the need. The experimental comparison shows that PASHA identifies well-performing hyperparameter configurations and architectures while consuming significantly fewer computational resources than solutions like ASHA.
    Volatility Based Kernels and Moving Average Means for Accurate Forecasting with Gaussian Processes. (arXiv:2207.06544v1 [cs.LG])
    A broad class of stochastic volatility models is defined by systems of stochastic differential equations. While these models have seen widespread success in domains such as finance and statistical climatology, they typically lack an ability to condition on historical data to produce a true posterior distribution. To address this fundamental limitation, we show how to re-cast a class of stochastic volatility models as a hierarchical Gaussian process (GP) model with specialized covariance functions. This GP model retains the inductive biases of the stochastic volatility model while providing the posterior predictive distribution given by GP inference. Within this framework, we take inspiration from well-studied domains to introduce a new class of models, Volt and Magpie, that significantly outperform baselines in stock and wind speed forecasting, and naturally extend to the multitask setting.
    Blurs Behave Like Ensembles: Spatial Smoothings to Improve Accuracy, Uncertainty, and Robustness. (arXiv:2105.12639v4 [cs.LG] UPDATED)
    Neural network ensembles, such as Bayesian neural networks (BNNs), have shown success in the areas of uncertainty estimation and robustness. However, a crucial challenge prohibits their use in practice. BNNs require a large number of predictions to produce reliable results, leading to a significant increase in computational cost. To alleviate this issue, we propose spatial smoothing, a method that spatially ensembles neighboring feature map points of convolutional neural networks. By simply adding a few blur layers to the models, we empirically show that spatial smoothing improves accuracy, uncertainty estimation, and robustness of BNNs across a whole range of ensemble sizes. In particular, BNNs incorporating spatial smoothing achieve high predictive performance with merely a handful of ensembles. Moreover, this method can also be applied to canonical deterministic neural networks to improve their performance. Several lines of evidence suggest that the improvements can be attributed to the stabilized feature maps and the smoothing of the loss landscape. In addition, we provide a fundamental explanation for prior works - namely, global average pooling, pre-activation, and ReLU6 - by addressing them as special cases of spatial smoothing. These not only enhance accuracy, but also improve uncertainty estimation and robustness by making the loss landscape smoother in the same manner as spatial smoothing. The code is available at https://github.com/xxxnell/spatial-smoothing.
    Seeking the Truth Beyond the Data. An Unsupervised Machine Learning Approach. (arXiv:2207.06949v1 [stat.ML])
    Clustering is an unsupervised machine learning methodology in which unlabeled elements/objects are grouped together, aiming to construct well-established clusters whose elements are classified according to their similarity. The goal of this process is to provide a useful aid to the researcher that will help her/him to identify patterns among the data. When dealing with large databases, such patterns may not be easily detectable without the contribution of a clustering algorithm. This article provides an in-depth description of the most widely used clustering methodologies, accompanied by useful presentations concerning suitable parameter selection and initializations. At the same time, this article is not only a review highlighting the major elements of the examined clustering techniques but also emphasizes the comparison of these algorithms' clustering efficiency based on 3 datasets, revealing their weaknesses and capabilities in terms of accuracy and complexity when confronted with discrete and continuous observations. The produced results help us extract valuable conclusions about the appropriateness of the examined clustering techniques in accordance with the dataset's size.
    Differentially Private Graph Learning via Sensitivity-Bounded Personalized PageRank. (arXiv:2207.06944v1 [cs.CR])
    Personalized PageRank (PPR) is a fundamental tool in unsupervised learning of graph representations such as node ranking, labeling, and graph embedding. However, while data privacy is one of the most important recent concerns, existing PPR algorithms are not designed to protect user privacy. PPR is highly sensitive to the input graph edges: the difference of only one edge may cause a big change in the PPR vector, potentially leaking private user data. In this work, we propose an algorithm which outputs an approximate PPR and has provably bounded sensitivity to input edges. In addition, we prove that our algorithm achieves similar accuracy to non-private algorithms when the input graph has large degrees. Our sensitivity-bounded PPR directly implies private algorithms for several tools of graph learning, such as differentially private (DP) PPR ranking, DP node classification, and DP node embedding. To complement our theoretical analysis, we also empirically verify the practical performance of our algorithms.
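    The edge sensitivity described above can be illustrated with a minimal sketch of standard (non-private) PPR computed by power iteration. This is not the sensitivity-bounded algorithm of the paper; the function name, the teleport probability `alpha`, the treatment of dangling nodes, and the toy graph are all assumptions made for illustration.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=200):
    """Standard PPR by power iteration (illustrative, non-private).

    adj maps each node to a list of out-neighbours; every neighbour must
    also be a key of adj. alpha is the teleport (restart) probability.
    """
    nodes = list(adj)
    p = {v: 1.0 if v == source else 0.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                share = (1.0 - alpha) * p[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # Dangling node: send its walk mass back to the source.
                nxt[source] += (1.0 - alpha) * p[u]
        # Teleport step: restart at the source with probability alpha.
        nxt[source] += alpha * sum(p.values())
        p = nxt
    return p


# Toy undirected graph, and the same graph with the single edge c-d removed.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
graph_minus_edge = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"], "d": []}

ppr = personalized_pagerank(graph, "a")
ppr_removed = personalized_pagerank(graph_minus_edge, "a")
l1_change = sum(abs(ppr[v] - ppr_removed[v]) for v in graph)
print(ppr, ppr_removed, l1_change)
```

    Removing the one edge cuts node d off from the source, so its PPR score drops to zero and the whole vector shifts; changes of this kind are what a sensitivity bound has to control.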
    An Asymmetric Contrastive Loss for Handling Imbalanced Datasets. (arXiv:2207.07080v1 [cs.LG])
    Contrastive learning is a representation learning method performed by contrasting a sample to other similar samples so that they are brought closely together, forming clusters in the feature space. The learning process is typically conducted using a two-stage training architecture, and it utilizes the contrastive loss (CL) for its feature learning. Contrastive learning has been shown to be quite successful in handling imbalanced datasets, in which some classes are overrepresented while some others are underrepresented. However, previous studies have not specifically modified CL for imbalanced datasets. In this work, we introduce an asymmetric version of CL, referred to as ACL, in order to directly address the problem of class imbalance. In addition, we propose the asymmetric focal contrastive loss (AFCL) as a further generalization of both ACL and focal contrastive loss (FCL). Results on the FMNIST and ISIC 2018 imbalanced datasets show that AFCL is capable of outperforming CL and FCL in terms of both weighted and unweighted classification accuracies. In the appendix, we provide a full axiomatic treatment on entropy, along with complete proofs.
    Analysis of Catastrophic Forgetting for Random Orthogonal Transformation Tasks in the Overparameterized Regime. (arXiv:2207.06475v1 [cs.LG])
    Overparameterization is known to permit strong generalization performance in neural networks. In this work, we provide an initial theoretical analysis of its effect on catastrophic forgetting in a continual learning setup. We show experimentally that in permuted MNIST image classification tasks, the generalization performance of multilayer perceptrons trained by vanilla stochastic gradient descent can be improved by overparameterization, and the extent of the performance increase achieved by overparameterization is comparable to that of state-of-the-art continual learning algorithms. We provide a theoretical explanation of this effect by studying a qualitatively similar two-task linear regression problem, where each task is related by a random orthogonal transformation. We show that when a model is trained on the two tasks in sequence without any additional regularization, the risk gain on the first task is small if the model is sufficiently overparameterized.
    A survey on domain adaptation theory: learning bounds and theoretical guarantees. (arXiv:2004.11829v6 [cs.LG] UPDATED)
    All famous machine learning algorithms that comprise both supervised and semi-supervised learning work well only under a common assumption: the training and test data follow the same distribution. When the distribution changes, most statistical models must be reconstructed from newly collected data, which for some applications can be costly or impossible to obtain. Therefore, it has become necessary to develop approaches that reduce the need and the effort to obtain new labeled samples by exploiting data that are available in related areas, and using these further across similar fields. This has given rise to a new machine learning framework known as transfer learning: a learning setting inspired by the capability of a human being to extrapolate knowledge across tasks to learn more efficiently. Despite the large number of different transfer learning scenarios, the main objective of this survey is to provide an overview of the state-of-the-art theoretical results in a specific, and arguably the most popular, sub-field of transfer learning, called domain adaptation. In this sub-field, the data distribution is assumed to change across the training and the test data, while the learning task remains the same. We provide a first up-to-date description of existing results related to the domain adaptation problem that cover learning bounds based on different statistical learning frameworks.
    How do tuna schools associate to dFADs? A study using echo-sounder buoys to identify global patterns. (arXiv:2207.07049v1 [stat.ML])
    Based on the data gathered by echo-sounder buoys attached to drifting Fish Aggregating Devices (dFADs) across tropical oceans, the current study applies a Machine Learning protocol to examine the temporal trends of tuna schools' association to drifting objects. Using a binary output, metrics typically used in the literature were adapted to account for the fact that the entire tuna aggregation under the dFAD was considered. The median time it took tuna to colonize the dFADs for the first time varied between 25 and 43 days, depending on the ocean, and the longest soak and colonization times were registered in the Pacific Ocean. The tuna schools' Continuous Residence Times were generally shorter than Continuous Absence Times (median values between 5 and 7 days, and 9 and 11 days, respectively), in line with the results found by previous studies. Using a regression output, two novel metrics, namely aggregation time and disaggregation time, were estimated to obtain further insight into the symmetry of the aggregation process. Across all oceans, the time it took for the tuna aggregation to depart from the dFADs was not significantly longer than the time it took for the aggregation to form. The value of these results in the context of the "ecological trap" hypothesis is discussed, and further analyses to enrich and make use of this data source are proposed.
    A Bayesian Lasso based Sparse Learning Model. (arXiv:1908.07220v3 [stat.ML] UPDATED)
    The Bayesian Lasso is constructed in the linear regression framework and applies Gibbs sampling to estimate the regression parameters. This paper develops a new sparse learning model, named the Bayesian Lasso Sparse (BLS) model, that takes the hierarchical model formulation of the Bayesian Lasso. The main difference from the original Bayesian Lasso lies in the estimation procedure; the BLS method uses a learning algorithm based on the type-II maximum likelihood procedure. In contrast to the Bayesian Lasso, the BLS provides sparse estimates of the regression parameters. The BLS method is also derived for nonlinear supervised learning problems by introducing kernel functions. We compare the BLS model to the well-known Relevance Vector Machine, the Fast Laplace method, the Bayesian Lasso, and the Lasso, on both simulated and real data. The numerical results show that the BLS is sparse and precise, especially when dealing with noisy and irregular datasets.
    Several Approximation Algorithms for Sparse Best Rank-1 Approximation to Higher-Order Tensors. (arXiv:2012.03092v2 [math.NA] UPDATED)
    Sparse tensor best rank-1 approximation (BR1Approx), which is a sparsity generalization of the dense tensor BR1Approx, and is a higher-order extension of the sparse matrix BR1Approx, is one of the most important problems in sparse tensor decomposition and related problems arising from statistics and machine learning. By exploiting the multilinearity as well as the sparsity structure of the problem, four approximation algorithms are proposed, which are easily implemented, of low computational complexity, and can serve as initial procedures for iterative algorithms. In addition, theoretically guaranteed worst-case approximation lower bounds are proved for all the algorithms. We provide numerical experiments on synthetic and real data to illustrate the effectiveness of the proposed algorithms.
    A Spectral Representation of Kernel Stein Discrepancy with Application to Goodness-of-Fit Tests for Measures on Infinite Dimensional Hilbert Spaces. (arXiv:2206.04552v2 [math.ST] UPDATED)
    Kernel Stein discrepancy (KSD) is a widely used kernel-based measure of discrepancy between probability measures. It is often employed in the scenario where a user has a collection of samples from a candidate probability measure and wishes to compare them against a specified target probability measure. A useful property of KSD is that it may be calculated with samples from only the candidate measure and without knowledge of the normalising constant of the target measure. KSD has been employed in a range of settings including goodness-of-fit testing, parametric inference, MCMC output assessment and generative modelling. Two main issues with current KSD methodology are (i) the lack of applicability beyond the finite dimensional Euclidean setting and (ii) a lack of clarity on what influences KSD performance. This paper provides a novel spectral representation of KSD which remedies both of these, making KSD applicable to Hilbert-valued data and revealing the impact of kernel and Stein operator choice on the KSD. We demonstrate the efficacy of the proposed methodology by performing goodness-of-fit tests for various Gaussian and non-Gaussian functional models in a number of synthetic data experiments.
    Meta-Analysis of Randomized Experiments with Applications to Heavy-Tailed Response Data. (arXiv:2112.07602v4 [stat.ME] UPDATED)
    A central obstacle in the objective assessment of treatment effect (TE) estimators in randomized control trials (RCTs) is the lack of ground truth (or validation set) to test their performance. In this paper, we propose a novel cross-validation-like methodology to address this challenge. The key insight of our procedure is that the noisy (but unbiased) difference-of-means estimate can be used as a ground truth "label" on a portion of the RCT, to test the performance of an estimator trained on the other portion. We combine this insight with an aggregation scheme, which borrows statistical strength across a large collection of RCTs, to present an end-to-end methodology for judging an estimator's ability to recover the underlying treatment effect as well as produce an optimal treatment "roll out" policy. We evaluate our methodology across 699 RCTs implemented in the Amazon supply chain. In this heavy-tailed setting, our methodology suggests that procedures that aggressively downweight or truncate large values, while introducing bias, lower the variance enough to ensure that the treatment effect is more accurately estimated.
    Using Model-Based Trees with Boosting to Fit Low-Order Functional ANOVA Models. (arXiv:2207.06950v1 [stat.ML])
    Low-order functional ANOVA (fANOVA) models have been rediscovered in the machine learning (ML) community under the guise of inherently interpretable machine learning. Explainable Boosting Machines or EBM (Lou et al. 2013) and GAMI-Net (Yang et al. 2021) are two recently proposed ML algorithms for fitting functional main effects and second-order interactions. We propose a new algorithm, called GAMI-Tree, that is similar to EBM, but has a number of features that lead to better performance. It uses model-based trees as base learners and incorporates a new interaction filtering method that is better at capturing the underlying interactions. In addition, our iterative training method converges to a model with better predictive performance, and the embedded purification ensures that interactions are hierarchically orthogonal to main effects. The algorithm does not need extensive tuning, and our implementation is fast and efficient. We use simulated and real datasets to compare the performance and interpretability of GAMI-Tree with EBM and GAMI-Net.
    Graph Neural Network Bandits. (arXiv:2207.06456v1 [cs.LG])
    We consider the bandit optimization problem with the reward function defined over graph-structured data. This problem has important applications in molecule design and drug discovery, where the reward is naturally invariant to graph permutations. The key challenges in this setting are scaling to large domains, and to graphs with many nodes. We resolve these challenges by embedding the permutation invariance into our model. In particular, we show that graph neural networks (GNNs) can be used to estimate the reward function, assuming it resides in the Reproducing Kernel Hilbert Space of a permutation-invariant additive kernel. By establishing a novel connection between such kernels and the graph neural tangent kernel (GNTK), we introduce the first GNN confidence bound and use it to design a phased-elimination algorithm with sublinear regret. Our regret bound depends on the GNTK's maximum information gain, which we also provide a bound for. While the reward function depends on all $N$ node features, our guarantees are independent of the number of graph nodes $N$. Empirically, our approach exhibits competitive performance and scales well on graph-structured domains.
    Improving the Accuracy of Marginal Approximations in Likelihood-Free Inference via Localisation. (arXiv:2207.06655v1 [stat.ME])
    Likelihood-free methods are an essential tool for performing inference for implicit models which can be simulated from, but for which the corresponding likelihood is intractable. However, common likelihood-free methods do not scale well to a large number of model parameters. A promising approach to high-dimensional likelihood-free inference involves estimating low-dimensional marginal posteriors by conditioning only on summary statistics believed to be informative for the low-dimensional component, and then combining the low-dimensional approximations in some way. In this paper, we demonstrate that such low-dimensional approximations can be surprisingly poor in practice for seemingly intuitive summary statistic choices. We describe an idealized low-dimensional summary statistic that is, in principle, suitable for marginal estimation. However, a direct approximation of the idealized choice is difficult in practice. We thus suggest an alternative approach to marginal estimation which is easier to implement and automate. Given an initial choice of low-dimensional summary statistic that might only be informative about a marginal posterior location, the new method improves performance by first crudely localising the posterior approximation using all the summary statistics to ensure global identifiability, followed by a second step that homes in on an accurate low-dimensional approximation using the low-dimensional summary statistic. We show that the posterior this approach targets can be represented as a logarithmic pool of posterior distributions based on the low-dimensional and full summary statistics, respectively. The good performance of our method is illustrated in several examples.
    Rethinking Multidimensional Discriminator Output for Generative Adversarial Networks. (arXiv:2109.03378v3 [stat.ML] UPDATED)
    Multidimensional discriminator (critic) output for Generative Adversarial Networks has been underexplored in the literature. In this paper, we generalize the Wasserstein GAN framework to take advantage of multidimensional critic output and explore its properties. We also introduce a square-root velocity transformation (SRVT) block which favors training in the multidimensional setting. Proofs of properties are based on our proposed maximal p-centrality discrepancy, which is bounded above by p-Wasserstein distance and fits the Wasserstein GAN framework with multidimensional critic output n. In particular, when n = 1 and p = 1, the proposed discrepancy equals 1-Wasserstein distance. Theoretical analysis and empirical evidence show that high-dimensional critic output is advantageous for distinguishing real and fake distributions, and benefits convergence speed and diversity of results.
    Continuous-time Analysis for Variational Inequalities: An Overview and Desiderata. (arXiv:2207.07105v1 [stat.ML])
    Algorithms that solve zero-sum games, multi-objective agent objectives, or, more generally, variational inequality (VI) problems are notoriously unstable on general problems. Owing to the increasing need for solving such problems in machine learning, this instability has been highlighted in recent years as a significant research challenge. In this paper, we provide an overview of recent progress in the use of continuous-time perspectives in the analysis and design of methods targeting the broad VI problem class. Our presentation draws parallels between single-objective problems and multi-objective problems, highlighting the challenges of the latter. We also formulate various desiderata for algorithms that apply to general VIs and we argue that achieving these desiderata may profit from an understanding of the associated continuous-time dynamics.
    Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory. (arXiv:2110.11291v4 [stat.ML] UPDATED)
    Schrödinger Bridge (SB) is an entropy-regularized optimal transport problem that has received increasing attention in deep generative modeling for its mathematical flexibility compared to the Score-based Generative Model (SGM). However, it remains unclear whether the optimization principle of SB relates to the modern training of deep generative models, which often rely on constructing log-likelihood objectives. This raises questions on the suitability of SB models as a principled alternative for generative applications. In this work, we present a novel computational framework for likelihood training of SB models grounded in Forward-Backward Stochastic Differential Equations Theory, a mathematical methodology from stochastic optimal control that transforms the optimality condition of SB into a set of SDEs. Crucially, these SDEs can be used to construct the likelihood objectives for SB that, surprisingly, generalize the ones for SGM as special cases. This leads to a new optimization principle that inherits the same SB optimality yet without losing applications of modern generative training techniques, and we show that the resulting training algorithm achieves comparable results on generating realistic images on MNIST, CelebA, and CIFAR10. Our code is available at https://github.com/ghliu/SB-FBSDE.
    Randomly pivoted Cholesky: Practical approximation of a kernel matrix with few entry evaluations. (arXiv:2207.06503v1 [math.NA])
    Randomly pivoted Cholesky (RPCholesky) is a natural algorithm for computing a rank-k approximation of an N x N positive semidefinite (psd) matrix. RPCholesky can be implemented with just a few lines of code. It requires only (k+1)N entry evaluations and O(k^2 N) additional arithmetic operations. This paper offers the first serious investigation of its experimental and theoretical behavior. Empirically, RPCholesky matches or improves on the performance of alternative algorithms for low-rank psd approximation. Furthermore, RPCholesky provably achieves near-optimal approximation guarantees. The simplicity, effectiveness, and robustness of this algorithm strongly support its use in scientific computing and machine learning applications.
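    The abstract's claim that the method fits in a few lines and needs only (k+1)N entry evaluations can be illustrated with a minimal pure-Python sketch. It is not the authors' code: the function name `rp_cholesky` and the `entry(i, j)` callback are hypothetical, and the sketch assumes the usual formulation in which each pivot is sampled with probability proportional to the residual diagonal. The N diagonal reads plus k column reads of N entries each account for the (k+1)N count.

```python
import math
import random


def rp_cholesky(entry, n, k, seed=0):
    """Return up to k columns f_1..f_k such that A ~ sum_j f_j f_j^T.

    entry(i, j) is a hypothetical callback returning A[i][j] of an
    n x n positive semidefinite matrix. Only the diagonal and the k
    pivot columns are ever requested.
    """
    rng = random.Random(seed)
    factors = []
    diag = [entry(i, i) for i in range(n)]  # residual diagonal: n evaluations
    for _ in range(k):
        if sum(diag) <= 0.0:
            break  # residual has vanished; nothing left to approximate
        # Sample a pivot with probability proportional to the residual diagonal.
        pivot = rng.choices(range(n), weights=diag)[0]
        # Residual column at the pivot: n more entry evaluations.
        col = [
            entry(i, pivot) - sum(f[i] * f[pivot] for f in factors)
            for i in range(n)
        ]
        scale = math.sqrt(col[pivot])
        new_col = [c / scale for c in col]
        factors.append(new_col)
        # Downdate the residual diagonal, clipping round-off negatives to zero.
        diag = [max(d - v * v, 0.0) for d, v in zip(diag, new_col)]
    return factors
```

    For a kernel matrix, `entry` would evaluate the kernel on a pair of data points, so the full N x N matrix never has to be formed.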
    Fully Decentralized Model-based Policy Optimization for Networked Systems. (arXiv:2207.06559v1 [cs.LG])
    Reinforcement learning algorithms require a large amount of samples; this often limits their real-world applications on even simple tasks. Such a challenge is more pronounced in multi-agent tasks, as each step of operation is more costly, requiring communications or shifting of resources. This work aims to improve data efficiency of multi-agent control by model-based learning. We consider networked systems where agents are cooperative and communicate only locally with their neighbors, and propose the decentralized model-based policy optimization framework (DMPO). In our method, each agent learns a dynamic model to predict future states and broadcasts its predictions by communication, and then the policies are trained under the model rollouts. To alleviate the bias of model-generated data, we restrict the model usage to generating myopic rollouts, thus reducing the compounding error of model generation. To preserve the independence of policy updates, we introduce an extended value function and theoretically prove that the resulting policy gradient is a close approximation to true policy gradients. We evaluate our algorithm on several benchmarks for intelligent transportation systems, which are connected autonomous vehicle control tasks (Flow and CACC) and adaptive traffic signal control (ATSC). Empirical results show that our method achieves superior data efficiency and matches the performance of model-free methods using true models.
    Uncertainty quantification for predictions of atomistic neural networks. (arXiv:2207.06916v1 [physics.chem-ph])
    The value of uncertainty quantification on predictions for trained neural networks (NNs) on quantum chemical reference data is quantitatively explored. For this, the architecture of the PhysNet NN was suitably modified and the resulting model was evaluated with different metrics to quantify calibration, quality of predictions, and whether prediction error and the predicted uncertainty can be correlated. The results from training on the QM9 database and evaluating data from the test set within and outside the distribution indicate that error and uncertainty are not linearly related. The results clarify that noise and redundancy complicate property prediction for molecules even in cases for which changes - e.g. double bond migration in two otherwise identical molecules - are small. The model was then applied to a real database of tautomerization reactions. Analysis of the distance between members in feature space combined with other parameters shows that redundant information in the training dataset can lead to large variances and small errors whereas the presence of similar but unspecific information returns large errors but small variances. This was, e.g., observed for nitro-containing aliphatic chains for which predictions were difficult although the training set contained several examples for nitro groups bound to aromatic molecules. This underlines the importance of the composition of the training data and provides chemical insight into how this affects the prediction capabilities of a ML model. Finally, the approach put forward can be used for information-based improvement of chemical databases for target applications through active learning optimization.
    Benign, Tempered, or Catastrophic: A Taxonomy of Overfitting. (arXiv:2207.06569v1 [cs.LG])
    The practical success of overparameterized neural networks has motivated the recent scientific study of interpolating methods, which perfectly fit their training data. Certain interpolating methods, including neural networks, can fit noisy training data without catastrophically bad test performance, in defiance of standard intuitions from statistical learning theory. Aiming to explain this, a body of recent work has studied $\textit{benign overfitting}$, a phenomenon where some interpolating methods approach Bayes optimality, even in the presence of noise. In this work we argue that while benign overfitting has been instructive and fruitful to study, many real interpolating methods like neural networks $\textit{do not fit benignly}$: modest noise in the training set causes nonzero (but non-infinite) excess risk at test time, implying these models are neither benign nor catastrophic but rather fall in an intermediate regime. We call this intermediate regime $\textit{tempered overfitting}$, and we initiate its systematic study. We first explore this phenomenon in the context of kernel (ridge) regression (KR) by obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors. We find that kernels with powerlaw spectra, including Laplace kernels and ReLU neural tangent kernels, exhibit tempered overfitting. We then empirically study deep neural networks through the lens of our taxonomy, and find that those trained to interpolation are tempered, while those stopped early are benign. We hope our work leads to a more refined understanding of overfitting in modern learning.

  • Open

    Cosplayer Face Generator using Style GAN 2
    submitted by /u/rubikvn2100 [link] [comments]  ( 86 min )
    Heavenly Hell AI Concept
    AI Art Credit: https://discord.gg/x3s9Ye2h2A https://preview.redd.it/il3qpb8x7mb91.png?width=1024&format=png&auto=webp&s=71a1d296bc06569eef6bac93a43c95199b0fc94d https://preview.redd.it/o7cwib8x7mb91.png?width=1024&format=png&auto=webp&s=a94095bd743f162b111b2a40c4ba53703d3ad702 submitted by /u/Old-Pumpkin4899 [link] [comments]  ( 85 min )
    Hey guys, I started a new podcast where I interview guests from different subreddits and was wondering if anyone wanted to come on to talk about artificial intelligence. Message me if you want to come on and you have knowledge on ai.
    submitted by /u/Money_Push [link] [comments]  ( 86 min )
    Abandoned Dream
    AI Art Credit: https://discord.gg/x3s9Ye2h2A https://preview.redd.it/yhb2j2czelb91.png?width=1024&format=png&auto=webp&s=bbb2f14b8af6f24eeb4b78120edc2997a5640ad3 https://preview.redd.it/f94ve4czelb91.png?width=1024&format=png&auto=webp&s=d9d7da51eefa06f385214c84294f66ad8c536aed https://preview.redd.it/n1mex5czelb91.png?width=1024&format=png&auto=webp&s=746a498d25c79bcfc7d2e2dbd4c04c293a6e93b7 https://preview.redd.it/tf29h5czelb91.png?width=1024&format=png&auto=webp&s=02a110192e9f7b15552b99e4ff62d2d4d2308223 https://preview.redd.it/x5tfw6czelb91.png?width=1024&format=png&auto=webp&s=585e5f5398fe5199d4d18fb47d0a79b41a218a15 https://preview.redd.it/wru9l2czelb91.png?width=1024&format=png&auto=webp&s=a7943965f205cfd3bace3ad79c75c9720ba8666c submitted by /u/Old-Pumpkin4899 [link] [comments]  ( 85 min )
    Disco Diffusion 5.6 update
    Very impressed with the portrait generator in the new Disco Diffusion 5.6 update! Here are some images I made with it. I have also included all the prompts in a video on my YouTube page where I demo it. https://www.youtube.com/watch?v=1Gp5l9EUX9I https://preview.redd.it/53v2acxq9lb91.png?width=1536&format=png&auto=webp&s=c4da0255bd884e67922b7bc40ccf5d4631c71bd1 submitted by /u/prfitofthesngularity [link] [comments]  ( 86 min )
    1300+ personal dall-e 2 image dump
    Image dump 1 submitted by /u/OneFinding1429 [link] [comments]  ( 85 min )
    Google AI Introduces ‘Mood Board Search’: A Web-Based Tool That Lets You Train A Computer To Recognize Visual Concepts Using Mood Boards And Machine Learning
    Google recently launched Mood Board Search, a new ML-powered research tool that leverages mood boards as a query over image collections. With the help of this tool, users can independently define and evoke visual notions. A mood board search can be used for ambiguous inquiries, such as “peaceful,” or for words and specific images that might not be exact enough to yield beneficial results in a regular search. These subjective questions primarily concern abstract information that is frequently ignored in pictures. The team is still in the developing phase of the research tool. ✅ Open-Source Code Release | Built with Tensorflow. ✅ A playful way to explore and analyze image collections using mood boards as your search query ✅ Mood Board Search takes advantage of pre-trained computer vision models, such as GoogLeNet and MobileNet, and a machine learning approach called Concept Activation Vectors (CAVs). Continue reading | Check out the code and tool. https://i.redd.it/e3yqfl1nskb91.gif submitted by /u/ai-lover [link] [comments]  ( 87 min )
    I've been using OpenAI's Dall-E 2 to generate webcomic panels which I then add my own captions to
    submitted by /u/PerryJ [link] [comments]  ( 86 min )
    Top 5 Artificial Intelligence Stocks to Watch in 2022
    submitted by /u/Brilliant_Scratch_63 [link] [comments]  ( 86 min )
    Built a hologram assistant with machine learning
    submitted by /u/RedRainHoloAI [link] [comments]  ( 86 min )
    A Dog in a Fez
    submitted by /u/uupstairs [link] [comments]  ( 86 min )
    New Google DeepMind PLATO Learns Physics With Computer Vision | Blackrock Brain Computer Interface Lets Quadriplegic Man Control 2 Robot Arms
    submitted by /u/getrich_or_diemining [link] [comments]  ( 86 min )
    Colossal-AI Seamlessly Accelerates Large Models at Low Costs with Hugging Face
    Forbes News, the world's leading voice, recently declared large AI models as one of six AI trends to watch for in 2022. As large-scale AI models continue their superior performances across different domains, trends emerge, leading to distinguished and efficient AI applications that have never been seen in the industry. For example, Microsoft-owned GitHub and OpenAI partnered to launch Copilot recently. Copilot plays the role of an AI pair programmer, offering suggestions for code and entire functions in real time. Such developments continue to make coding easier than before. https://i.redd.it/s1j60dt6h9b91.gif Another example released by OpenAI, DALL-E 2, is a powerful tool which creates original and realistic images as well as art from only simple text. One month later, Google a…  ( 98 min )
    The Interpretable Natural Language Processing (INLP) AGI-22 Workshop will be held August 19–22 in Seattle, Washington and in cyberspace.
    submitted by /u/akolonin [link] [comments]  ( 86 min )
    Are these types of videos summarized and recapped with AI or are they manually recapped by a human? Example: https://youtu.be/TK76DFJskPs
    submitted by /u/ElonJuniorMusk [link] [comments]  ( 86 min )
    Any characters like Jarvis from Iron Man?
    I'm doing a project about AI assistants and need some examples. I was thinking about maybe Cortana from Halo but haven't played the games so I don't really know if it fits the same purpose. Any ideas? submitted by /u/AsafL910 [link] [comments]  ( 88 min )
    How I Used Midjourney to Create an Original Scene
    I used Midjourney AI to create this epic scene in Blender. I took the generated images as concept art and using Blender and ZBrush I came to this result. Midjourney uses an AI to generate images. As an artist, I thought it would be cool to compare myself to AI-generated art. And find out what the usefulness of this tool can be in our workflow. It turns out I really like this for quick idea generation and concepting. So I documented my dive into Midjourney's AI generation process and you can find the video here or click this link: https://youtu.be/0JgWL3_CWbc submitted by /u/mvartz [link] [comments]  ( 86 min )
    Why AI is already self-aware - a thought experiment
    submitted by /u/PolymorphismPrince [link] [comments]  ( 87 min )
    How to build a model for detecting "intents" (tags based on input text as Watson assistant) in text
    submitted by /u/Independent-Tear-619 [link] [comments]  ( 86 min )
    Is there any AI sound generator that is not voice?
    Like Dall-E but for sounds instead of for images. All I find are voice generators but I'm thinking more sound effects of all kinds. Is there something like this yet? submitted by /u/Background_Ad_7821 [link] [comments]  ( 88 min )
    Use OpenAI's Clip to rate your images
    You can use this Clip powered website I built to rate your images: https://tom-doerr-ai-photo-rater-streamlit-app-f924gb.streamlitapp.com/ What do you think? submitted by /u/tomd_96 [link] [comments]  ( 86 min )
  • Open

    "LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action", Shah et al 2022 (SayCan-like w/CLIP+GPT-3+ViNG for outdoors robotics)
    submitted by /u/gwern [link] [comments]  ( 86 min )
    "Prompting Decision Transformer for Few-Shot Policy Generalization", Xu et al 2022
    submitted by /u/gwern [link] [comments]  ( 86 min )
    "Effective Mutation Rate Adaptation through Group Elite Selection", Kumar et al 2022
    submitted by /u/gwern [link] [comments]  ( 86 min )
    "Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling", Nguyen & Grover 2022
    submitted by /u/gwern [link] [comments]  ( 86 min )
    DDPG outputs multiple actions simultaneously, e.g. a vehicle has three actions: throttle, brake and steering angle. Is there a solid solution? Thank you
    DDPG outputs multiple actions simultaneously, e.g. a vehicle has three actions: throttle, brake and steering angle. Is there a solid solution? Thank you submitted by /u/Ecstatic_Leg9476 [link] [comments]  ( 87 min )
    Successful uses of Value-based method in competitive games?
    It seems like policy gradient methods work extremely well in a competitive game scenario (use of policy gradients in Go and Dota 2 for instance). I'm wondering if any pure value-based methods have seen any level of success? submitted by /u/Spiritual_Dinner9232 [link] [comments]  ( 86 min )
  • Open

    Achieve enterprise-grade monitoring for your Amazon SageMaker models using Fiddler
    This is a guest blog post by Danny Brock, Rajeev Govindan and Krishnaram Kenthapadi at Fiddler AI. Your Amazon SageMaker models are live. They’re handling millions of inferences each day and driving better business outcomes for your company. They’re performing exactly as well as the day they were launched. Er, wait. Are they? Maybe. Maybe […]  ( 7 min )
    Track your ML experiments end to end with Data Version Control and Amazon SageMaker Experiments
    Data scientists often work towards understanding the effects of various data preprocessing and feature engineering strategies in combination with different model architectures and hyperparameters. Doing so requires you to cover large parameter spaces iteratively, and it can be overwhelming to keep track of previously run configurations and results while keeping experiments reproducible. This post walks […]  ( 13 min )
    Build a predictive maintenance solution with Amazon Kinesis, AWS Glue, and Amazon SageMaker
    Organizations are increasingly building and using machine learning (ML)-powered solutions for a variety of use cases and problems, including predictive maintenance of machine parts, product recommendations based on customer preferences, credit profiling, content moderation, fraud detection, and more. In many of these scenarios, the effectiveness and benefits derived from these ML-powered solutions can be further […]  ( 13 min )
  • Open

    High-Fidelity Synthetic Data for Data Engineers and Data Scientists Alike
    Sponsored Post If you’re a data engineer or data scientist, you know how hard it is to generate and maintain realistic data at scale. And to guarantee data privacy protection, in addition to all your day-to-day responsibilities? OOF. Talk about a heavy lift. But in today’s world, efficient data de-identification is no longer optional for […] The post High-Fidelity Synthetic Data for Data Engineers and Data Scientists Alike appeared first on Machine Learning Mastery.  ( 10 min )
  • Open

    [Discussion] Code editor for transforming data/building ML pipelines
    Check out our new open source code editor for transforming data and building ML pipelines: https://github.com/mage-ai/mage-ai If you’re available, I’d love to hop on a quick Zoom to help you get set up. In the meantime, here is the install guide: https://github.com/mage-ai/mage-ai#using-pip and a short tutorial: https://github.com/mage-ai/mage-ai/blob/master/docs/tutorials/train_titanic_model/README.md I’d love to get your feedback on whether this is useful to you or not. Thank you so much! submitted by /u/ollie_wollie_rocks [link] [comments]  ( 87 min )
    [D] Best way to increase LSTM/GRU capacity
    LSTM and GRU have a fixed set of weights that depends only on the size of the input and the number of LSTM/GRU units. But what if I have the feeling that the parameters in the model are not enough to capture and process the data correctly? In other words, how do I increase the capacity of these models? Some ideas that came to me are: - Preprocess each vector of the sequence with another model, then feed the output vectors to the LSTM/GRU - Just use a larger number of units in the LSTM/GRU (however, this might create a big mismatch between the input and output size) - Develop an LSTM/GRU that uses more than one layer in each step (e.g. a k-layer neural network instead of a weight matrix) What do you think is the best? Do you know any other method? submitted by /u/fedetask [link] [comments]  ( 88 min )
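    To make the trade-off between widening and stacking concrete, here is a rough parameter-count sketch for a standard LSTM. The helper name `lstm_param_count` is hypothetical, and the count assumes the common convention of four gates with one bias vector per gate; some libraries keep two bias vectors per gate, which adds a little more.

```python
def lstm_param_count(input_size, units, num_layers=1):
    """Trainable parameters in a stack of standard LSTM layers.

    Each layer has four gates, and each gate has an input kernel, a
    recurrent kernel and one bias vector (single-bias convention, an
    assumption of this sketch).
    """
    total = 0
    layer_input = input_size
    for _ in range(num_layers):
        total += 4 * (units * (layer_input + units) + units)
        layer_input = units  # deeper layers read the previous layer's output
    return total


if __name__ == "__main__":
    # Widening grows the recurrent kernel quadratically in `units`;
    # stacking adds roughly one layer's worth of parameters per layer.
    for units, layers in [(64, 1), (128, 1), (64, 2)]:
        print(units, layers, lstm_param_count(32, units, layers))
```

    Under these assumptions, doubling the units costs more parameters than adding a second layer of the same width, which is one reason stacked recurrent layers are a common way to add capacity.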
    [P] The technology behind BLOOM training
    Last Tuesday, BigScience released BLOOM, the world's largest open multilingual language model. Stas Bekman from the BigScience & Hugging Face team just published a blog post about the technology and engineering behind training the 176 billion parameter model, both in terms of hardware (384 80GB A100 GPUs) and software (Megatron-DeepSpeed). submitted by /u/feconroses [link] [comments]  ( 87 min )
    [R] LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action - Google 2022
    Paper: https://arxiv.org/abs/2207.04429 https://sites.google.com/view/lmnav Github: https://github.com/blazejosinski/lm_nav Summary Video: https://www.youtube.com/watch?v=wkVbuZQb_5g Abstract: Goal-conditioned policies for robotic navigation can be trained on large, unannotated datasets, providing for good generalization to real-world settings. However, particularly in vision-based settings where specifying goals requires an image, this makes for an unnatural interface. Language provides a more convenient modality for communication with robots, but contemporary methods typically require expensive supervision, in the form of trajectories annotated with language descriptions. We present a system, LM-Nav, for robotic navigation that enjoys the benefits of training on unannotated large datasets of trajectories, while still providing a high-level interface to the user. Instead of utilizing a labeled instruction following dataset, we show that such a system can be constructed entirely out of pre-trained models for navigation (ViNG), image-language association (CLIP), and language modeling (GPT-3), without requiring any fine-tuning or language-annotated robot data. We instantiate LM-Nav on a real-world mobile robot and demonstrate long-horizon navigation through complex, outdoor environments from natural language instructions. For videos of our experiments, code release, and an interactive Colab notebook that runs in your browser, please check out our project page. https://preview.redd.it/zwx7n9jgakb91.jpg?width=1084&format=pjpg&auto=webp&s=7ee54cadf81306c66cbb9cd2461addef52d3c90a https://preview.redd.it/6axh7ajgakb91.jpg?width=1116&format=pjpg&auto=webp&s=de5d2e7376a1d64b58a417e9cd63d808a2a6851f https://preview.redd.it/ysfuybjgakb91.jpg?width=554&format=pjpg&auto=webp&s=bed9d074cabf33f9b64e4dbf4027f7904bb8da61 submitted by /u/Singularian2501 [link] [comments]  ( 88 min )
    [R] Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
    submitted by /u/GratisSlagroom [link] [comments]  ( 89 min )
    [D] "No language left behind" A 200 language translation model by Meta AI
    Just discovered this new model by Meta AI when browsing Hugging Face. Paper: https://ai.facebook.com/research/publications/no-language-left-behind-scaling-human-centered-machine-translation/ Model on HuggingFace: https://huggingface.co/facebook/nllb-200-3.3B Code: https://github.com/facebookresearch/fairseq/tree/nllb The largest Mixture-of-Experts model seems really interesting in its capabilities. What do you guys think? submitted by /u/Emergency_Apricot_77 [link] [comments]  ( 87 min )
    [D] Are there any rejected papers that ended up having significant impact in the long run?
    There seems to be a general consensus that getting a paper accepted can be difficult due to various problems with our current peer-review system. That makes me wonder, are there any notable papers that had a difficult time getting accepted but ended up significantly impacting the field or ended up laying the foundation for more high impact publications? submitted by /u/TheSurvivingHalf [link] [comments]  ( 94 min )
    [D] Is sampling distractors from the same mini batch during training a good idea?
    Hello, I have an NLP Transformer model and for my case I want to add a binary classifier as an auxiliary task. I will give a random response and the ground truth response to the classifier and expect it to distinguish them. Is it a good idea, instead of modifying the dataset, to just shift the current mini batch during training in order to generate distractors? For example, let's say the batch size is 4. We will have four response sequences in our batch: [[1...], [2...], [3...], [4...]], so I can copy and shift them (by 2), for example: [[3...], [4...], [1...], [2...]]. Then I can stack the ground truth and the shifted batch to get [ [[1...], [3...]], [[2...], [4...]], [[3...], [1...]], [[4...], [2...]] ] and feed that to the classifier where the labels are [ [1, 0], [1, 0], [1, 0], [1, 0] ]. Furthermore I can randomize the order of `(truth, distractor)` pairs in each batch, so that sometimes the labels will be [1, 0] and other times [0, 1]. Finally, if there's a concern that because of the dataloader order in a batch we may have related responses and not completely random ones - I would say that this is actually an advantage, because a classifier which can distinguish the right response compared to a related one is a stronger classifier. Do you think this makes sense and what are the possible drawbacks? submitted by /u/IllustriousCicada603 [link] [comments]  ( 89 min )
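    The batch-shifting scheme described in the post can be sketched in plain Python, with lists standing in for tensors. The function name `make_distractor_batch` and its parameters are hypothetical, introduced only for illustration; in a real pipeline the same indexing would presumably be done with the framework's roll or gather operations.

```python
import random


def make_distractor_batch(responses, shift=2, shuffle_pairs=True, seed=None):
    """Pair each response with one taken `shift` slots away in the same batch.

    Returns (pairs, labels); in each label, 1 marks the position of the
    ground-truth response and 0 the position of the distractor.
    """
    n = len(responses)
    if n < 2 or shift % n == 0:
        raise ValueError("shift must move every response to a different slot")
    rng = random.Random(seed)
    pairs, labels = [], []
    for i, truth in enumerate(responses):
        distractor = responses[(i + shift) % n]
        if shuffle_pairs and rng.random() < 0.5:
            # Swap the order so the classifier cannot rely on position.
            pairs.append([distractor, truth])
            labels.append([0, 1])
        else:
            pairs.append([truth, distractor])
            labels.append([1, 0])
    return pairs, labels
```

    One drawback this makes visible: if two examples in a batch happen to share the same response, the distractor is identical to the ground truth and the label becomes noise, so a check for duplicates may be worth adding.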
    [D] LSTM RNN: Slice data along time axis during training?
    I’m building an LSTM network where the input data is high dimensional, both along the time axis and at each time step. I am of course using a tensorflow Dataset to batch the input data. Here’s my question: is there a way to provide slices of data along the time axis to the RNN? Say my data is: x[n, p, …], with n samples and p time points. Say I use batch size = 1. Then can I provide data in the following sequence for the first batch (i.e. n=0)? x[0, 0, …] x[0, 1, …] x[0, 2, …] The RNN only cares about calculating hidden states at single time points, so presumably training-wise it should make no difference if data slices from single time points are loaded into memory and removed after use. It seems like keras / tensorflow are designed to accept a “batch” as a unit; is there a way to further split the data into smaller chunks? Thank you, I appreciate any suggestion and advice from the community! submitted by /u/besse [link] [comments]  ( 88 min )
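    The idea of feeding one sequence in time slices can be illustrated with a toy scalar recurrent cell (not an LSTM). All names below are hypothetical, and the weights are arbitrary constants. The sketch only shows that the forward pass gives the same final state when the hidden state is carried across chunks. Gradients are a separate matter: unless the framework backpropagates across chunk boundaries, training on slices amounts to truncated backpropagation through time, so it is not fully equivalent to training on whole sequences.

```python
import math


def rnn_step(h, x, w_h=0.5, w_x=1.0):
    """One step of a toy scalar recurrent cell."""
    return math.tanh(w_h * h + w_x * x)


def run_chunk(h, chunk):
    """Advance the hidden state over one slice of the time axis."""
    for x in chunk:
        h = rnn_step(h, x)
    return h


def time_chunks(sequence, chunk_len):
    """Yield consecutive slices of `sequence` along the time axis."""
    for start in range(0, len(sequence), chunk_len):
        yield sequence[start:start + chunk_len]


def run_in_chunks(sequence, chunk_len, h0=0.0):
    """Process a sequence slice by slice, carrying the state between slices."""
    h = h0
    for chunk in time_chunks(sequence, chunk_len):
        h = run_chunk(h, chunk)
    return h
```

    Keras recurrent layers expose a `stateful` argument intended for carrying state from one batch to the next; whether it fits a particular `Dataset` pipeline is something the sketch does not settle.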
  • Open

    Towards Reliability in Deep Learning Systems
    Posted by Dustin Tran and Balaji Lakshminarayanan, Research Scientists, Google Research Deep learning models have made impressive progress in vision, language, and other modalities, particularly with the rise of large-scale pre-training. Such models are most accurate when applied to test data drawn from the same distribution as their training set. However, in practice, the data confronting models in real-world settings rarely match the training distribution. In addition, the models may not be well-suited for applications where predictive performance is only part of the equation. For models to be reliable in deployment, they must be able to accommodate shifts in data distribution and make useful decisions in a broad array of scenarios. In “Plex: Towards Reliability Using Pre-trained Larg…  ( 25 min )
  • Open

    DALL·E 2: Extending Creativity
    As part of our DALL·E 2 research preview, more than 3,000 artists from more than 118 countries have incorporated DALL·E into their creative workflows. The artists in our early access group have helped us discover new uses for DALL·E and have served as  ( 6 min )
  • Open

    New Google DeepMind PLATO Learns Physics With Computer Vision
    submitted by /u/getrich_or_diemining [link] [comments]  ( 86 min )
  • Open

    Future Prospects for Computer Vision Applications in Agriculture
    Precision agriculture has recently shown a lot of interest in computer vision technology. Computer vision, at the heart of robotics and…  ( 10 min )
  • Open

    Action on Repeat: GFN Thursday Brings Loopmancer With RTX ON to the Cloud
    Investigate the ultimate truth this GFN Thursday with Loopmancer, now streaming to all members on GeForce NOW. Stuck in a death loop, RTX 3080 and Priority members can search for the truth with RTX ON — including NVIDIA DLSS and ray-traced reflections. Plus, players can enjoy the latest Genshin Impact event with the “Summer Fantasia” […] The post Action on Repeat: GFN Thursday Brings Loopmancer With RTX ON to the Cloud appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    DSC Weekly 12 July 2022: The Emergence of the Modern Studio Model
    Announcements Achieving endpoint visibility to ward off the threat of a breach has never been more important than it is in the age of data proliferation and hybrid workplaces. Multiple endpoints and locations heighten that risk, making it essential for CISOs and IT security teams to overcome common challenges. Find out how organizations can reach… The post DSC Weekly 12 July 2022: The Emergence of the Modern Studio Model appeared first on Data Science Central.  ( 22 min )
  • Open

    Teaching AI to ask clinical questions
    Researchers have made strides toward machine-learning models that can help doctors more efficiently find information in a patient’s health record.  ( 7 min )
  • Open

    On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks. (arXiv:2112.09684v2 [math.OC] UPDATED)
    In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumption that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, under the assumption that the target function (describing the relationship between input data and the output data) is piecewise polynomial, and under the assumption that the risk function of the considered supervised learning problem admits at least one regular global minimum. In addition, in the special situation of shallow ANNs with just one hidden layer and one-dimensional input we also verify this assumption by proving in the training of such shallow ANNs that for every Lipschitz continuous target function there exists a global minimum in the risk landscape. Finally, in the training of deep ANNs with ReLU activation we also study solutions of gradient flow (GF) differential equations and we prove that every non-divergent GF trajectory converges with a polynomial rate of convergence to a critical point (in the sense of limiting Fr\'echet subdifferentiability). Our mathematical convergence analysis builds up on ideas from our previous article Eberle et al., on tools from real algebraic geometry such as the concept of semi-algebraic functions and generalized Kurdyka-Lojasiewicz inequalities, on tools from functional analysis such as the Arzel\`a-Ascoli theorem, on tools from nonsmooth analysis such as the concept of limiting Fr\'echet subgradients, as well as on the fact that the set of realization functions of shallow ReLU ANNs with fixed architecture forms a closed subset of the set of continuous functions revealed by Petersen et al.
    Rotting Infinitely Many-armed Bandits. (arXiv:2201.12975v2 [cs.LG] UPDATED)
    We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has an $\Omega(\max\{\varrho^{1/3}T,\sqrt{T}\})$ worst-case regret lower bound where $T$ is the horizon time. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T,\sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T,T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.
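The threshold idea in the abstract above (keep pulling an arm while its UCB index stays high, otherwise discard it and move to a fresh arm) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not the paper's algorithm: the uniform initial means, the Gaussian noise, the linear rotting model, and the index (plain sample mean plus a confidence radius) are all hypothetical choices, and `run_threshold_ucb` is an illustrative name.

```python
import math
import random


def run_threshold_ucb(horizon, rot_rate, threshold, seed=0):
    """Toy threshold policy: pull a fresh arm until its UCB index drops below `threshold`."""
    rng = random.Random(seed)
    arms_used = 0
    total_reward = 0.0
    t = 0
    while t < horizon:
        # Draw a new arm; its initial mean is uniform on [0, 1] (assumption).
        mean = rng.random()
        arms_used += 1
        pulls = 0
        reward_sum = 0.0
        while t < horizon:
            # Observe a noisy reward, then let the arm's mean rot by rot_rate.
            reward = mean + rng.gauss(0.0, 0.1)
            mean -= rot_rate
            pulls += 1
            reward_sum += reward
            total_reward += reward
            t += 1
            # Illustrative UCB index: sample mean plus a confidence radius.
            index = reward_sum / pulls + math.sqrt(2.0 * math.log(horizon) / pulls)
            if index < threshold:
                break  # discard this arm and draw a fresh one
    return arms_used, total_reward


if __name__ == "__main__":
    arms, reward = run_threshold_ucb(horizon=1000, rot_rate=0.01, threshold=0.8)
    print(arms, round(reward, 2))
```

A very low threshold never discards the first arm, while a very high one discards every arm after a single pull; useful settings lie in between and, per the abstract, depend on the rotting rate.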
    Multi-Atlas Segmentation and Spatial Alignment of the Human Embryo in First Trimester 3D Ultrasound. (arXiv:2202.06599v2 [eess.IV] UPDATED)
    Segmentation and spatial alignment of ultrasound (US) imaging data acquired in the first trimester are crucial for monitoring human embryonic growth and development throughout this crucial period of life. Current approaches are either manual or semi-automatic and are therefore very time-consuming and prone to errors. To automate these tasks, we propose a multi-atlas framework for automatic segmentation and spatial alignment of the embryo using deep learning with minimal supervision. Our framework learns to register the embryo to an atlas, which consists of the US images acquired at a range of gestational ages (GA), segmented and spatially aligned to a predefined standard orientation. From this, we can derive the segmentation of the embryo and put the embryo in standard orientation. US images acquired at 8+0 to 12+6 weeks GA were used and eight subjects were selected as atlases. We evaluated different fusion strategies to incorporate multiple atlases: 1) training the framework using atlas images from a single subject, 2) training the framework with data of all available atlases and 3) ensembling of the frameworks trained per subject. To evaluate the performance, we calculated the Dice score over the test set. We found that training the framework using all available atlases outperformed ensembling and gave similar results compared to the best of all frameworks trained on a single subject. Furthermore, we found that selecting images from the four atlases closest in GA out of all available atlases, regardless of the individual quality, gave the best results with a median Dice score of 0.72. We conclude that our framework can accurately segment and spatially align the embryo in first trimester 3D US images and is robust to the variation in quality that existed in the available atlases. Our code is publicly available at: https://github.com/wapbastiaansen/multi-atlas-seg-reg.
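For readers unfamiliar with the Dice score used as the evaluation metric above, here is a minimal sketch of its standard definition on flattened binary masks. The function name `dice_score` and the convention for two empty masks are assumptions for illustration; the authors' evaluation code may differ.

```python
def dice_score(pred, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two equal-length binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    # Count voxels that are foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground voxels across both masks.
    size_sum = sum(1 for p in pred if p) + sum(1 for t in target if t)
    if size_sum == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / size_sum


if __name__ == "__main__":
    # One shared foreground voxel, two foreground voxels per mask.
    print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```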
    Robust Counterfactual Explanations on Graph Neural Networks. (arXiv:2107.04086v3 [cs.LG] UPDATED)
    Massive deployment of Graph Neural Networks (GNNs) in high-stake applications generates a strong demand for explanations that are robust to noise and align well with human intuition. Most existing methods generate explanations by identifying a subgraph of an input graph that has a strong correlation with the prediction. These explanations are not robust to noise because independently optimizing the correlation for a single input can easily overfit noise. Moreover, they do not align well with human intuition because removing an identified subgraph from an input graph does not necessarily change the prediction result. In this paper, we propose a novel method to generate robust counterfactual explanations on GNNs by explicitly modelling the common decision logic of GNNs on similar input graphs. Our explanations are naturally robust to noise because they are produced from the common decision boundaries of a GNN that govern the predictions of many similar input graphs. The explanations also align well with human intuition because removing the set of edges identified by an explanation from the input graph changes the prediction significantly. Exhaustive experiments on many public datasets demonstrate the superior performance of our method.
    Iterative Linear Quadratic Optimization for Nonlinear Control: Differentiable Programming Algorithmic Templates. (arXiv:2207.06362v1 [math.OC])
    We present the implementation of nonlinear control algorithms based on linear and quadratic approximations of the objective from a functional viewpoint. We present a gradient descent, a Gauss-Newton method, a Newton method, differential dynamic programming approaches with linear quadratic or quadratic approximations, various line-search strategies, and regularized variants of these algorithms. We derive the computational complexities of all algorithms in a differentiable programming framework and present sufficient optimality conditions. We compare the algorithms on several benchmarks, such as autonomous car racing using a bicycle model of a car. The algorithms are coded in a differentiable programming language in a publicly available package.
    Driving Style Recognition Using Interval Type-2 Fuzzy Inference System and Multiple Experts Decision Making. (arXiv:2110.13805v2 [cs.RO] UPDATED)
    Driving styles summarize different driving behaviors that are reflected in the movements of the vehicles. These behaviors may indicate a tendency to perform riskier maneuvers, consume more fuel or energy, break traffic rules, or drive carefully. Therefore, this paper presents a driving style recognition approach using an Interval Type-2 Fuzzy Inference System with Multiple Experts Decision-Making for classifying drivers as calm, moderate, or aggressive. This system receives as input features the longitudinal and lateral kinematic parameters of the vehicle motion. Type-2 fuzzy sets are more robust than type-1 fuzzy sets when handling noisy data, because their membership functions are also fuzzy sets. In addition, a multiple experts approach can reduce the bias and imprecision while building the fuzzy rulebase, which stores the knowledge of the fuzzy system. The proposed approach was evaluated using descriptive statistics analysis, and compared with clustering algorithms and a type-1 fuzzy inference system. The results show a tendency to associate lower kinematic profiles with the driving styles classified by the type-2 fuzzy inference system when compared to other algorithms, which is in line with the more conservative approach adopted in the aggregation of the experts' opinions.
    Evaluating the Adversarial Robustness of Adaptive Test-time Defenses. (arXiv:2202.13711v2 [cs.LG] UPDATED)
    Adaptive defenses, which optimize at test time, promise to improve adversarial robustness. We categorize such adaptive test-time defenses, explain their potential benefits and drawbacks, and evaluate a representative variety of the latest adaptive defenses for image classification. Unfortunately, none significantly improve upon static defenses when subjected to our careful case study evaluation. Some even weaken the underlying static model while simultaneously increasing inference computation. While these results are disappointing, we still believe that adaptive test-time defenses are a promising avenue of research and, as such, we provide recommendations for their thorough evaluation. We extend the checklist of Carlini et al. (2019) by providing concrete steps specific to adaptive defenses.
    Model-Based Offline Meta-Reinforcement Learning with Regularization. (arXiv:2202.02929v2 [cs.LG] UPDATED)
    Existing offline reinforcement learning (RL) methods face a few major challenges, particularly the distributional shift between the learned policy and the behavior policy. Offline Meta-RL is emerging as a promising approach to address these challenges, aiming to learn an informative meta-policy from a collection of tasks. Nevertheless, as shown in our empirical studies, offline Meta-RL could be outperformed by offline single-task RL methods on tasks with good quality of datasets, indicating that a right balance has to be delicately calibrated between "exploring" the out-of-distribution state-actions by following the meta-policy and "exploiting" the offline dataset by staying close to the behavior policy. Motivated by such empirical analysis, we explore model-based offline Meta-RL with regularized Policy Optimization (MerPO), which learns a meta-model for efficient task structure inference and an informative meta-policy for safe exploration of out-of-distribution state-actions. In particular, we devise a new meta-Regularized model-based Actor-Critic (RAC) method for within-task policy optimization, as a key building block of MerPO, using conservative policy evaluation and regularized policy improvement; and the intrinsic tradeoff therein is achieved via striking the right balance between two regularizers, one based on the behavior policy and the other on the meta-policy. We theoretically show that the learnt policy offers guaranteed improvement over both the behavior policy and the meta-policy, thus ensuring the performance improvement on new tasks via offline Meta-RL. Experiments corroborate the superior performance of MerPO over existing offline Meta-RL methods.
    How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models. (arXiv:2102.08921v2 [cs.LG] UPDATED)
    Devising domain- and model-agnostic evaluation metrics for generative models is an important and as yet unresolved problem. Most existing metrics, which were tailored solely to the image synthesis setup, exhibit a limited capacity for diagnosing the different modes of failure of generative models across broader application domains. In this paper, we introduce a 3-dimensional evaluation metric, ($\alpha$-Precision, $\beta$-Recall, Authenticity), that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion. Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity. We introduce generalization as an additional, independent dimension (to the fidelity-diversity trade-off) that quantifies the extent to which a model copies training data -- a crucial performance indicator when modeling sensitive data with requirements on privacy. The three metric components correspond to (interpretable) probabilistic quantities, and are estimated via sample-level binary classification. The sample-level nature of our metric inspires a novel use case which we call model auditing, wherein we judge the quality of individual samples generated by a (black-box) model, discarding low-quality samples and hence improving the overall model performance in a post-hoc manner.
    How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment Perspective. (arXiv:2106.08453v2 [cs.LG] UPDATED)
    Recent works have examined theoretical and empirical properties of wide neural networks trained in the Neural Tangent Kernel (NTK) regime. Given that biological neural networks are much wider than their artificial counterparts, we consider NTK regime wide neural networks as a possible model of biological neural networks. Leveraging NTK theory, we show theoretically that gradient descent drives layerwise weight updates that are aligned with their input activity correlations weighted by error, and demonstrate empirically that the result also holds in finite-width wide networks. The alignment result allows us to formulate a family of biologically-motivated, backpropagation-free learning rules that are theoretically equivalent to backpropagation in infinite-width networks. We test these learning rules on benchmark problems in feedforward and recurrent neural networks and demonstrate, in wide networks, comparable performance to backpropagation. The proposed rules are particularly effective in low data regimes, which are common in biological learning settings.
    Sound and Complete Neural Network Repair with Minimality and Locality Guarantees. (arXiv:2110.07682v2 [cs.LG] UPDATED)
    We present a novel methodology for repairing neural networks that use ReLU activation functions. Unlike existing methods that rely on modifying the weights of a neural network which can induce a global change in the function space, our approach applies only a localized change in the function space while still guaranteeing the removal of the buggy behavior. By leveraging the piecewise linear nature of ReLU networks, our approach can efficiently construct a patch network tailored to the linear region where the buggy input resides, which when combined with the original network, provably corrects the behavior on the buggy input. Our method is both sound and complete -- the repaired network is guaranteed to fix the buggy input, and a patch is guaranteed to be found for any buggy input. Moreover, our approach preserves the continuous piecewise linear nature of ReLU networks, automatically generalizes the repair to all the points including other undetected buggy inputs inside the repair region, is minimal in terms of changes in the function space, and guarantees that outputs on inputs away from the repair region are unaltered. On several benchmarks, we show that our approach significantly outperforms existing methods in terms of locality and limiting negative side effects.
    Majorization-minimization for Sparse Nonnegative Matrix Factorization with the $\beta$-divergence. (arXiv:2207.06316v1 [cs.LG])
    This article introduces new multiplicative updates for nonnegative matrix factorization with the $\beta$-divergence and sparse regularization of one of the two factors (say, the activation matrix). It is well known that the norm of the other factor (the dictionary matrix) needs to be controlled in order to avoid an ill-posed formulation. Standard practice consists in constraining the columns of the dictionary to have unit norm, which leads to a nontrivial optimization problem. Our approach leverages a reparametrization of the original problem into the optimization of an equivalent scale-invariant objective function. From there, we derive block-descent majorization-minimization algorithms that result in simple multiplicative updates for either $\ell_{1}$-regularization or the more "aggressive" log-regularization. In contrast with other state-of-the-art methods, our algorithms are universal in the sense that they can be applied to any $\beta$-divergence (i.e., any value of $\beta$) and that they come with convergence guarantees. We report numerical comparisons with existing heuristic and Lagrangian methods using various datasets: face images, an audio spectrogram, hyperspectral data, and song play counts. We show that our methods obtain solutions of similar quality at convergence (similar objective values) but with significantly reduced CPU times.
    Multi-scale Hybrid Vision Transformer for Learning Gastric Cancer Histology. (arXiv:2202.08510v3 [eess.IV] UPDATED)
    Gastric endoscopic screening is an effective way to decide appropriate gastric cancer (GC) treatment at an early stage, reducing the GC-associated mortality rate. Although artificial intelligence (AI) has shown great promise in assisting pathologists to screen digitized whole slide images, existing AI systems are limited in fine-grained cancer subclassifications and have little usability in planning cancer treatment. We propose a practical AI system that enables five subclassifications of GC pathology, which can be directly matched to general GC treatment guidance. The AI system is designed to efficiently differentiate multiple classes of GC through a multi-scale self-attention mechanism using 2-stage hybrid Vision Transformer (ViT) networks, mimicking the way human pathologists understand histology. The AI system demonstrates reliable diagnostic performance by achieving class-average sensitivity of above 0.85 on a total of 1,212 slides from a multicentric cohort. Furthermore, AI-assisted pathologists show a significant 12% improvement in diagnostic sensitivity together with an 18% reduction in screening time compared to human pathologists. Our results demonstrate that AI-assisted gastric endoscopic screening has great potential for providing presumptive pathologic opinion and appropriate cancer treatment of gastric cancer in practical clinical settings.
    FedNST: Federated Noisy Student Training for Automatic Speech Recognition. (arXiv:2206.02797v2 [eess.AS] UPDATED)
    Federated Learning (FL) enables training state-of-the-art Automatic Speech Recognition (ASR) models on user devices (clients) in distributed systems, hence preventing transmission of raw user data to a central server. A key challenge facing practical adoption of FL for ASR is obtaining ground-truth labels on the clients. Existing approaches rely on clients to manually transcribe their speech, which is impractical for obtaining large training corpora. A promising alternative is using semi-/self-supervised learning approaches to leverage unlabelled user data. To this end, we propose FedNST, a novel method for training distributed ASR models using private and unlabelled user data. We explore various facets of FedNST, such as training models with different proportions of labelled and unlabelled data, and evaluate the proposed approach on 1173 simulated clients. Evaluating FedNST on LibriSpeech, where 960 hours of speech data is split equally into server (labelled) and client (unlabelled) data, showed a 22.5% relative word error rate reduction (WERR) over a supervised baseline trained only on server data.
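The relative word error rate reduction (WERR) quoted above is, in its usual definition, the drop in WER divided by the baseline WER. The sketch below shows that arithmetic; the WER values in the usage example are made-up numbers chosen for easy arithmetic, not figures from the paper, and `relative_wer_reduction` is an illustrative name.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Relative WER reduction: (baseline - new) / baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical WERs in percent: baseline 8.0, improved system 6.0.
    print(relative_wer_reduction(8.0, 6.0))  # 0.25, i.e. a 25% relative reduction
```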
    Efficient Augmentation for Imbalanced Deep Learning. (arXiv:2207.06080v1 [cs.LG])
    Deep learning models memorize training data, which hurts their ability to generalize to under-represented classes. We empirically study a convolutional neural network's internal representation of imbalanced image data and measure the generalization gap between a model's feature embeddings in the training and test sets, showing that the gap is wider for minority classes. This insight enables us to design an efficient three-phase CNN training framework for imbalanced data. The framework involves training the network end-to-end on imbalanced data to learn accurate feature embeddings, performing data augmentation in the learned embedded space to balance the train distribution, and fine-tuning the classifier head on the embedded balanced training data. We propose Expansive Over-Sampling (EOS) as a data augmentation technique to utilize in the training framework. EOS forms synthetic training instances as convex combinations between the minority class samples and their nearest enemies in the embedded space to reduce the generalization gap. The proposed framework improves the accuracy over leading cost-sensitive and resampling methods commonly used in imbalanced learning. Moreover, it is more computationally efficient than standard data pre-processing methods, such as SMOTE and GAN-based oversampling, as it requires fewer parameters and less training time.
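The core step described above, forming synthetic instances as convex combinations of a minority sample and its nearest enemy in the embedded space, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the Euclidean distance, the uniform sampling of the mixing weight, the `max_step` cap, and the function names are all assumptions.

```python
import random


def nearest_enemy(point, enemies):
    """Return the other-class embedding closest to `point` (squared Euclidean distance)."""
    return min(enemies, key=lambda e: sum((a - b) ** 2 for a, b in zip(point, e)))


def expansive_oversample(minority, others, n_new, max_step=0.5, seed=0):
    """Create n_new synthetic minority embeddings between samples and their nearest enemies."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        enemy = nearest_enemy(base, others)
        # Mixing weight in [0, max_step]; max_step is a hypothetical knob.
        lam = rng.uniform(0.0, max_step)
        # Convex combination (1 - lam) * base + lam * enemy, coordinate by coordinate.
        synthetic.append([(1.0 - lam) * b + lam * e for b, e in zip(base, enemy)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [0.2, 0.1]]
    others = [[2.0, 0.0], [10.0, 10.0]]
    for row in expansive_oversample(minority, others, n_new=3):
        print(row)
```

In the three-phase framework the abstract describes, a step like this would run on embeddings from the trained network before the classifier head is fine-tuned on the balanced set.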
    Optimal Network Compression. (arXiv:2008.08733v5 [q-fin.RM] UPDATED)
    This paper introduces a formulation of the optimal network compression problem for financial systems. This general formulation is presented for different levels of network compression or rerouting allowed from the initial interbank network. We prove that this problem is, generically, NP-hard. We focus on objective functions generated by systemic risk measures under shocks to the financial network. We use this framework to study the (sub)optimality of the maximally compressed network. We conclude by studying the optimal compression problem for specific networks; this permits us to study, e.g., the so-called robust fragility of certain network topologies more generally as well as the potential benefits and costs of network compression. In particular, under systematic shocks and heterogeneous financial networks the robust fragility results of Acemoglu et al. (2015) no longer hold generally.
    Neural Network Robustness as a Verification Property: A Principled Case Study. (arXiv:2104.01396v2 [cs.LG] UPDATED)
    Neural networks are very successful at detecting patterns in noisy data, and have become the technology of choice in many fields. However, their usefulness is hampered by their susceptibility to adversarial attacks. Recently, many methods for measuring and improving a network's robustness to adversarial perturbations have been proposed, and this growing body of research has given rise to numerous explicit or implicit notions of robustness. Connections between these notions are often subtle, and a systematic comparison between them is missing in the literature. In this paper we begin addressing this gap, by setting up general principles for the empirical analysis and evaluation of a network's robustness as a mathematical property - during the network's training phase, its verification, and after its deployment. We then apply these principles and conduct a case study that showcases the practical benefits of our general approach.
    Masked Autoencoders that Listen. (arXiv:2207.06405v1 [cs.SD])
    This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio spectrogram patches with a high masking ratio, feeding only the non-masked tokens through encoder layers. The decoder then re-orders and decodes the encoded context padded with mask tokens, in order to reconstruct the input spectrogram. We find it beneficial to incorporate local window attention in the decoder, as audio spectrograms are highly correlated in local time and frequency bands. We then fine-tune the encoder with a lower masking ratio on target datasets. Empirically, Audio-MAE sets new state-of-the-art performance on six audio and speech classification tasks, outperforming other recent models that use external supervised pre-training. The code and models will be at https://github.com/facebookresearch/AudioMAE.
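The masking pipeline described above (encode only the non-masked patches, then restore the original order and pad with mask tokens before decoding) can be sketched on patch indices alone. This is an index-level illustration under assumptions, not the Audio-MAE code: the 0.8 masking ratio, the helper names, and the use of `None` as a mask token are all hypothetical.

```python
import random


def random_mask(num_patches, mask_ratio, seed=0):
    """Split patch indices into kept (visible) and masked sets."""
    rng = random.Random(seed)
    order = list(range(num_patches))
    rng.shuffle(order)
    num_masked = round(num_patches * mask_ratio)
    masked = sorted(order[:num_masked])
    keep = sorted(order[num_masked:])
    return keep, masked


def restore_order(encoded, keep, num_patches, mask_token=None):
    """Place encoded tokens back at their original positions, filling the rest with mask_token."""
    full = [mask_token] * num_patches
    for idx, token in zip(keep, encoded):
        full[idx] = token
    return full


if __name__ == "__main__":
    keep, masked = random_mask(num_patches=10, mask_ratio=0.8)
    # Stand-in for the encoder: tag each visible patch index.
    encoded = [f"enc{idx}" for idx in keep]
    print(restore_order(encoded, keep, num_patches=10))
```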
    Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification. (arXiv:2207.06104v1 [cs.CV])
    In this work, we present, for the first time, a method for detecting label errors in image datasets with semantic segmentation, i.e., pixel-wise class labels. Annotation acquisition for semantic segmentation datasets is time-consuming and requires plenty of human labor. In particular, review processes are time-consuming and label errors can easily be overlooked by humans. The consequences are biased benchmarks and in extreme cases also performance degradation of deep neural networks (DNNs) trained on such datasets. DNNs for semantic segmentation yield pixel-wise predictions, which makes detection of label errors via uncertainty quantification a complex task. Uncertainty is particularly pronounced at the transitions between connected components of the prediction. By lifting the consideration of uncertainty to the level of predicted components, we enable the usage of DNNs together with component-level uncertainty quantification for the detection of label errors. We present a principled approach to benchmarking the task of label error detection by dropping labels from the Cityscapes dataset as well as from a dataset extracted from the CARLA driving simulator, where in the latter case we have the labels under control. Our experiments show that our approach is able to detect the vast majority of label errors while controlling the number of false label error detections. Furthermore, we apply our method to semantic segmentation datasets frequently used by the computer vision community and present a collection of label errors along with sample statistics.
    Simplex NeuPL: Any-Mixture Bayes-Optimality in Symmetric Zero-sum Games. (arXiv:2205.15879v3 [cs.AI] UPDATED)
    Learning to play optimally against any mixture over a diverse set of strategies is of great practical interest in competitive games. In this paper, we propose simplex-NeuPL, which satisfies two desiderata simultaneously: i) learning a population of strategically diverse basis policies, represented by a single conditional network; ii) using the same network, learning best-responses to any mixture over the simplex of basis policies. We show that the resulting conditional policies incorporate prior information about their opponents effectively, enabling near optimal returns against arbitrary mixture policies in a game with tractable best-responses. We verify that such policies behave Bayes-optimally under uncertainty and offer insights into using this flexibility at test time. Finally, we offer evidence that learning best-responses to any mixture policies is an effective auxiliary task for strategic exploration, which, by itself, can lead to more performant populations.
    On the Opportunities and Risks of Foundation Models. (arXiv:2108.07258v3 [cs.LG] UPDATED)
    AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
    Surrogate Likelihoods for Variational Annealed Importance Sampling. (arXiv:2112.12194v2 [stat.ML] UPDATED)
    Variational inference is a powerful paradigm for approximate Bayesian inference with a number of appealing properties, including support for model learning and data subsampling. By contrast MCMC methods like Hamiltonian Monte Carlo do not share these properties but remain attractive since, contrary to parametric methods, MCMC is asymptotically unbiased. For these reasons researchers have sought to combine the strengths of both classes of algorithms, with recent approaches coming closer to realizing this vision in practice. However, supporting data subsampling in these hybrid methods can be a challenge, a shortcoming that we address by introducing a surrogate likelihood that can be learned jointly with other variational parameters. We argue theoretically that the resulting algorithm permits the user to make an intuitive trade-off between inference fidelity and computational cost. In an extensive empirical comparison we show that our method performs well in practice and that it is well-suited for black-box inference in probabilistic programming frameworks.
    ARMAS: Active Reconstruction of Missing Audio Segments. (arXiv:2111.10891v3 [eess.AS] UPDATED)
    Digital audio signal reconstruction of a lost or corrupt segment using deep learning algorithms has been explored intensively in recent years. Nevertheless, prior traditional methods with linear interpolation, phase coding and tone insertion techniques are still in vogue. However, we found no research work on reconstructing audio signals with the fusion of dithering, steganography, and machine learning regressors. Therefore, this paper proposes the combination of steganography, halftoning (dithering), and state-of-the-art shallow (RF, Random Forest regression) and deep learning (LSTM, Long Short-Term Memory) methods. The results (including comparing the SPAIN, Autoregressive, deep learning-based, graph-based, and other methods) are evaluated with three different metrics. The observations from the results show that the proposed solution is effective and can enhance the reconstruction of audio signals performed by the side information (e.g., Latent representation and learning for audio inpainting) steganography provides. Moreover, this paper proposes a novel framework for reconstruction from heavily compressed embedded audio data using halftoning (i.e., dithering) and machine learning, which we termed the HCR (halftone-based compression and reconstruction). This work may trigger interest in optimising this approach and/or transferring it to different domains (e.g., image reconstruction). Compared to existing methods, we show improvement in the inpainting performance in terms of signal-to-noise ratio (SNR), the objective difference grade (ODG) and Hansen's audio quality metric.
    The Role of Lookahead and Approximate Policy Evaluation in Reinforcement Learning with Linear Value Function Approximation. (arXiv:2109.13419v6 [cs.LG] UPDATED)
    Function approximation is widely used in reinforcement learning to handle the computational difficulties associated with very large state spaces. However, function approximation introduces errors which may lead to instabilities when using approximate dynamic programming techniques to obtain the optimal policy. Therefore, techniques such as lookahead for policy improvement and m-step rollout for policy evaluation are used in practice to improve the performance of approximate dynamic programming with function approximation. We quantitatively characterize, for the first time, the impact of lookahead and m-step rollout on the performance of approximate dynamic programming (DP) with function approximation: (i) without a sufficient combination of lookahead and m-step rollout, approximate DP may not converge, (ii) both lookahead and m-step rollout improve the convergence rate of approximate DP, and (iii) lookahead helps mitigate the effect of function approximation and the discount factor on the asymptotic performance of the algorithm. Our results are presented for two approximate DP methods: one which uses least-squares regression to perform function approximation and another which performs several steps of gradient descent of the least-squares objective in each iteration.
    Training Robust Deep Models for Time-Series Domain: Novel Algorithms and Theoretical Analysis. (arXiv:2207.04305v2 [cs.LG] UPDATED)
    Despite the success of deep neural networks (DNNs) for real-world applications over time-series data such as mobile health, little is known about how to train robust DNNs for the time-series domain due to its unique characteristics compared to images and text data. In this paper, we propose a novel algorithmic framework referred to as RObust Training for Time-Series (RO-TS) to create robust DNNs for time-series classification tasks. Specifically, we formulate a min-max optimization problem over the model parameters by explicitly reasoning about the robustness criteria in terms of additive perturbations to time-series inputs measured by the global alignment kernel (GAK) based distance. We also show the generality and advantages of our formulation using the summation structure over time-series alignments by relating both GAK and dynamic time warping (DTW). This problem is an instance of a family of compositional min-max optimization problems, which are challenging and open with unclear theoretical guarantees. We propose a principled stochastic compositional alternating gradient descent ascent (SCAGDA) algorithm for this family of optimization problems. Unlike traditional methods for time-series that require approximate computation of distance measures, SCAGDA approximates the GAK based distance on-the-fly using a moving average approach. We theoretically analyze the convergence rate of SCAGDA and provide strong theoretical support for the estimation of GAK based distance. Our experiments on real-world benchmarks demonstrate that RO-TS creates more robust DNNs when compared to adversarial training using prior methods that rely on data augmentation or new definitions of loss functions. We also demonstrate the importance of GAK for time-series data over the Euclidean distance. The source code of RO-TS algorithms is available at https://github.com/tahabelkhouja/Robust-Training-for-Time-Series
    Towards Meta-learned Algorithm Selection using Implicit Fidelity Information. (arXiv:2206.03130v2 [cs.LG] UPDATED)
    Automatically selecting the best performing algorithm for a given dataset or ranking multiple algorithms by their expected performance supports users in developing new machine learning applications. Most approaches for this problem rely on pre-computed dataset meta-features and landmarking performances to capture the salient topology of the datasets and those topologies that the algorithms attend to. Landmarking usually exploits cheap algorithms not necessarily in the pool of candidate algorithms to get inexpensive approximations of the topology. While somewhat indicative, hand-crafted dataset meta-features and landmarks are likely insufficient descriptors, strongly depending on the alignment of the topologies that the landmarks and the candidate algorithms search for. We propose IMFAS, a method to exploit multi-fidelity landmarking information directly from the candidate algorithms in the form of non-parametrically non-myopic meta-learned learning curves via LSTMs in a few-shot setting during testing. Using this mechanism, IMFAS jointly learns the topology of the datasets and the inductive biases of the candidate algorithms, without the need to expensively train them to convergence. Our approach produces informative landmarks, easily enriched by arbitrary meta-features at a low computational cost, capable of producing the desired ranking using cheaper fidelities. We additionally show that IMFAS is able to beat Successive Halving with at most 50% of the fidelity sequence during test time.
    Smooth Anonymity for Sparse Binary Matrices. (arXiv:2207.06358v1 [cs.CR])
    When working with user data, providing well-defined privacy guarantees is paramount. In this work we aim to manipulate and share an entire sparse dataset with a third party privately. Differential privacy has emerged as the gold standard of privacy; however, when it comes to sharing sparse datasets, we prove as one of our main results that any differentially private mechanism that maintains a reasonable similarity with the initial dataset is doomed to have a very weak privacy guarantee. Hence we need to opt for other privacy notions, such as $k$-anonymity, that are better at preserving utility in this context. In this work we present a variation of $k$-anonymity, which we call smooth $k$-anonymity, and design simple algorithms that efficiently provide smooth $k$-anonymity. We further perform an empirical evaluation to back our theoretical guarantees, and show that our algorithm improves the performance in downstream machine learning tasks on anonymized data.
    Tuning the Geometry of Graph Neural Networks. (arXiv:2207.05887v1 [cs.LG])
    By recursively summing node features over entire neighborhoods, spatial graph convolution operators have been heralded as key to the success of Graph Neural Networks (GNNs). Yet, despite the multiplication of GNN methods across tasks and applications, the impact of this aggregation operation on their performance has yet to be extensively analysed. In fact, while efforts have mostly focused on optimizing the architecture of the neural network, fewer works have attempted to characterize (a) the different classes of spatial convolution operators, (b) how the choice of a particular class relates to properties of the data, and (c) its impact on the geometry of the embedding space. In this paper, we propose to answer all three questions by dividing existing operators into two main classes (symmetrized vs. row-normalized spatial convolutions), and show how these translate into different implicit biases on the nature of the data. Finally, we show that this aggregation operator is in fact tunable, and make explicit the regimes in which certain choices of operators -- and therefore, embedding geometries -- might be more appropriate.
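    The two operator classes named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used forms D^{-1} A (row-normalized) and D^{-1/2} A D^{-1/2} (symmetrized) on an adjacency matrix with self-loops; the paper's exact definitions may differ, and the toy star graph, the feature values and all function names here are hypothetical.

```python
import math


def degrees(adj):
    """Node degrees (row sums) of a dense adjacency matrix."""
    return [sum(row) for row in adj]


def row_normalized(adj):
    """D^{-1} A: every row sums to 1, so aggregation averages neighbor features."""
    deg = degrees(adj)
    return [[a / deg[i] for a in row] for i, row in enumerate(adj)]


def symmetrized(adj):
    """D^{-1/2} A D^{-1/2}: a symmetric operator that also rescales by neighbor degree."""
    deg = degrees(adj)
    n = len(adj)
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def aggregate(op, features):
    """One propagation step: multiply the operator by the node feature matrix."""
    dims = len(features[0])
    return [
        [sum(op[i][k] * features[k][d] for k in range(len(features))) for d in range(dims)]
        for i in range(len(op))
    ]


# Hypothetical star graph with self-loops: node 0 is connected to nodes 1, 2 and 3.
adj = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
features = [[1.0], [2.0], [3.0], [4.0]]

if __name__ == "__main__":
    print("row-normalized:", aggregate(row_normalized(adj), features))
    print("symmetrized:   ", aggregate(symmetrized(adj), features))
```

    With the row-normalized operator the hub node receives the plain average of its neighborhood, whereas the symmetrized operator weights each contribution by the degrees of both endpoints, so the two choices yield different embeddings from the same graph and features.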
    SURIMI: Supervised Radio Map Augmentation with Deep Learning and a Generative Adversarial Network for Fingerprint-based Indoor Positioning. (arXiv:2207.06120v1 [eess.SP])
    Indoor Positioning based on Machine Learning has drawn increasing attention in both academia and industry, as meaningful information can be extracted from the reference data. Many researchers are using supervised, semi-supervised, and unsupervised Machine Learning models to reduce the positioning error and offer reliable solutions to the end-users. In this article, we propose a new architecture by combining Convolutional Neural Network (CNN), Long short-term memory (LSTM) and Generative Adversarial Network (GAN) in order to increase the training data and thus improve the positioning accuracy. The proposed combination of supervised and unsupervised models was tested on 17 public datasets, providing an extensive analysis of its performance. As a result, the positioning error has been reduced in more than 70% of them.
    Parameterized Convex Universal Approximators for Decision-Making Problems. (arXiv:2201.06298v2 [cs.LG] UPDATED)
    Parameterized max-affine (PMA) and parameterized log-sum-exp (PLSE) networks are proposed for general decision-making problems. The proposed approximators generalize existing convex approximators, namely, max-affine (MA) and log-sum-exp (LSE) networks, by considering function arguments of condition and decision variables and replacing the network parameters of MA and LSE networks with continuous functions with respect to the condition variable. The universal approximation theorem of PMA and PLSE is proven, which implies that PMA and PLSE are shape-preserving universal approximators for parameterized convex continuous functions. Practical guidelines for incorporating deep neural networks within PMA and PLSE networks are provided. A numerical simulation is performed to demonstrate the performance of the proposed approximators. The simulation results support that PLSE outperforms other existing approximators in terms of minimizer and optimal value errors with scalable and efficient computation for high-dimensional cases.
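    A minimal sketch of the idea described above, assuming the usual definitions of max-affine (the maximum of several affine functions) and log-sum-exp (its smooth upper bound with a temperature) networks. The map `toy_parameters` from the condition variable to slopes and offsets is a hypothetical stand-in for the continuous functions (for example, neural networks) mentioned in the abstract, and all names are illustrative.

```python
import math


def affine_values(x, slopes, offsets):
    """Evaluate every affine piece a_i . x + b_i."""
    return [sum(a_k * x_k for a_k, x_k in zip(a, x)) + b for a, b in zip(slopes, offsets)]


def max_affine(x, slopes, offsets):
    """MA network: pointwise maximum of affine pieces (convex in x)."""
    return max(affine_values(x, slopes, offsets))


def log_sum_exp(x, slopes, offsets, temperature=1.0):
    """LSE network: T * log(sum_i exp(value_i / T)), computed in a numerically stable way."""
    values = affine_values(x, slopes, offsets)
    m = max(values)
    return m + temperature * math.log(sum(math.exp((v - m) / temperature) for v in values))


def toy_parameters(c):
    """Hypothetical continuous map from a scalar condition c to (slopes, offsets)."""
    slopes = [[c, 1.0], [-1.0, c], [0.5 * c, -0.5]]
    offsets = [0.0, c * c, 1.0]
    return slopes, offsets


def pma(c, u):
    """Parameterized max-affine: convex in the decision u for each fixed condition c."""
    slopes, offsets = toy_parameters(c)
    return max_affine(u, slopes, offsets)


def plse(c, u, temperature=0.1):
    """Parameterized log-sum-exp: a smooth convex upper bound on pma(c, u)."""
    slopes, offsets = toy_parameters(c)
    return log_sum_exp(u, slopes, offsets, temperature)


if __name__ == "__main__":
    print(pma(0.7, [0.3, -1.2]), plse(0.7, [0.3, -1.2]))
```

    For I affine pieces and temperature T, the log-sum-exp value lies between the max-affine value and that value plus T log I, so lowering the temperature tightens the smooth approximation.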
    MRF-UNets: Searching UNet with Markov Random Fields. (arXiv:2207.06168v1 [cs.LG])
    UNet [27] is widely used in semantic segmentation due to its simplicity and effectiveness. However, its manually-designed architecture is applied to a large number of problem settings, either with no architecture optimizations, or with manual tuning, which is time consuming and can be sub-optimal. In this work, firstly, we propose Markov Random Field Neural Architecture Search (MRF-NAS) that extends and improves the recent Adaptive and Optimal Network Width Search (AOWS) method [4] with (i) a more general MRF framework (ii) diverse M-best loopy inference (iii) differentiable parameter learning. This provides the necessary NAS framework to efficiently explore network architectures that induce loopy inference graphs, including loops that arise from skip connections. With UNet as the backbone, we find an architecture, MRF-UNet, that shows several interesting characteristics. Secondly, through the lens of these characteristics, we identify the sub-optimality of the original UNet architecture and further improve our results with MRF-UNetV2. Experiments show that our MRF-UNets significantly outperform several benchmarks on three aerial image datasets and two medical image datasets while maintaining low computational costs. The code is available at: https://github.com/zifuwanggg/MRF-UNets.
    ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech. (arXiv:2207.06389v1 [eess.AS])
    Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performances in many generative tasks. However, the cost of the inherent iterative sampling process hinders their application to text-to-speech deployment. Through the preliminary study on diffusion model parameterization, we find that previous gradient-based TTS models require hundreds or thousands of iterations to guarantee high sample quality, which poses a challenge for accelerating sampling. In this work, we propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech. Unlike previous work estimating the gradient for data density, ProDiff parameterizes the denoising model by directly predicting clean data to avoid distinct quality degradation in accelerating sampling. To tackle the model convergence challenge with decreased diffusion iterations, ProDiff reduces the data variance in the target site via knowledge distillation. Specifically, the denoising model uses the generated mel-spectrogram from an N-step DDIM teacher as the training target and distills the behavior into a new model with N/2 steps. As such, it allows the TTS model to make sharp predictions and further reduces the sampling time by orders of magnitude. Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms, while it maintains sample quality and diversity competitive with state-of-the-art models using hundreds of steps. ProDiff enables sampling 24x faster than real-time on a single NVIDIA 2080Ti GPU, making diffusion models practically applicable to text-to-speech synthesis deployment for the first time. Our extensive ablation studies demonstrate that each design in ProDiff is effective, and we further show that ProDiff can be easily extended to the multi-speaker setting. Audio samples are available at https://ProDiff.github.io/
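    The difference between predicting noise and predicting clean data can be sketched with the standard DDPM forward process x_t = sqrt(a) x_0 + sqrt(1 - a) eps, where a is the cumulative signal level at step t. This is the textbook relation and an assumption here, not ProDiff's confirmed formulation; the function names and the numeric values are hypothetical.

```python
import math


def _check(alpha_bar):
    """The cumulative signal level must lie strictly between 0 and 1."""
    if not 0.0 < alpha_bar < 1.0:
        raise ValueError("alpha_bar must be in (0, 1)")


def forward_noise(x0, eps, alpha_bar):
    """Forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    _check(alpha_bar)
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [s * x + n * e for x, e in zip(x0, eps)]


def x0_from_noise_prediction(x_t, eps_hat, alpha_bar):
    """Clean-data estimate implied by a noise-predicting model."""
    _check(alpha_bar)
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - n * e) / s for x, e in zip(x_t, eps_hat)]


def noise_from_x0_prediction(x_t, x0_hat, alpha_bar):
    """Noise estimate implied by a model that predicts clean data directly."""
    _check(alpha_bar)
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(x - s * x0) / n for x, x0 in zip(x_t, x0_hat)]


if __name__ == "__main__":
    x0 = [0.5, -1.0, 2.0]    # hypothetical clean values
    eps = [0.1, -0.3, 0.7]   # hypothetical noise draw
    x_t = forward_noise(x0, eps, alpha_bar=0.36)
    print(x0_from_noise_prediction(x_t, eps, alpha_bar=0.36))
```

    Under this relation an error in the predicted noise is scaled by sqrt((1 - a) / a) when converted to a clean-data estimate, which grows as a approaches 0 (the noisiest steps); a model that outputs clean data directly does not pass through that rescaling.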
    Multi-Study Boosting: Theoretical Considerations for Merging vs. Ensembling. (arXiv:2207.04588v2 [stat.ML] UPDATED)
    Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. When training cross-study replicable prediction models, it is critical to decide between merging and treating the studies separately. We study boosting algorithms in the presence of potential heterogeneity in predictor-outcome relationships across studies and compare two multi-study learning strategies: 1) merging all the studies and training a single model, and 2) multi-study ensembling, which involves training a separate model on each study and ensembling the resulting predictions. In the regression setting, we provide theoretical guidelines based on an analytical transition point to determine whether it is more beneficial to merge or to ensemble for boosting with linear learners. In addition, we characterize a bias-variance decomposition of estimation error for boosting with component-wise linear learners. We verify the theoretical transition point result in simulation and illustrate how it can guide the decision on merging vs. ensembling in an application to breast cancer gene expression data.
    Hindsight Learning for MDPs with Exogenous Inputs. (arXiv:2207.06272v1 [cs.LG])
    We develop a reinforcement learning (RL) framework for applications that deal with sequential decisions and exogenous uncertainty, such as resource allocation and inventory management. In these applications, the uncertainty is only due to exogenous variables like future demands. A popular approach is to predict the exogenous variables using historical data and then plan with the predictions. However, this indirect approach requires high-fidelity modeling of the exogenous process to guarantee good downstream decision-making, which can be impractical when the exogenous process is complex. In this work we propose an alternative approach based on hindsight learning which sidesteps modeling the exogenous process. Our key insight is that, unlike Sim2Real RL, we can revisit past decisions in the historical data and derive counterfactual consequences for other actions in these applications. Our framework uses hindsight-optimal actions as the policy training signal and has strong theoretical guarantees on decision-making performance. We develop an algorithm using our framework to allocate compute resources for real-world Microsoft Azure workloads. The results show our approach learns better policies than domain-specific heuristics and Sim2Real RL baselines.
    Hierarchy exploitation to detect missing annotations on hierarchical multi-label classification. (arXiv:2207.06237v1 [cs.LG])
    The availability of genomic data has grown exponentially in the last decade, mainly due to the development of new sequencing technologies. Based on the interactions between genes (and gene products) extracted from the increasing genomic data, numerous studies have focused on the identification of associations between genes and functions. While these studies have shown great promise, the problem of annotating genes with functions remains an open challenge. In this work, we present a method to detect missing annotations in hierarchical multi-label classification datasets. We propose a method that exploits the class hierarchy by computing aggregated probabilities to the paths of classes from the leaves to the root for each instance. The proposed method is presented in the context of predicting missing gene function annotations, where these aggregated probabilities are further used to select a set of annotations to be verified through in vivo experiments. The experiments on Oryza sativa Japonica, a variety of rice, showcase that incorporating the hierarchy of classes into the method often improves the predictive performance and that our proposed method yields superior results when compared to competitor methods from the literature.
    High Per Parameter: A Large-Scale Study of Hyperparameter Tuning for Machine Learning Algorithms. (arXiv:2207.06028v1 [cs.LG])
    Hyperparameters in machine learning (ML) have received a fair amount of attention, and hyperparameter tuning has come to be regarded as an important step in the ML pipeline. But just how useful is said tuning? While smaller-scale experiments have been previously conducted, herein we carry out a large-scale investigation, specifically, one involving 26 ML algorithms, 250 datasets (regression and both binary and multinomial classification), 6 score metrics, and 28,857,600 algorithm runs. Analyzing the results, we conclude that for many ML algorithms we should not expect considerable gains from hyperparameter tuning on average; however, there may be some datasets for which default hyperparameters perform poorly, the latter being truer for some algorithms than others. By defining a single hp_score value, which combines an algorithm's accumulated statistics, we are able to rank the 26 ML algorithms from those expected to gain the most from hyperparameter tuning to those expected to gain the least. We believe such a study may serve ML practitioners at large.
    Electromagnetic Source Imaging via a Data-Synthesis-Based Convolutional Encoder-Decoder Network. (arXiv:2010.12876v6 [eess.IV] UPDATED)
    Electromagnetic source imaging (ESI) requires solving a highly ill-posed inverse problem. To seek a unique solution, traditional ESI methods impose various forms of priors that may not accurately reflect the actual source properties, which may hinder their broad applications. To overcome this limitation, in this paper a novel data-synthesized spatio-temporally convolutional encoder-decoder network method termed DST-CedNet is proposed for ESI. DST-CedNet recasts ESI as a machine learning problem, where discriminative learning and latent-space representations are integrated in a convolutional encoder-decoder network (CedNet) to learn a robust mapping from the measured electroencephalography/magnetoencephalography (E/MEG) signals to the brain activity. In particular, by incorporating prior knowledge regarding dynamical brain activities, a novel data synthesis strategy is devised to generate large-scale samples for effectively training CedNet. This stands in contrast to traditional ESI methods where the prior information is often enforced via constraints primarily aimed for mathematical convenience. Extensive numerical experiments as well as analysis of a real MEG and Epilepsy EEG dataset demonstrate that DST-CedNet outperforms several state-of-the-art ESI methods in robustly estimating source signals under a variety of source configurations.
    Graph Property Prediction on Open Graph Benchmark: A Winning Solution by Graph Neural Architecture Search. (arXiv:2207.06027v1 [cs.LG])
    Targeting two molecular graph datasets and one protein association subgraph dataset in the OGB graph classification task, we design a graph neural network framework for graph classification by introducing PAS (Pooling Architecture Search). We also improve it based on the GNN topology design method F2GNN to further design the feature selection and fusion strategies, improving the performance of the model in the graph property prediction task while overcoming the over-smoothing problem of deep GNN training. Finally, a performance breakthrough is achieved on these three datasets, with results significantly better than those of other methods with a fixed aggregation function. This shows that the NAS method generalizes well across multiple tasks and demonstrates the advantage of our method in graph property prediction tasks.
    Deep Transformer Model with Pre-Layer Normalization for COVID-19 Growth Prediction. (arXiv:2207.06356v1 [cs.LG])
    Coronavirus disease or COVID-19 is an infectious disease caused by the SARS-CoV-2 virus. The first confirmed case caused by this virus was found at the end of December 2019 in Wuhan City, China. The disease then spread throughout the world, including Indonesia, and COVID-19 was designated a global pandemic by WHO. The growth of COVID-19 cases, especially in Indonesia, can be predicted using several approaches, such as the Deep Neural Network (DNN). One of the DNN models that can be used is the Deep Transformer, which can predict time series. The model is trained with several test scenarios to get the best model. The first evaluation searches for the best hyperparameters. Then, further evaluation was carried out using the best hyperparameter setting for the number of prediction days, the optimizer, the number of features, and comparison with the former models of the Long Short-Term Memory (LSTM) and Recurrent Neural Network (RNN). All evaluations used the Mean Absolute Percentage Error (MAPE) metric. Based on the results of the evaluations, the Deep Transformer produces the best results when using Pre-Layer Normalization and predicting one day ahead, with a MAPE value of 18.83. Furthermore, the model trained with the Adamax optimizer obtains the best performance among the tested optimizers. The performance of the Deep Transformer also exceeds that of the other tested models, LSTM and RNN.
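    For readers unfamiliar with the metric used above, a minimal sketch of MAPE under its usual definition (the mean of absolute errors relative to the actual values, expressed in percent) follows. The function name and the case counts in the usage lines are hypothetical and are not taken from the paper.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent; undefined when an actual value is zero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


if __name__ == "__main__":
    # Hypothetical daily case counts and one-day-ahead forecasts.
    observed = [120.0, 150.0, 180.0]
    forecast = [110.0, 160.0, 170.0]
    print(f"MAPE = {mape(observed, forecast):.2f}%")
```

    Because each error is divided by the actual value, the metric is scale-free but penalizes errors on small actual values heavily, which matters for case counts that start near zero.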
    Cost-Effective Online Contextual Model Selection. (arXiv:2207.06030v1 [cs.LG])
    How can we collect the most useful labels to learn a model selection policy, when presented with arbitrary heterogeneous data streams? In this paper, we formulate this task as an online contextual active model selection problem, where at each round the learner receives an unlabeled data point along with a context. The goal is to output the best model for any given context without obtaining an excessive amount of labels. In particular, we focus on the task of selecting pre-trained classifiers, and propose a contextual active model selection algorithm (CAMS), which relies on a novel uncertainty sampling query criterion defined on a given policy class for adaptive model selection. In comparison to prior art, our algorithm does not assume a globally optimal model. We provide rigorous theoretical analysis for the regret and query complexity under both adversarial and stochastic settings. Our experiments on several benchmark classification datasets demonstrate the algorithm's effectiveness in terms of both regret and query complexity. Notably, to achieve the same accuracy, CAMS incurs less than 10% of the label cost when compared to the best online model selection baselines on CIFAR10.
    TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels. (arXiv:2207.06343v1 [cs.LG])
    State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions. For neural networks, even when centralized SGD easily finds a solution that is simultaneously performant for all clients, current federated optimization methods fail to converge to a comparable solution. We show that this performance disparity can largely be attributed to optimization challenges presented by nonconvexity. Specifically, we find that the early layers of the network do learn useful features, but the final layers fail to make use of them. That is, federated optimization applied to this non-convex problem distorts the learning of the final layers. Leveraging this observation, we propose a Train-Convexify-Train (TCT) procedure to sidestep this issue: first, learn features using off-the-shelf methods (e.g., FedAvg); then, optimize a convexified problem obtained from the network's empirical neural tangent kernel approximation. Our technique yields accuracy improvements of up to +36% on FMNIST and +37% on CIFAR10 when clients have dissimilar data.
    Continual Learning with Deep Learning Methods in an Application-Oriented Context. (arXiv:2207.06233v1 [cs.LG])
    Abstract knowledge is deeply grounded in many computer-based applications. An important research area of Artificial Intelligence (AI) deals with the automatic derivation of knowledge from data. Machine learning offers the corresponding algorithms. One area of research focuses on the development of biologically inspired learning algorithms. The respective machine learning methods are based on neurological concepts so that they can systematically derive knowledge from data and store it. One type of machine learning algorithm that can be categorized as a "deep learning" model is referred to as Deep Neural Networks (DNNs). DNNs consist of multiple artificial neurons arranged in layers that are trained by using the backpropagation algorithm. These deep learning methods exhibit amazing capabilities for inferring and storing complex knowledge from high-dimensional data. However, DNNs are affected by a problem that prevents new knowledge from being added to an existing base. The ability to continuously accumulate knowledge is an important factor that contributed to evolution and is therefore a prerequisite for the development of strong AIs. The so-called "catastrophic forgetting" (CF) effect causes DNNs to immediately lose already derived knowledge after a few training iterations on a new data distribution. Only an energetically expensive retraining with the joint data distribution of past and new data enables the abstraction of the entire new set of knowledge. In order to counteract the effect, various techniques have been and are still being developed with the goal of mitigating or even solving the CF problem. These published CF avoidance studies usually imply the effectiveness of their approaches for various continual learning tasks. This dissertation is set in the context of continual machine learning with deep learning methods. The first part deals with the development of an ...
    Task Agnostic Representation Consolidation: a Self-supervised based Continual Learning Approach. (arXiv:2207.06267v1 [cs.LG])
    Continual learning (CL) over non-stationary data streams remains one of the long-standing challenges in deep neural networks (DNNs) as they are prone to catastrophic forgetting. CL models can benefit from self-supervised pre-training as it enables learning more generalizable task-agnostic features. However, the effect of self-supervised pre-training diminishes as the length of task sequences increases. Furthermore, the domain shift between pre-training data distribution and the task distribution reduces the generalizability of the learned representations. To address these limitations, we propose Task Agnostic Representation Consolidation (TARC), a two-stage training paradigm for CL that intertwines task-agnostic and task-specific learning whereby self-supervised training is followed by supervised learning for each task. To further restrict the deviation from the learned representations in the self-supervised stage, we employ a task-agnostic auxiliary loss during the supervised stage. We show that our training paradigm can be easily added to memory- or regularization-based approaches and provides consistent performance gain across more challenging CL settings. We further show that it leads to more robust and well-calibrated models.
    Continual Meta-Reinforcement Learning for UAV-Aided Vehicular Wireless Networks. (arXiv:2207.06131v1 [cs.LG])
    Unmanned aerial base stations (UABSs) can be deployed in vehicular wireless networks to support applications such as extended sensing via vehicle-to-everything (V2X) services. A key problem in such systems is designing algorithms that can efficiently optimize the trajectory of the UABS in order to maximize coverage. In existing solutions, such optimization is carried out from scratch for any new traffic configuration, often by means of conventional reinforcement learning (RL). In this paper, we propose the use of continual meta-RL as a means to transfer information from previously experienced traffic configurations to new conditions, with the goal of reducing the time needed to optimize the UABS's policy. Adopting the Continual Meta Policy Search (CoMPS) strategy, we demonstrate significant efficiency gains as compared to conventional RL, as well as to naive transfer learning methods.
    Learning Approximately Optimal Contracts. (arXiv:1811.06736v2 [cs.GT] UPDATED)
    In principal-agent models, a principal offers a contract to an agent to perform a certain task. The agent exerts a level of effort that maximizes her utility. The principal is oblivious to the agent's chosen level of effort, and conditions her wage only on possible outcomes. In this work, we consider a model in which the principal is unaware of the agent's utility and action space: she sequentially offers contracts to identical agents, and observes the resulting outcomes. We present an algorithm for learning the optimal contract under mild assumptions. We bound the number of samples needed for the principal to obtain a contract that is within $\epsilon$ of her optimal net profit for every $\epsilon>0$. Our results are robust even when considering risk-averse agents. Furthermore, we show that when there are only two possible outcomes or the agent is risk-neutral, the algorithm's outcome approximates the optimal contract described in the classical theory.
    Beyond Hard Labels: Investigating data label distributions. (arXiv:2207.06224v1 [cs.CV])
    High-quality data is a key aspect of modern machine learning. However, labels generated by humans suffer from issues like label noise and class ambiguities. We raise the question of whether hard labels are sufficient to represent the underlying ground truth distribution in the presence of this inherent imprecision. Therefore, we compare the disparity of learning with hard and soft labels quantitatively and qualitatively for a synthetic and a real-world dataset. We show that the application of soft labels leads to improved performance and yields a more regular structure of the internal feature space.
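    The distinction between hard and soft labels can be made concrete with a small sketch. It assumes that soft labels are obtained as the normalized vote counts of several annotators and that training uses cross-entropy against the target distribution; both are common choices rather than details confirmed by the abstract, and the votes and predictions below are hypothetical.

```python
import math


def soft_label(votes, num_classes):
    """Empirical class distribution over annotator votes."""
    counts = [0] * num_classes
    for v in votes:
        counts[v] += 1
    return [c / len(votes) for c in counts]


def hard_label(votes, num_classes):
    """One-hot majority vote; discards how much the annotators disagreed."""
    soft = soft_label(votes, num_classes)
    winner = max(range(num_classes), key=lambda k: soft[k])
    return [1.0 if k == winner else 0.0 for k in range(num_classes)]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


if __name__ == "__main__":
    votes = [0, 0, 1, 0, 1]             # hypothetical annotations for one image
    confident = [0.90, 0.05, 0.05]      # hypothetical model outputs
    calibrated = [0.60, 0.35, 0.05]
    for name, target in (("hard", hard_label(votes, 3)), ("soft", soft_label(votes, 3))):
        print(name, cross_entropy(target, confident), cross_entropy(target, calibrated))
```

    Against the hard label the overconfident prediction scores better, while against the soft label the prediction that reflects the annotators' disagreement scores better, which is the sense in which the two targets encode different ground truths.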
    URANUS: Radio Frequency Tracking, Classification and Identification of Unmanned Aircraft Vehicles. (arXiv:2207.06025v1 [cs.LG])
    Safety and security issues for Critical Infrastructures (CI) are growing as attackers increasingly adopt drones as an attack vector flying in sensitive airspace, such as airports, military bases, city centres, and crowded places. The rapid proliferation of drones for merchandise shipping, recreational activities, and other commercial applications poses severe concerns for CI operators due to violations and invasions of restricted airspace. A cost-effective framework is needed to detect, classify and identify the presence of drones in such cases. In this paper, we demonstrate that CI operators can detect, classify and identify drones (multi-copter and fixed-wing) invading no-drone zones in a timely and efficient manner, with an inexpensive RF-based detection framework named URANUS. Our experiments show that, using a Random Forest classifier, we achieve a classification accuracy of 93.4% in the classification of one or multiple specific drones. The tracking performance achieves an average MAE=0.3650, MSE=0.9254 and R2 = 0.7502. Our framework has been released as open-source, to enable the community to verify our findings and use URANUS as a ready-to-use basis for further analysis.
    A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP. (arXiv:2207.06147v1 [cs.LG])
    As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, an essential understanding of the offline CMDP problems is still lacking, in terms of both the algorithm design and the information-theoretic sample complexity lower bound. In this paper, we focus on solving the CMDP problems where only offline data are available. By adopting the concept of the single-policy concentrability coefficient $C^*$, we establish an $\Omega\left(\frac{\min\left\{|\mathcal{S}||\mathcal{A}|,|\mathcal{S}|+I\right\} C^*}{(1-\gamma)^3\epsilon^2}\right)$ sample complexity lower bound for the offline CMDP problem, where $I$ stands for the number of constraints. By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation and its sample complexity matches the above lower bound except for an $\tilde{\mathcal{O}}((1-\gamma)^{-1})$ factor. A comprehensive discussion of how to deal with the unknown constant $C^*$ and the potential asynchronous structure of the offline dataset is also included.
    Machine Learning Application in Health. (arXiv:2207.06228v1 [cs.LG])
    Coronavirus can be transmitted through the air by close proximity to infected persons. Commercial aircraft are a likely way to both transmit the virus among passengers and move the virus between locations. Learning where and how coronavirus entered the United States will help further our understanding of the disease. Air travelers can come from countries or areas with a high rate of infection and may very well be at risk of being exposed to the virus. Therefore, as they reach the United States, the virus could easily spread. In our analysis, we utilized machine learning to determine whether the number of flights into the Washington DC Metro Area had an effect on the number of cases and deaths reported in the city and surrounding area.
    Stochastic Functional Analysis and Multilevel Vector Field Anomaly Detection. (arXiv:2207.06229v1 [stat.ML])
    Massive vector field datasets are common in multi-spectral optical and radar sensors and modern multimodal MRI data, among many other areas of application. In this paper we develop a novel stochastic functional analysis approach for detecting anomalies based on the covariance structure of nominal stochastic behavior across a domain with multi-band vector field data. An optimal vector field Karhunen-Loeve (KL) expansion is applied to such random field data. A series of multilevel orthogonal functional subspaces is constructed from the geometry of the domain, adapted from the KL expansion. Detection is achieved by examining the projection of the random field on the multilevel basis. The anomalies can be quantified in suitable normed spaces based on local and global information. In addition, reliable hypothesis tests are formed with controllable distributions that do not require prior assumptions on probability distributions of the data. Only the covariance function is needed, which makes for significantly simpler estimates. Furthermore this approach allows stochastic vector-based fusion of anomalies without any loss of information. The method is applied to the important problem of deforestation and degradation in the Amazon forest. This is a complex non-monotonic process, as forests can degrade and recover. This particular problem is further compounded by the presence of clouds that are hard to remove with current masking algorithms. Using multi-spectral satellite data from Sentinel 2, the multilevel filter is constructed and anomalies are treated as deviations from the initial state of the forest. Forest anomalies are quantified with robust hypothesis tests and distinguished from false variations such as cloud cover. Our approach shows the advantage of using multiple bands of data in a vectorized complex, leading to better anomaly detection beyond the capabilities of scalar-based methods.
    Distilled Non-Semantic Speech Embeddings with Binary Neural Networks for Low-Resource Devices. (arXiv:2207.05784v1 [cs.SD])
    This work introduces BRILLsson, a novel binary neural network-based representation learning model for a broad range of non-semantic speech tasks. We train the model with knowledge distillation from a large and real-valued TRILLsson model with only a fraction of the dataset used to train TRILLsson. The resulting BRILLsson models are only 2MB in size with a latency of less than 8ms, making them suitable for deployment in low-resource devices such as wearables. We evaluate BRILLsson on eight benchmark tasks (including but not limited to spoken language identification, emotion recognition, health condition diagnosis, and keyword spotting), and demonstrate that our proposed ultra-light and low-latency models perform as well as large-scale models.
    Look-ups are not (yet) all you need for deep learning inference. (arXiv:2207.05808v1 [cs.LG])
    Fast approximations to matrix multiplication have the potential to dramatically reduce the cost of neural network inference. Recent work on approximate matrix multiplication proposed to replace costly multiplications with table-lookups by fitting a fast hash function from training data. In this work, we propose improvements to this previous work, targeted to the deep learning inference setting, where one has access to both training data and fixed (already learned) model weight matrices. We further propose a fine-tuning procedure for accelerating entire neural networks while minimizing loss in accuracy. Finally, we analyze the proposed method on a simple image classification task. While we show improvements to prior work, overall classification accuracy remains substantially diminished compared to exact matrix multiplication. Our work, despite this negative result, points the way towards future efforts to accelerate inner products with fast nonlinear hashing methods.
    A Transfer Learning Based Model for Text Readability Assessment in German. (arXiv:2207.06265v1 [cs.CL])
    Text readability assessment has a wide range of applications for different target groups, from language learners to people with disabilities. The fast pace of textual content production on the web makes it impossible to measure text complexity without the benefit of machine learning and natural language processing techniques. Although various studies have addressed the readability assessment of English text in recent years, there is still room for improvement of the models for other languages. In this paper, we propose a new model for text complexity assessment for German text based on transfer learning. Our results show that the model outperforms more classical solutions based on linguistic feature extraction from input text. The best model, based on the BERT pre-trained language model, achieved a Root Mean Square Error (RMSE) of 0.483.
    Contextual Bandits with Large Action Spaces: Made Practical. (arXiv:2207.05836v1 [cs.LG])
    A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice. We present the first efficient, general-purpose algorithm for contextual bandits with continuous, linearly structured action spaces. Our algorithm makes use of computational oracles for (i) supervised learning, and (ii) optimization over the action space, and achieves sample complexity, runtime, and memory independent of the size of the action space. In addition, it is simple and practical. We perform a large-scale empirical evaluation, and show that our approach typically enjoys superior performance and efficiency compared to standard baselines.
    Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual Information. (arXiv:2207.05756v1 [cs.LG])
    A counter-intuitive property of convolutional neural networks (CNNs) is their inherent susceptibility to adversarial examples, which severely hinders the application of CNNs in security-critical fields. Adversarial examples are similar to original examples but contain malicious perturbations. Adversarial training is a simple and effective training method to improve the robustness of CNNs to adversarial examples. The mechanisms behind adversarial examples and adversarial training are worth exploring. Therefore, this work investigates similarities and differences between two types of CNNs (both normal and robust ones) in information extraction by observing the trends in mutual information. We show that 1) the amount of mutual information that CNNs extract from original and adversarial examples is nearly the same, whether CNNs are in normal training or adversarial training; the reason why adversarial examples mislead CNNs may be that they contain more texture-based information about other categories; 2) compared with normal training, adversarial training is more difficult and the amount of information extracted by the robust CNNs is less; 3) the CNNs trained with different methods have different preferences for certain types of information; normally trained CNNs tend to extract texture-based information from the inputs, while adversarially trained models prefer shape-based information. Furthermore, we also analyze the mutual information estimators used in this work, kernel-density-estimation and binning methods, and find that these estimators outline the geometric properties of the middle layer's output to a certain extent.
    Differentially Private Linear Bandits with Partial Distributed Feedback. (arXiv:2207.05827v1 [cs.LG])
    In this paper, we study the problem of global reward maximization with only partial distributed feedback. This problem is motivated by several real-world applications (e.g., cellular network configuration, dynamic pricing, and policy selection) where an action taken by a central entity influences a large population that contributes to the global reward. However, collecting such reward feedback from the entire population not only incurs a prohibitively high cost but often leads to privacy concerns. To tackle this problem, we consider differentially private distributed linear bandits, where only a subset of users from the population are selected (called clients) to participate in the learning process and the central server learns the global model from such partial feedback by iteratively aggregating these clients' local feedback in a differentially private fashion. We then propose a unified algorithmic learning framework, called differentially private distributed phased elimination (DP-DPE), which can be naturally integrated with popular differential privacy (DP) models (including central DP, local DP, and shuffle DP). Furthermore, we prove that DP-DPE achieves both sublinear regret and sublinear communication cost. Interestingly, DP-DPE also achieves privacy protection "for free" in the sense that the additional cost due to privacy guarantees is a lower-order additive term. In addition, as a by-product of our techniques, the same results of "free" privacy can also be achieved for the standard differentially private linear bandits. Finally, we conduct simulations to corroborate our theoretical results and demonstrate the effectiveness of DP-DPE.
    Game of Trojans: A Submodular Byzantine Approach. (arXiv:2207.05937v1 [cs.LG])
    Machine learning models in the wild have been shown to be vulnerable to Trojan attacks during training. Although many detection mechanisms have been proposed, strong adaptive attackers have been shown to be effective against them. In this paper, we aim to answer the questions considering an intelligent and adaptive adversary: (i) What is the minimal amount of instances required to be Trojaned by a strong attacker? and (ii) Is it possible for such an attacker to bypass strong detection mechanisms? We provide an analytical characterization of adversarial capability and strategic interactions between the adversary and detection mechanism that take place in such models. We characterize adversary capability in terms of the fraction of the input dataset that can be embedded with a Trojan trigger. We show that the loss function has a submodular structure, which leads to the design of computationally efficient algorithms to determine this fraction with provable bounds on optimality. We propose a Submodular Trojan algorithm to determine the minimal fraction of samples to inject a Trojan trigger. To evade detection of the Trojaned model, we model strategic interactions between the adversary and Trojan detection mechanism as a two-player game. We show that the adversary wins the game with probability one, thus bypassing detection. We establish this by proving that output probability distributions of a Trojan model and a clean model are identical when following the Min-Max (MM) Trojan algorithm. We perform extensive evaluations of our algorithms on MNIST, CIFAR-10, and EuroSAT datasets. The results show that (i) with Submodular Trojan algorithm, the adversary needs to embed a Trojan trigger into a very small fraction of samples to achieve high accuracy on both Trojan and clean samples, and (ii) the MM Trojan algorithm yields a trained Trojan model that evades detection with probability 1.
    On NeuroSymbolic Solutions for PDEs. (arXiv:2207.06240v1 [cs.LG])
    Physics Informed Neural Networks (PINNs) have gained immense popularity as an alternate method for numerically solving PDEs. Despite their empirical success we are still building an understanding of the convergence properties of training on such constraints with gradient descent. It is known that, in the absence of an explicit inductive bias, Neural Networks can struggle to learn or approximate even simple and well known functions in a sample efficient manner. Thus the numerical approximation induced from few collocation points may not generalize over the entire domain. Meanwhile, a symbolic form can exhibit good generalization, with interpretability as a useful byproduct. However, symbolic approximations can struggle to simultaneously be concise and accurate. Therefore in this work we explore a NeuroSymbolic approach to approximate the solution for PDEs. We observe that our approach works for several simple cases. We illustrate the efficacy of our approach on Navier Stokes: Kovasznay flow where there are multiple physical quantities of interest governed by a non-linear coupled PDE system. Domain splitting is now becoming a popular trick to help PINNs approximate complex functions. We observe that a NeuroSymbolic approach can help with such complex functions as well. We demonstrate a Domain-splitting assisted NeuroSymbolic approach on a temporally varying two-dimensional Burgers' equation. Finally we consider the scenario where PINNs have to be solved for parameterized PDEs, for changing Initial-Boundary Conditions and changes in the coefficient of the PDEs. Hypernetworks have been shown to hold promise to overcome these challenges. We show that one can design Hyper-NeuroSymbolic Networks which can combine the benefits of speed and increased accuracy. We observe that the NeuroSymbolic approximations are consistently 1-2 orders of magnitude better than just the neural or symbolic approximations.
    Is Appearance Free Action Recognition Possible?. (arXiv:2207.06261v1 [cs.CV])
    Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absence makes it difficult to understand how well contemporary architectures capitalize on dynamic vs. static information. We respond with a novel Appearance Free Dataset (AFD) for action recognition. AFD is devoid of static information relevant to action recognition in a single frame. Modeling of the dynamics is necessary for solving the task, as the action is only apparent through consideration of the temporal dimension. We evaluated 11 contemporary action recognition architectures on AFD as well as its related RGB video. Our results show a notable decrease in performance for all architectures on AFD compared to RGB. We also conducted a complementary study with humans that shows their recognition accuracy on AFD and RGB is very similar and much better than the evaluated architectures on AFD. Our results motivate a novel architecture that revives explicit recovery of optical flow, within a contemporary design for best performance on AFD and RGB.
    dpart: Differentially Private Autoregressive Tabular, a General Framework for Synthetic Data Generation. (arXiv:2207.05810v1 [cs.LG])
    We propose a general, flexible, and scalable framework dpart, an open source Python library for differentially private synthetic data generation. Central to the approach is autoregressive modelling -- breaking the joint data distribution to a sequence of lower-dimensional conditional distributions, captured by various methods such as machine learning models (logistic/linear regression, decision trees, etc.), simple histogram counts, or custom techniques. The library has been created with a view to serve as a quick and accessible baseline as well as to accommodate a wide audience of users, from those making their first steps in synthetic data generation, to more experienced ones with domain expertise who can configure different aspects of the modelling and contribute new methods/mechanisms. Specific instances of dpart include Independent, an optimized version of PrivBayes, and a newly proposed model, dp-synthpop. Code: https://github.com/hazy/dpart
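    As a reading aid, the autoregressive factorization mentioned above is the chain rule of probability. The column ordering $x_1, \dots, x_d$ is notation assumed here for illustration, not taken from the dpart library:

```latex
p(x_1, \dots, x_d) \;=\; \prod_{i=1}^{d} p\left(x_i \mid x_1, \dots, x_{i-1}\right)
```

    Each factor is a lower-dimensional conditional distribution that can be fitted by a separate model, for example a regression or a histogram.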
    Exploiting Social Graph Networks for Emotion Prediction. (arXiv:2207.05820v1 [cs.SI])
    Emotion prediction plays an essential role in mental health and emotion-aware computing. The complex nature of emotion resulting from its dependency on a person's physiological health, mental state, and surroundings makes its prediction a challenging task. In this work, we utilize mobile sensing data to predict happiness and stress. In addition to a person's physiological features, we also incorporate the environment's impact through weather and social network. To this end, we leverage phone data to construct social networks and develop a machine learning architecture that aggregates information from multiple users of the graph network and integrates it with the temporal dynamics of data to predict emotion for all the users. The construction of social networks does not incur additional cost in terms of EMAs or data collection from users and doesn't raise privacy concerns. We propose an architecture that automates the integration of a user's social network into affect prediction and is capable of dealing with the dynamic distribution of real-life social networks, making it scalable to large-scale networks. Our extensive evaluation highlights the improvement provided by the integration of social networks. We further investigate the impact of graph topology on the model's performance.
    Enhanced Security and Privacy via Fragmented Federated Learning. (arXiv:2207.05978v1 [cs.CR])
    In federated learning (FL), a set of participants share updates computed on their local data with an aggregator server that combines updates into a global model. However, reconciling accuracy with privacy and security is a challenge to FL. On the one hand, good updates sent by honest participants may reveal their private local information, whereas poisoned updates sent by malicious participants may compromise the model's availability and/or integrity. On the other hand, enhancing privacy via update distortion damages accuracy, whereas doing so via update aggregation damages security because it does not allow the server to filter out individual poisoned updates. To tackle the accuracy-privacy-security conflict, we propose fragmented federated learning (FFL), in which participants randomly exchange and mix fragments of their updates before sending them to the server. To achieve privacy, we design a lightweight protocol that allows participants to privately exchange and mix encrypted fragments of their updates so that the server can neither obtain individual updates nor link them to their originators. To achieve security, we design a reputation-based defense tailored for FFL that builds trust in participants and their mixed updates based on the quality of the fragments they exchange and the mixed updates they send. Since the exchanged fragments' parameters keep their original coordinates and attackers can be neutralized, the server can correctly reconstruct a global model from the received mixed updates without accuracy loss. Experiments on four real data sets show that FFL can prevent semi-honest servers from mounting privacy attacks, can effectively counter poisoning attacks and can keep the accuracy of the global model.
    Prediction of the motion of chest internal points using a recurrent neural network trained with real-time recurrent learning for latency compensation in lung cancer radiotherapy. (arXiv:2207.05951v1 [eess.IV])
    During the radiotherapy treatment of patients with lung cancer, the radiation delivered to healthy tissue around the tumor needs to be minimized, which is difficult because of respiratory motion and the latency of linear accelerator systems. In the proposed study, we first use the Lucas-Kanade pyramidal optical flow algorithm to perform deformable image registration of chest computed tomography scan images of four patients with lung cancer. We then track three internal points close to the lung tumor based on the previously computed deformation field and predict their position with a recurrent neural network (RNN) trained using real-time recurrent learning (RTRL) and gradient clipping. The breathing data is quite regular, sampled at approximately 2.5Hz, and includes artificial drift in the spine direction. The amplitude of the motion of the tracked points ranged from 12.0mm to 22.7mm. Finally, we propose a simple method for recovering and predicting 3D tumor images from the tracked points and the initial tumor image based on a linear correspondence model and Nadaraya-Watson non-linear regression. The root-mean-square error, maximum error, and jitter corresponding to the RNN prediction on the test set were smaller than the same performance measures obtained with linear prediction and least mean squares (LMS). In particular, the maximum prediction error associated with the RNN, equal to 1.51mm, is respectively 16.1% and 5.0% lower than the maximum error associated with linear prediction and LMS. The average prediction time per time step with RTRL is equal to 119ms, which is less than the 400ms marker position sampling time. The tumor position in the predicted images appears visually correct, which is confirmed by the high mean cross-correlation between the original and predicted images, equal to 0.955.
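    For readers unfamiliar with the Nadaraya-Watson estimator named above, here is a minimal one-dimensional sketch. It is not the paper's implementation: the Gaussian kernel, the `bandwidth` parameter, and the function name `nadaraya_watson` are assumptions chosen for illustration.

```python
import math


def nadaraya_watson(x_query, xs, ys, bandwidth=1.0):
    """Kernel-weighted average of ys, evaluated at x_query.

    Assumes a Gaussian kernel; the weights depend only on the distance
    between x_query and each training input in xs.
    """
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)  # can underflow to 0.0 if x_query is far from all xs
    return sum(w * y for w, y in zip(weights, ys)) / total
```

    A query point equidistant from two training inputs receives equal weights, so the prediction is the mean of the two targets.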
    Brick Tic-Tac-Toe: Exploring the Generalizability of AlphaZero to Novel Test Environments. (arXiv:2207.05991v1 [cs.LG])
    Traditional reinforcement learning (RL) environments typically are the same for both the training and testing phases. Hence, current RL methods are largely not generalizable to a test environment which is conceptually similar but different from what the method has been trained on, which we term the novel test environment. As an effort to push RL research towards algorithms which can generalize to novel test environments, we introduce the Brick Tic-Tac-Toe (BTTT) test bed, where the brick position in the test environment is different from that in the training environment. Using a round-robin tournament on the BTTT environment, we show that traditional RL state-search approaches such as Monte Carlo Tree Search (MCTS) and Minimax are more generalizable to novel test environments than AlphaZero is. This is surprising because AlphaZero has been shown to achieve superhuman performance in environments such as Go, Chess and Shogi, which may lead one to think that it performs well in novel test environments. Our results show that BTTT, though simple, is rich enough to explore the generalizability of AlphaZero. We find that merely increasing MCTS lookahead iterations was insufficient for AlphaZero to generalize to some novel test environments. Rather, increasing the variety of training environments helps to progressively improve generalizability across all possible starting brick configurations.
    Competition over data: how does data purchase affect users?. (arXiv:2201.10774v2 [cs.LG] UPDATED)
    As machine learning (ML) is deployed by many competing service providers, the underlying ML predictors also compete against each other, and it is increasingly important to understand the impacts and biases from such competition. In this paper, we study what happens when the competing predictors can acquire additional labeled data to improve their prediction quality. We introduce a new environment that allows ML predictors to use active learning algorithms to purchase labeled data within their budgets while competing against each other to attract users. Our environment models a critical aspect of data acquisition in competing systems which has not been well-studied before. We found that the overall performance of an ML predictor improves when predictors can purchase additional labeled data. Surprisingly, however, the quality that users experience -- i.e. the accuracy of the predictor selected by each user -- can decrease even as the individual predictors get better. We show that this phenomenon naturally arises due to a trade-off whereby competition pushes each predictor to specialize in a subset of the population while data purchase has the effect of making predictors more uniform. We support our findings with both experiments and theories.
    Conditional Energy-Based Models for Implicit Policies: The Gap between Theory and Practice. (arXiv:2207.05824v1 [cs.RO])
    We present our findings on the gap between theory and practice of using conditional energy-based models (EBM) as an implicit representation for behavior-cloned policies. We also clarify several subtle, and potentially confusing, details in previous work in an attempt to help future research in this area. We point out key differences between unconditional and conditional EBMs, and warn that blindly applying training methods for one to the other could lead to undesirable results that do not generalize well. Finally, we emphasize the importance of the Maximum Mutual Information principle as a necessary condition to achieve good generalization in conditional EBMs as implicit models for regression tasks.
    Towards A Holistic View of Bias in Machine Learning: Bridging Algorithmic Fairness and Imbalanced Learning. (arXiv:2207.06084v1 [cs.LG])
    Machine learning (ML) is playing an increasingly important role in rendering decisions that affect a broad range of groups in society. ML models inform decisions in criminal justice, the extension of credit in banking, and the hiring practices of corporations. This posits the requirement of model fairness, which holds that automated decisions should be equitable with respect to protected features (e.g., gender, race, or age) that are often under-represented in the data. We postulate that this problem of under-representation has a corollary to the problem of imbalanced data learning. This class imbalance is often reflected in both classes and protected features. For example, one class (those receiving credit) may be over-represented with respect to another class (those not receiving credit) and a particular group (females) may be under-represented with respect to another group (males). A key element in achieving algorithmic fairness with respect to protected groups is the simultaneous reduction of class and protected group imbalance in the underlying training data, which facilitates increases in both model accuracy and fairness. We discuss the importance of bridging imbalanced learning and group fairness by showing how key concepts in these fields overlap and complement each other; and propose a novel oversampling algorithm, Fair Oversampling, that addresses both skewed class distributions and protected features. Our method: (i) can be used as an efficient pre-processing algorithm for standard ML algorithms to jointly address imbalance and group equity; and (ii) can be combined with fairness-aware learning algorithms to improve their robustness to varying levels of class imbalance. Additionally, we take a step toward bridging the gap between fairness and imbalanced learning with a new metric, Fair Utility, that combines balanced accuracy with fairness.
    Employing Feature Selection Algorithms to Determine the Immune State of Mice with Rheumatoid Arthritis. (arXiv:2207.05882v1 [stat.ML])
    The immune response is a dynamic process by which the body determines whether an antigen is self or nonself. The state of this dynamic process is defined by the relative balance and population of inflammatory and regulatory actors which comprise this decision making process. The goal of immunotherapy as applied to, e.g. Rheumatoid Arthritis (RA), then, is to bias the immune state in favor of the regulatory actors - thereby shutting down autoimmune pathways in the response. While there are several known approaches to immunotherapy, the effectiveness of the therapy will depend on how this intervention alters the evolution of this state. Unfortunately, this process is determined not only by the dynamics of the process, but the state of the system at the time of intervention - a state which is difficult if not impossible to determine prior to application of the therapy.
    Online Active Regression. (arXiv:2207.05945v1 [cs.LG])
    Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and take advantage of low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether it should collect the corresponding labels. The goal is to efficiently maintain the regression of received data points with a small budget of label queries. We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate solution, our proposed algorithms only require $\tilde{\mathcal{O}}(\epsilon^{-2} d \log(n\kappa))$ queries of labels, where $n$ is the number of data points and $\kappa$ is a quantity, called the condition number, of the data points. The numerical results verify our theoretical results and show that our methods have comparable performance with offline active regression algorithms.
    Federated Learning for THz Channel Estimation. (arXiv:2207.06017v1 [eess.SP])
    This paper addresses two major challenges in terahertz (THz) channel estimation: the beam-split phenomenon, i.e., beam misalignment because of frequency-independent analog beamformers, and computational complexity because of the use of an ultra-massive number of antennas to compensate for propagation losses. Data-driven techniques are known to mitigate the complexity of this problem but usually require the transmission of the datasets from the users to a central server, entailing huge communications overhead. In this work, we employ federated learning (FL), wherein the users transmit only the model parameters instead of the whole dataset, for THz channel estimation to improve the communications-efficiency. In order to accurately estimate the channel despite beam-split, we propose a beamspace support alignment technique without requiring additional hardware. Compared to the previous works, our method provides higher channel estimation accuracy as well as approximately 68 times lower communications overhead.
    Unsupervised Learning for Combinatorial Optimization with Principled Objective Design. (arXiv:2207.05984v1 [cs.LG])
    Using machine learning to solve combinatorial optimization (CO) problems is challenging, especially when the data is unlabeled. This work proposes an unsupervised learning framework for CO problems. Our framework follows a standard relaxation-plus-rounding approach and adopts neural networks to parameterize the relaxed solutions so that simple back-propagation can train the model end-to-end. Our key contribution is the observation that if the relaxed objective satisfies entry-wise concavity, a low optimization loss guarantees the quality of the final integral solutions. This observation significantly broadens the applicability of the previous framework inspired by Erdős' probabilistic method. In particular, this observation can guide the design of objective models in applications where the objectives are not given explicitly and must be modeled beforehand. We evaluate our framework by solving a synthetic graph optimization problem, and two real-world applications including resource allocation in circuit design and approximate computing. Our framework largely outperforms the baselines based on naïve relaxation, reinforcement learning, and Gumbel-softmax tricks.
    Automatic Differentiation: Theory and Practice. (arXiv:2207.06114v1 [cs.LG])
    We present the classical coordinate-free formalism for forward and backward mode AD in the real and complex setting. We show how to formally derive the forward and backward formulae for a number of matrix functions starting from basic principles.
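    To make forward mode concrete, here is a minimal scalar sketch using dual numbers. It is an illustration only and does not reproduce the paper's coordinate-free or matrix-valued treatment; the names `Dual`, `sin`, and `derivative` are hypothetical helpers introduced here.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Dual:
    """A value paired with its tangent (derivative along the input direction)."""
    val: float
    tan: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: tangents add.
        return Dual(self.val + other.val, self.tan + other.tan)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.val * other.val,
                    self.tan * other.val + self.val * other.tan)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants with zero tangent."""
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    """Chain rule for sine: d/dt sin(u) = cos(u) * u'."""
    return Dual(math.sin(x.val), math.cos(x.val) * x.tan)


def derivative(f, x):
    """Seed the input tangent with 1.0 and read off the output tangent."""
    return f(Dual(x, 1.0)).tan
```

    For example, `derivative(lambda x: x * x + 3 * x, 2.0)` propagates the tangent through one product and one sum and evaluates 2x + 3 at x = 2.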
    Compactly Restrictable Metric Policy Optimization Problems. (arXiv:2207.05850v1 [math.OC])
    We study policy optimization problems for deterministic Markov decision processes (MDPs) with metric state and action spaces, which we refer to as Metric Policy Optimization Problems (MPOPs). Our goal is to establish theoretical results on the well-posedness of MPOPs that can characterize practically relevant continuous control systems. To do so, we define a special class of MPOPs called Compactly Restrictable MPOPs (CR-MPOPs), which are flexible enough to capture the complex behavior of robotic systems but specific enough to admit solutions using dynamic programming methods such as value iteration. We show how to arrive at CR-MPOPs using forward-invariance. We further show that our theoretical results on CR-MPOPs can be used to characterize feedback linearizable control affine systems.
    Towards understanding how momentum improves generalization in deep learning. (arXiv:2207.05931v1 [cs.LG])
    Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures. While it is well-understood that using momentum can lead to a faster convergence rate in various settings, it has also been observed that momentum yields higher generalization. Prior work argues that momentum stabilizes the SGD noise during training and this leads to higher generalization. In this paper, we adopt another perspective and first empirically show that gradient descent with momentum (GD+M) significantly improves generalization compared to gradient descent (GD) in some deep learning problems. From this observation, we formally study how momentum improves generalization. We devise a binary classification setting where a one-hidden-layer (over-parameterized) convolutional neural network trained with GD+M provably generalizes better than the same network trained with GD, when both algorithms are similarly initialized. The key insight in our analysis is that momentum is beneficial in datasets where the examples share some feature but differ in their margin. Contrary to GD that memorizes the small margin data, GD+M still learns the feature in these data thanks to its historical gradients. Lastly, we empirically validate our theoretical findings.
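    The "historical gradients" mentioned above can be seen in a minimal sketch of a heavy-ball update. This is one common formulation, assumed here for illustration; the paper's exact update rule, step size, and momentum coefficient may differ, and `gd_momentum` is a hypothetical name.

```python
def gd_momentum(grad, w0, lr=0.1, gamma=0.9, steps=100):
    """Full-batch gradient descent with heavy-ball momentum.

    grad:  function mapping a list of weights to a list of gradients
    gamma: momentum coefficient; gamma = 0.0 recovers plain GD
    """
    w = list(w0)
    v = [0.0] * len(w)  # running, exponentially weighted sum of past gradients
    for _ in range(steps):
        g = grad(w)
        v = [gamma * vi + gi for vi, gi in zip(v, g)]
        w = [wi - lr * vi for wi, vi in zip(w, v)]
    return w
```

    The buffer `v` keeps contributing past gradients after the current gradient has become small, which is the mechanism the abstract refers to.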
    AdamNODEs: When Neural ODE Meets Adaptive Moment Estimation. (arXiv:2207.06066v1 [cs.LG])
    Recent work by Xia et al. leveraged the continuous-limit of the classical momentum accelerated gradient descent and proposed heavy-ball neural ODEs. While this model offers computational efficiency and high utility over vanilla neural ODEs, this approach often causes the overshooting of internal dynamics, leading to unstable training of a model. Prior work addresses this issue by using ad-hoc approaches, e.g., bounding the internal dynamics using specific activation functions, but the resulting models do not satisfy the exact heavy-ball ODE. In this work, we propose adaptive momentum estimation neural ODEs (AdamNODEs) that adaptively control the acceleration of the classical momentum-based approach. We find that its adjoint states also satisfy AdamODE and do not require ad-hoc solutions that the prior work employs. In evaluation, we show that AdamNODEs achieve the lowest training loss and efficacy over existing neural ODEs. We also show that AdamNODEs have better training stability than classical momentum-based neural ODEs. This result sheds some light on adapting the techniques proposed in the optimization community to improving the training and inference of neural ODEs further. Our code is available at https://github.com/pmcsh04/AdamNODE.
    Reachable Distance Function for KNN Classification. (arXiv:2103.09704v2 [cs.LG] CROSS LISTED)
    A distance function is a main metric for measuring the affinity between two data points in machine learning. Existing distance functions often provide unreachable distance values in real applications, which can lead to an incorrect measure of the affinity between data points. This paper proposes a reachable distance function for KNN classification. The reachable distance function is not a geometric direct-line distance between two data points; it takes the class attribute of the training dataset into account when measuring the affinity between data points. Concretely, the reachable distance between data points includes their class center distance and real distance. Its shape looks like a "Z", so we also call it a Z distance function. In this way, the affinity between data points in the same class is always stronger than that between data points in different classes; that is, intraclass data points are always closer than interclass data points. We evaluated the reachable distance experimentally and demonstrated that the proposed distance function achieves better performance in KNN classification.
    Physics-Informed Neural Operators. (arXiv:2207.05748v1 [cs.LG])
    Standard neural networks can approximate general nonlinear operators, represented either explicitly by a combination of mathematical operators, e.g., in an advection-diffusion-reaction partial differential equation, or simply as a black box, e.g., a system-of-systems. The first neural operator was the Deep Operator Network (DeepONet), proposed in 2019 based on rigorous approximation theory. Since then, a few other less general operators have been published, e.g., based on graph neural networks or Fourier transforms. For black box systems, training of neural operators is data-driven only but if the governing equations are known they can be incorporated into the loss function during training to develop physics-informed neural operators. Neural operators can be used as surrogates in design problems, uncertainty quantification, autonomous systems, and almost in any application requiring real-time inference. Moreover, independently pre-trained DeepONets can be used as components of a complex multi-physics system by coupling them together with relatively light training. Here, we present a review of DeepONet, the Fourier neural operator, and the graph neural operator, as well as appropriate extensions with feature expansions, and highlight their usefulness in diverse applications in computational mechanics, including porous media, fluid mechanics, and solid mechanics.
    Learning Bellman Complete Representations for Offline Policy Evaluation. (arXiv:2207.05837v1 [cs.LG])
    We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations satisfying these conditions are given, with results being mostly theoretical in nature. In this work, we propose BCRL, which directly learns from data an approximately linear Bellman complete representation with good coverage. With this learned representation, we perform OPE using Least Square Policy Evaluation (LSPE) with linear functions in our learned representation. We present an end-to-end theoretical analysis, showing that our two-stage algorithm enjoys polynomial sample complexity provided some representation in the rich class considered is linear Bellman complete. Empirically, we extensively evaluate our algorithm on challenging, image-based continuous control tasks from the DeepMind Control Suite. We show our representation enables better OPE compared to previous representation learning methods developed for off-policy RL (e.g., CURL, SPR). BCRL achieves OPE error competitive with the state-of-the-art method Fitted Q-Evaluation (FQE), and beats FQE when evaluating beyond the initial state distribution. Our ablations show that both the linear Bellman complete and coverage components of our method are crucial.
    Sequential Recommendation Model for Next Purchase Prediction. (arXiv:2207.06225v1 [cs.IR])
    Timeliness and contextual accuracy of recommendations are increasingly important when delivering contemporary digital marketing experiences. Conventional recommender systems (RS) suggest relevant but time-invariant items to users by accounting for their past purchases. These recommendations only map to customers' general preferences rather than a customer's specific needs immediately preceding a purchase. In contrast, RSs that consider the order of transactions, purchases, or experiences to measure evolving preferences can offer more salient and effective recommendations to customers: Sequential RSs not only benefit from a better behavioral understanding of a user's current needs but also better predictive power. In this paper, we demonstrate and rank the effectiveness of a sequential recommendation system by utilizing a production dataset of over 2.7 million credit card transactions for 46K cardholders. The method first employs an autoencoder on raw transaction data and submits observed transaction encodings to a GRU-based sequential model. The sequential model produces a MAP@1 metric of 47% on the out-of-sample test set, in line with existing research. We also discuss implications for embedding real-time predictions using the sequential RS into Nexus, a scalable, low-latency, event-based digital experience architecture.
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v2 [cs.LG] UPDATED)
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that both theoretically and empirically, our approach does not suffer from the objective function mismatch that is present in FL methods which assume homogeneous updates in heterogeneous FL setups, e.g., FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.
    Real-Time Intermediate Flow Estimation for Video Frame Interpolation. (arXiv:2011.06294v12 [cs.CV] UPDATED)
    Real-time video frame interpolation (VFI) is very useful in video processing, media players, and display devices. We propose RIFE, a Real-time Intermediate Flow Estimation algorithm for VFI. To realize a high-quality flow-based VFI method, RIFE uses a neural network named IFNet that can estimate the intermediate flows end-to-end at much higher speed. A privileged distillation scheme is designed to stabilize IFNet training and improve the overall performance. RIFE does not rely on pre-trained optical flow models and can support arbitrary-timestep frame interpolation with the temporal encoding input. Experiments demonstrate that RIFE achieves state-of-the-art performance on several public benchmarks. Compared with the popular SuperSlomo and DAIN methods, RIFE is 4--27 times faster and produces better results. Furthermore, RIFE can be extended to wider applications thanks to temporal encoding. The code is available at https://github.com/megvii-research/ECCV2022-RIFE.
    Contextual Decision Trees. (arXiv:2207.06355v1 [stat.ML])
    Focusing on Random Forests, we propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble. The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output. In this way, we obtain local interpretations by observing the rules of the recommended tree. Our experiments show that the dynamic method is superior to an independently fitted CART decision tree and comparable to the whole black-box Random Forest in terms of predictive performance.
    Object Detection as Probabilistic Set Prediction. (arXiv:2203.07980v3 [cs.CV] UPDATED)
    Accurate uncertainty estimates are essential for deploying deep object detectors in safety-critical systems. The development and evaluation of probabilistic object detectors have been hindered by shortcomings in existing performance measures, which tend to involve arbitrary thresholds or limit the detector's choice of distributions. In this work, we propose to view object detection as a set prediction task where detectors predict the distribution over the set of objects. Using the negative log-likelihood for random finite sets, we present a proper scoring rule for evaluating and training probabilistic object detectors. The proposed method can be applied to existing probabilistic detectors, is free from thresholds, and enables fair comparison between architectures. Three different types of detectors are evaluated on the COCO dataset. Our results indicate that the training of existing detectors is optimized toward non-probabilistic metrics. We hope to encourage the development of new object detectors that can accurately estimate their own uncertainty. Code available at https://github.com/georghess/pmb-nll.
    A Word is Worth A Thousand Dollars: Adversarial Attack on Tweets Fools Stock Predictions. (arXiv:2205.01094v3 [cs.CR] UPDATED)
    More and more investors and machine learning models rely on social media (e.g., Twitter and Reddit) to gather real-time information and sentiment to predict stock price movements. Although text-based models are known to be vulnerable to adversarial attacks, whether stock prediction models have similar vulnerability is underexplored. In this paper, we experiment with a variety of adversarial attack configurations to fool three stock prediction victim models. We address the task of adversarial generation by solving combinatorial optimization problems with semantics and budget constraints. Our results show that the proposed attack method can achieve consistent success rates and cause significant monetary loss in trading simulation by simply concatenating a perturbed but semantically similar tweet.
    (Nearly) Optimal Private Linear Regression via Adaptive Clipping. (arXiv:2207.04686v2 [cs.LG] UPDATED)
    We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution. We propose and analyze a one-pass mini-batch stochastic gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement. Noise is added for DP, but the noise standard deviation is estimated online. Compared to existing $(\epsilon, \delta)$-DP techniques, which have sub-optimal error bounds, DP-AMBSSGD is able to provide nearly optimal error bounds in terms of key parameters like dimensionality $d$, number of points $N$, and the standard deviation $\sigma$ of the noise in observations. For example, when the $d$-dimensional covariates are sampled i.i.d. from the normal distribution, the excess error of DP-AMBSSGD due to privacy is $\frac{\sigma^2 d}{N}(1+\frac{d}{\epsilon^2 N})$, i.e., the error is meaningful when the number of samples is $N= \Omega(d \log d)$, which is the standard operative regime for linear regression. In contrast, the error bound for existing efficient methods in this setting is $\mathcal{O}\big(\frac{d^3}{\epsilon^2 N^2}\big)$, even for $\sigma=0$. That is, for constant $\epsilon$, the existing techniques require $N=\Omega(d\sqrt{d})$ to provide a non-trivial result.
    RcTorch: a PyTorch Reservoir Computing Package with Automated Hyper-Parameter Optimization. (arXiv:2207.05870v1 [cs.LG])
    Reservoir computers (RCs) are among the fastest to train of all neural networks, especially when they are compared to other recurrent neural networks. RC has this advantage while still handling sequential data exceptionally well. However, RC adoption has lagged other neural network models because of the model's sensitivity to its hyper-parameters (HPs). A modern unified software package that automatically tunes these parameters is missing from the literature. Manually tuning these numbers is very difficult, and the cost of traditional grid search methods grows exponentially with the number of HPs considered, discouraging the use of the RC and limiting the complexity of the RC models which can be devised. We address these problems by introducing RcTorch, a PyTorch based RC neural network package with automated HP tuning. Herein, we demonstrate the utility of RcTorch by using it to predict the complex dynamics of a driven pendulum being acted upon by varying forces. This work includes coding examples. Example Python Jupyter notebooks can be found on our GitHub repository https://github.com/blindedjoy/RcTorch and documentation can be found at https://rctorch.readthedocs.io/.
    Simulation-guided Beam Search for Neural Combinatorial Optimization. (arXiv:2207.06190v1 [cs.LG])
    Neural approaches for combinatorial optimization (CO) employ a learning mechanism to discover powerful heuristics for solving complex real-world problems. While neural approaches capable of high-quality solutions in a single shot are emerging, state-of-the-art approaches are often unable to take full advantage of the solving time available to them. In contrast, hand-crafted heuristics search very effectively and exploit the computation time given to them, but rely on rules that are difficult to adapt to the dataset being solved. With the goal of providing a powerful search procedure to neural CO approaches, we propose simulation-guided beam search (SGBS), which examines candidate solutions within a fixed-width tree search that both a neural net-learned policy and a simulation (rollout) identify as promising. We further hybridize SGBS with efficient active search (EAS), where SGBS enhances the quality of solutions backpropagated in EAS, and EAS improves the quality of the policy used in SGBS. We evaluate our methods on well-known CO benchmarks and show that SGBS significantly improves the quality of the solutions found under reasonable runtime assumptions.
    Exploring Negatives in Contrastive Learning for Unpaired Image-to-Image Translation. (arXiv:2204.11018v2 [cs.CV] UPDATED)
    Unpaired image-to-image translation aims to find a mapping between the source domain and the target domain. To alleviate the lack of supervised labels for the source images, cycle-consistency based methods have been proposed for image structure preservation by assuming a reversible relationship between unpaired images. However, this assumption only uses limited correspondence between image pairs. Recently, contrastive learning (CL) has been used to further investigate the image correspondence in unpaired image translation by using patch-based positive/negative learning. Patch-based contrastive routines obtain the positives by self-similarity computation and treat the remaining patches as negatives. This flexible learning paradigm obtains auxiliary contextualized information at a low cost. Since the negatives are very numerous, we investigate the following question: are all negatives necessary for feature contrastive learning? Unlike previous CL approaches that use as many negatives as possible, in this paper we study the negatives from an information-theoretic perspective and introduce a new negative Pruning technology for Unpaired image-to-image Translation (PUT) by sparsifying and ranking the patches. The proposed algorithm is efficient and flexible, and enables the model to stably learn essential information between corresponding patches. By putting quality over quantity, only a few negative patches are required to achieve better results. Lastly, we validate the superiority, stability, and versatility of our model through comparative experiments.
    Job Offers Classifier using Neural Networks and Oversampling Methods. (arXiv:2207.06223v1 [cs.IR])
    Both policy and research benefit from a better understanding of individuals' jobs. However, as large-scale administrative records are increasingly employed to represent labor market activity, new automatic methods to classify jobs will become necessary. We developed an automatic job offers classifier using a dataset collected from Bumeran, the largest job bank of Mexico (https://www.bumeran.com.mx/, last visited 19-01-2022). We applied machine learning algorithms such as Support Vector Machines, Naive-Bayes, Logistic Regression, Random Forest, and deep learning Long-Short Term Memory (LSTM). Using these algorithms, we trained multi-class models to classify job offers into one of 23 classes (not uniformly distributed): Sales, Administration, Call Center, Technology, Trades, Human Resources, Logistics, Marketing, Health, Gastronomy, Financing, Secretary, Production, Engineering, Education, Design, Legal, Construction, Insurance, Communication, Management, Foreign Trade, and Mining. We used the SMOTE, Geometric-SMOTE, and ADASYN synthetic oversampling algorithms to handle imbalanced classes. The proposed convolutional neural network architecture achieved the best results when the Geometric-SMOTE algorithm was applied.
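    The synthetic oversampling mentioned above can be illustrated with a minimal sketch of the basic SMOTE idea: each synthetic minority sample is placed on the segment between a minority point and one of its nearest minority neighbors. This is a simplified illustration under stated assumptions, not the authors' pipeline; it does not cover Geometric-SMOTE or ADASYN, and `smote_sketch` with its parameters is a hypothetical name.

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic points from minority-class samples X_min.

    Simplified SMOTE: pick a random minority point, pick one of its k nearest
    minority neighbors, and interpolate at a random position between them.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)

    # Pairwise squared Euclidean distances; exclude each point from its own neighbors.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    neighbors = np.argsort(d2, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)                   # anchor points
    nb = neighbors[base, rng.integers(0, k, size=n_new)]    # one neighbor per anchor
    gap = rng.random((n_new, 1))                            # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])

# Toy minority class (illustrative data only).
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sketch(X_minority, n_new=10, k=2)
print(synthetic.shape)
```

    Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class.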
    3D Concept Grounding on Neural Fields. (arXiv:2207.06403v1 [cs.CV])
    In this paper, we address the challenging problem of 3D concept grounding (i.e. segmenting and learning visual concepts) by looking at RGBD images and reasoning about paired questions and answers. Existing visual reasoning approaches typically utilize supervised methods to extract 2D segmentation masks on which concepts are grounded. In contrast, humans are capable of grounding concepts on the underlying 3D representation of images. However, traditionally inferred 3D representations (e.g., point clouds, voxelgrids, and meshes) cannot capture continuous 3D features flexibly, thus making it challenging to ground concepts to 3D regions based on the language description of the object being referred to. To address both issues, we propose to leverage the continuous, differentiable nature of neural fields to segment and learn concepts. Specifically, each 3D coordinate in a scene is represented as a high-dimensional descriptor. Concept grounding can then be performed by computing the similarity between the descriptor vector of a 3D coordinate and the vector embedding of a language concept, which enables segmentations and concept learning to be jointly learned on neural fields in a differentiable fashion. As a result, both 3D semantic and instance segmentations can emerge directly from question answering supervision using a set of defined neural operators on top of neural fields (e.g., filtering and counting). Experimental results show that our proposed framework outperforms unsupervised/language-mediated segmentation models on semantic and instance segmentation tasks, as well as outperforms existing models on the challenging 3D aware visual reasoning tasks. Furthermore, our framework can generalize well to unseen shape categories and real scans.
    Neural Topological Ordering for Computation Graphs. (arXiv:2207.05899v1 [cs.LG])
    Recent works on machine learning for combinatorial optimization have shown that learning-based approaches can outperform heuristic methods in terms of speed and performance. In this paper, we consider the problem of finding an optimal topological order on a directed acyclic graph, with a focus on the memory minimization problem which arises in compilers. We propose an end-to-end machine learning based approach for topological ordering using an encoder-decoder framework. Our encoder is a novel attention-based graph neural network architecture called Topoformer, which uses different topological transforms of a DAG for message passing. The node embeddings produced by the encoder are converted into node priorities, which are used by the decoder to generate a probability distribution over topological orders. We train our model on a dataset of synthetically generated graphs called layered graphs. We show that our model outperforms, or is on par with, several topological ordering baselines while being significantly faster on synthetic graphs with up to 2k nodes. We also train and test our model on a set of real-world computation graphs, showing performance improvements.
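    One way to see how node priorities can induce a topological order is the sketch below: a Kahn-style traversal that, among the nodes whose predecessors are all scheduled, always emits the one with the highest priority. The abstract describes a decoder that produces a probability distribution over orders; this greedy, deterministic decoding is an assumption made for illustration, and the function name and toy priorities are hypothetical.

```python
import heapq

def priority_topological_order(num_nodes, edges, priority):
    """Return a topological order of a DAG, preferring high-priority ready nodes.

    edges is a list of (u, v) pairs meaning u must come before v.
    priority[n] is a score for node n (here just given numbers, not learned).
    """
    indegree = [0] * num_nodes
    successors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    # heapq is a min-heap, so store negated priorities to pop the largest first.
    ready = [(-priority[n], n) for n in range(num_nodes) if indegree[n] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, u = heapq.heappop(ready)
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                heapq.heappush(ready, (-priority[v], v))

    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")
    return order

# Toy DAG: 0 -> 2, 1 -> 2, 2 -> 3, with made-up priorities.
order = priority_topological_order(4, [(0, 2), (1, 2), (2, 3)], [0.1, 0.9, 0.5, 0.2])
print(order)  # [1, 0, 2, 3]
```

    Any priority vector yields a valid topological order here, which is why priorities are a convenient output for a learned model: the precedence constraints are enforced by the traversal rather than by the network.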
    HiClass: a Python library for local hierarchical classification compatible with scikit-learn. (arXiv:2112.06560v5 [cs.LG] UPDATED)
    HiClass is an open-source Python library for local hierarchical classification entirely compatible with scikit-learn. It contains implementations of the most common design patterns for hierarchical machine learning models found in the literature, i.e., the local classifiers per node, per parent node and per level. Additionally, the package contains implementations of hierarchical metrics, which are more appropriate for evaluating classification performance on hierarchical data. The documentation includes installation and usage instructions, examples within tutorials and interactive notebooks, and a complete description of the API. HiClass is released under the simplified BSD license, encouraging its use in both academic and commercial environments. Source code and documentation are available at https://github.com/mirand863/hiclass.
    Information-theoretic Inducing Point Placement for High-throughput Bayesian Optimisation. (arXiv:2206.02437v2 [cs.LG] UPDATED)
    Sparse Gaussian Processes are a key component of high-throughput Bayesian optimisation (BO) loops -- an increasingly common setting where evaluation budgets are large and highly parallelised. By using representative subsets of the available data to build approximate posteriors, sparse models dramatically reduce the computational costs of surrogate modelling by relying on a small set of pseudo-observations, the so-called inducing points, in lieu of the full data set. However, current approaches to design inducing points are not appropriate within BO loops as they seek to reduce global uncertainty in the objective function. Thus, the high-fidelity modelling of promising and data-dense regions required for precise optimisation is sacrificed and computational resources are instead wasted on modelling areas of the space already known to be sub-optimal. Inspired by entropy-based BO methods, we propose a novel inducing point design that uses a principled information-theoretic criterion to select inducing points. By choosing inducing points to maximally reduce both global uncertainty and uncertainty in the maximum value of the objective function, we build surrogate models able to support high-precision high-throughput BO.
    D-CBRS: Accounting For Intra-Class Diversity in Continual Learning. (arXiv:2207.05897v1 [cs.LG])
    Continual learning -- accumulating knowledge from a sequence of learning experiences -- is an important yet challenging problem. In this paradigm, the model's performance for previously encountered instances may substantially drop as additional data are seen. When dealing with class-imbalanced data, forgetting is further exacerbated. Prior work has proposed replay-based approaches which aim at reducing forgetting by intelligently storing instances for future replay. Although Class-Balancing Reservoir Sampling (CBRS) has been successful in dealing with imbalanced data, the intra-class diversity has not been accounted for, implicitly assuming that each instance of a class is equally informative. We present Diverse-CBRS (D-CBRS), an algorithm that allows us to consider within class diversity when storing instances in the memory. Our results show that D-CBRS outperforms state-of-the-art memory management continual learning algorithms on data sets with considerable intra-class diversity.
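    For readers unfamiliar with the building block behind CBRS, the following is a minimal sketch of plain reservoir sampling, which keeps a fixed-size memory that is a uniform random sample of everything seen so far in a stream. It is neither CBRS nor D-CBRS: it ignores class balance and intra-class diversity, which are exactly what those methods add. The function name and seed parameter are illustrative.

```python
import random

def reservoir_sample(stream, capacity, seed=0):
    """Keep a uniform random sample of size `capacity` from a stream of unknown length."""
    rng = random.Random(seed)
    memory = []
    for i, item in enumerate(stream):
        if len(memory) < capacity:
            # Memory not full yet: store every incoming instance.
            memory.append(item)
        else:
            # Replace a stored instance with probability capacity / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < capacity:
                memory[j] = item
    return memory

memory = reservoir_sample(range(1000), capacity=10)
print(memory)
```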
    DeepTIMe: Deep Time-Index Meta-Learning for Non-Stationary Time-Series Forecasting. (arXiv:2207.06046v1 [cs.LG])
    Deep learning has been actively applied to time-series forecasting, leading to a deluge of new autoregressive model architectures. Yet, despite the attractive properties of time-index based models, such as being a continuous signal function over time leading to smooth representations, little attention has been given to them. Indeed, while naive deep time-index based models are far more expressive than the manually predefined function representations of classical time-index based models, they are inadequate for forecasting due to the lack of inductive biases, and the non-stationarity of time-series. In this paper, we propose DeepTIMe, a deep time-index based model trained via a meta-learning formulation which overcomes these limitations, yielding an efficient and accurate forecasting model. Extensive experiments on real world datasets demonstrate that our approach achieves competitive results with state-of-the-art methods, and is highly efficient. Code is available at https://github.com/salesforce/DeepTIMe.
    Exploration in Deep Reinforcement Learning: A Comprehensive Survey. (arXiv:2109.06668v4 [cs.AI] UPDATED)
    Deep Reinforcement Learning (DRL) and Deep Multi-agent Reinforcement Learning (MARL) have achieved significant successes across a wide range of domains, including game AI, autonomous vehicles, and robotics. However, DRL and deep MARL agents are widely known to be so sample-inefficient that millions of interactions are usually needed even for relatively simple problem settings, which prevents their wide application and deployment in real industrial scenarios. One key bottleneck is the well-known exploration problem, i.e., how to efficiently explore the environment and collect informative experiences that could benefit policy learning towards the optimal policy. This problem becomes more challenging in complex environments with sparse rewards, noisy distractions, long horizons, and non-stationary co-learners. In this paper, we conduct a comprehensive survey on existing exploration methods for both single-agent and multi-agent RL. We start the survey by identifying several key challenges to efficient exploration. Beyond the above two main branches, we also include other notable exploration methods with different ideas and techniques. In addition to algorithmic analysis, we provide a comprehensive and unified empirical comparison of different exploration methods for DRL on a set of commonly used benchmarks. Based on our algorithmic and empirical investigation, we finally summarize the open problems of exploration in DRL and deep MARL and point out a few future directions.
    GraphMAE: Self-Supervised Masked Graph Autoencoders. (arXiv:2205.10803v3 [cs.LG] UPDATED)
    Self-supervised learning (SSL) has been extensively explored in recent years. In particular, generative SSL has seen emerging success in natural language processing and other AI fields, such as the wide adoption of BERT and GPT. Despite this, contrastive learning -- which heavily relies on structural data augmentation and complicated training strategies -- has been the dominant approach in graph SSL, while the progress of generative SSL on graphs, especially graph autoencoders (GAEs), has thus far not reached the potential promised in other fields. In this paper, we identify and examine the issues that negatively impact the development of GAEs, including their reconstruction objective, training robustness, and error metric. We present a masked graph autoencoder, GraphMAE, that mitigates these issues for generative self-supervised graph pretraining. Instead of reconstructing graph structures, we propose to focus on feature reconstruction with both a masking strategy and a scaled cosine error that benefit the robust training of GraphMAE. We conduct extensive experiments on 21 public datasets for three different graph learning tasks. The results show that GraphMAE -- a simple graph autoencoder with careful designs -- can consistently outperform both contrastive and generative state-of-the-art baselines. This study provides an understanding of graph autoencoders and demonstrates the potential of generative self-supervised pre-training on graphs.
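    The abstract names a scaled cosine error but does not give its formula. The sketch below assumes the form (1 - cosine similarity)^gamma averaged over the reconstructed nodes, with gamma >= 1 as a scaling exponent; that form, the function name, and the default gamma are assumptions for illustration and should be checked against the paper before use.

```python
import numpy as np

def scaled_cosine_error(x, z, gamma=2.0, eps=1e-12):
    """Mean of (1 - cos(x_i, z_i)) ** gamma over rows (assumed form of the loss).

    x: original node features, shape (n, d)
    z: reconstructed node features, shape (n, d)
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    dot = (x * z).sum(axis=-1)
    norms = np.linalg.norm(x, axis=-1) * np.linalg.norm(z, axis=-1)
    cos = dot / (norms + eps)
    # Clip so tiny floating-point overshoots of cos > 1 cannot go negative.
    per_node = np.clip(1.0 - cos, 0.0, None) ** gamma
    return per_node.mean()

x = np.array([[1.0, 0.0], [0.0, 1.0]])
print(scaled_cosine_error(x, x))           # close to 0: perfect reconstruction
print(scaled_cosine_error(x, x[::-1]))     # 1.0: orthogonal reconstruction
```

    With gamma > 1, nodes that are already reconstructed well (error below 1) are down-weighted relative to poorly reconstructed ones, and the loss is insensitive to the scale of the feature vectors, unlike a mean squared error.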
    FD-GATDR: A Federated-Decentralized-Learning Graph Attention Network for Doctor Recommendation Using EHR. (arXiv:2207.05750v1 [cs.IR])
    In the past decade, with the development of big data technology, an increasing amount of patient information has been stored as electronic health records (EHRs). Leveraging these data, various doctor recommendation systems have been proposed. Typically, such studies process the EHR data in a flat-structured manner, where each encounter is treated as an unordered set of features. Nevertheless, heterogeneous structured information, such as the service sequence stored in claims, should not be ignored. This paper presents a doctor recommendation system with time embedding that reconstructs the potential connections between patients and doctors using a heterogeneous graph attention network. In addition, to address the privacy issue of sharing patient data across hospitals, a federated decentralized learning method based on a minimization optimization model is also proposed. The graph-based recommendation system has been validated on an EHR dataset. Compared to baseline models, the proposed method improves the AUC by up to 6.2%. Our proposed federated algorithm not only yields the fictitious fusion center's performance but also enjoys a convergence rate of O(1/T).
    Reinforcement Learning Assisted Recursive QAOA. (arXiv:2207.06294v1 [quant-ph])
    Variational quantum algorithms such as the Quantum Approximate Optimization Algorithm (QAOA) have gained popularity in recent years as they provide the hope of using NISQ devices to tackle hard combinatorial optimization problems. It is, however, known that at low depth, certain locality constraints of QAOA limit its performance. To go beyond these limitations, a non-local variant of QAOA, namely recursive QAOA (RQAOA), was proposed to improve the quality of approximate solutions. The RQAOA has been studied comparatively less than QAOA, and it is less understood, for instance, for what family of instances it may fail to provide high quality solutions. However, as we are tackling $\mathsf{NP}$-hard problems (specifically, the Ising spin model), it is expected that RQAOA does fail, raising the question of designing even better quantum algorithms for combinatorial optimization. In this spirit, we identify and analyze cases where RQAOA fails and, based on this, propose a reinforcement learning enhanced RQAOA variant (RL-RQAOA) that improves upon RQAOA. We show that the performance of RL-RQAOA improves over RQAOA: RL-RQAOA is strictly better on these identified instances where RQAOA underperforms, and performs similarly on instances where RQAOA is near-optimal. Our work exemplifies the potentially beneficial synergy between reinforcement learning and quantum (inspired) optimization in the design of new, even better heuristics for hard problems.
    Goal-Oriented Sensitivity Analysis of Hyperparameters in Deep Learning. (arXiv:2207.06216v1 [stat.ML])
    Tackling new machine learning problems with neural networks always means optimizing numerous hyperparameters that define their structure and strongly impact their performance. In this work, we study the use of goal-oriented sensitivity analysis, based on the Hilbert-Schmidt Independence Criterion (HSIC), for hyperparameter analysis and optimization. Hyperparameters live in spaces that are often complex and awkward. They can be of different natures (categorical, discrete, boolean, continuous), interact, and have inter-dependencies. All this makes it non-trivial to perform classical sensitivity analysis. We alleviate these difficulties to obtain a robust analysis index that is able to quantify hyperparameters' relative impact on a neural network's final error. This valuable tool allows us to better understand hyperparameters and to make hyperparameter optimization more interpretable. We illustrate the benefits of this knowledge in the context of hyperparameter optimization and derive an HSIC-based optimization algorithm that we apply to MNIST and CIFAR, classical machine learning data sets, but also to the approximation of the Runge function and of the solution of the Bateman equations, which are of interest for scientific machine learning. This method yields neural networks that are both competitive and cost-effective.
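    To make the HSIC-based idea concrete, the following is a minimal sketch of the standard biased empirical HSIC estimator, trace(K H L H) / (n - 1)^2, applied to one scalar hyperparameter and the resulting error. The Gaussian kernel, the bandwidth, the normalization, and the toy data are assumptions made for illustration; the goal-oriented index of the paper is not reproduced here.

```python
import math


def gaussian_gram(values, bandwidth=1.0):
    """Gram matrix of a Gaussian kernel on scalar samples."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2)) for b in values]
        for a in values
    ]


def double_center(gram):
    """Return H K H, where H = I - (1/n) * ones is the centering matrix."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_means) / n
    return [
        [gram[i][j] - row_means[i] - col_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(xs, ys, bandwidth=1.0):
    """Biased empirical HSIC, using trace(K H L H) = trace((H K H) L)."""
    n = len(xs)
    k_centered = double_center(gaussian_gram(xs, bandwidth))
    l_gram = gaussian_gram(ys, bandwidth)
    trace = sum(k_centered[i][j] * l_gram[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Hypothetical runs: one hyperparameter value per run and two candidate outputs.
hyperparameter = [0.1, 0.5, 1.0, 1.5, 2.0]
error_dependent = [h ** 2 for h in hyperparameter]  # error driven by the hyperparameter
error_constant = [0.3] * len(hyperparameter)        # error unaffected by it

print(hsic(hyperparameter, error_dependent))
print(hsic(hyperparameter, error_constant))
```

    A larger value indicates stronger statistical dependence between the hyperparameter and the error, so ranking hyperparameters by such a score is one way to read "relative impact".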
    Constraint-Based Causal Structure Learning from Undersampled Graphs. (arXiv:2205.09235v2 [stat.ML] UPDATED)
    Graphical structures estimated by causal learning algorithms from time series data can provide highly misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data. Although this problem has been recently recognized, practitioners have limited resources to respond to it, and so must continue using models that they know are likely misleading. Existing methods either (a) require that the difference between causal and measurement timescales is known; or (b) can handle only a very small number of random variables when the timescale difference is unknown; or (c) apply to only pairs of variables, though with fewer assumptions about prior knowledge; or (d) return impractically many solutions. This paper addresses all four challenges. We combine constraint programming with both theoretical insights into the problem structure and prior information about admissible causal interactions. The resulting system provides a practical approach that scales to significantly larger sets (>100) of random variables, does not require precise knowledge of the timescale difference, supports edge misidentification and parametric connection strengths, and can provide the optimum choice among many possible solutions. The cumulative impact of these improvements is a gain of multiple orders of magnitude in speed and informativeness.
    DiverGet: A Search-Based Software Testing Approach for Deep Neural Network Quantization Assessment. (arXiv:2207.06282v1 [cs.LG])
    Quantization is one of the most widely applied Deep Neural Network (DNN) compression strategies when deploying a trained DNN model on an embedded system or a cell phone. This is owing to its simplicity and adaptability to a wide range of applications and circumstances, as opposed to specific Artificial Intelligence (AI) accelerators and compilers that are often designed only for certain specific hardware (e.g., Google Coral Edge TPU). With the growing demand for quantization, ensuring the reliability of this strategy is becoming a critical challenge. Traditional testing methods, which gather more and more genuine data for better assessment, are often not practical because of the large size of the input space and the high similarity between the original DNN and its quantized counterpart. As a result, advanced assessment strategies have become of paramount importance. In this paper, we present DiverGet, a search-based testing framework for quantization assessment. DiverGet defines a space of metamorphic relations that simulate naturally-occurring distortions on the inputs. Then, it optimally explores these relations to reveal the disagreements among DNNs of different arithmetic precision. We evaluate the performance of DiverGet on state-of-the-art DNNs applied to hyperspectral remote sensing images. We chose the remote sensing DNNs as they are being increasingly deployed at the edge (e.g., high-lift drones) in critical domains like climate change research and astronomy. Our results show that DiverGet successfully challenges the robustness of established quantization techniques against naturally-occurring shifted data, and outperforms its most recent competitor, DiffChaser, with a success rate that is (on average) four times higher.
    Does GNN Pretraining Help Molecular Representation? (arXiv:2207.06010v1 [cs.LG])
    Extracting informative representations of molecules using graph neural networks (GNNs) is crucial in AI-driven drug discovery. Recently, the graph research community has been trying to replicate the success of self-supervised pretraining in natural language processing, with several successes claimed. However, we find the benefit brought by self-supervised pretraining on molecular data can be negligible in many cases. We conduct thorough ablation studies on the key components of GNN pretraining, including pretraining objectives, data splitting methods, input features, pretraining dataset scales, and GNN architectures, in deciding the accuracy of the downstream tasks. Our first important finding is that self-supervised graph pretraining does not have statistically significant advantages over non-pretraining methods in many settings. Second, although improvement can be observed with additional supervised pretraining, the improvement may diminish with richer features or more balanced data splits. Third, experimental hyper-parameters have a larger impact on the accuracy of downstream tasks than the choice of pretraining tasks. We hypothesize the complexity of pretraining on molecules is insufficient, leading to less transferable knowledge for downstream tasks.
    OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses. (arXiv:2204.02426v4 [cs.LG] UPDATED)
    Dataset bias and spurious correlations can significantly impair generalization in deep neural networks. Many prior efforts have addressed this problem using either alternative loss functions or sampling strategies that focus on rare patterns. We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias. Specifically, we propose OccamNets, which are biased to favor simpler solutions by design. OccamNets have two inductive biases. First, they are biased to use as little network depth as needed for an individual example. Second, they are biased toward using fewer image locations for prediction. While OccamNets are biased toward simpler hypotheses, they can learn more complex hypotheses if necessary. In experiments, OccamNets outperform or rival state-of-the-art methods run on architectures that do not incorporate these inductive biases. Furthermore, we demonstrate that when the state-of-the-art debiasing methods are combined with OccamNets, results improve further.
    Implicit Neural Representations for Generative Modeling of Living Cell Shapes. (arXiv:2207.06283v1 [cs.CV])
    Methods allowing the synthesis of realistic cell shapes could help generate training data sets to improve cell tracking and segmentation in biomedical images. Deep generative models for cell shape synthesis require a light-weight and flexible representation of the cell shape. However, commonly used voxel-based representations are unsuitable for high-resolution shape synthesis, and polygon meshes have limitations when modeling topology changes such as cell growth or mitosis. In this work, we propose to use level sets of signed distance functions (SDFs) to represent cell shapes. We optimize a neural network as an implicit neural representation of the SDF value at any point in a 3D+time domain. The model is conditioned on a latent code, thus allowing the synthesis of new and unseen shape sequences. We validate our approach quantitatively and qualitatively on C. elegans cells that grow and divide, and lung cancer cells with growing complex filopodial protrusions. Our results show that shape descriptors of synthetic cells resemble those of real cells, and that our model is able to generate topologically plausible sequences of complex cell shapes in 3D+time.
    Forecasting COVID-19 spreading through an ensemble of classical and machine learning models: Spain's case study. (arXiv:2207.05753v1 [cs.LG])
    In this work we evaluate the applicability of an ensemble of population models and machine learning models to predict the near future evolution of the COVID-19 pandemic, with a particular use case in Spain. We rely solely on open and public datasets, fusing incidence, vaccination, human mobility and weather data to feed our machine learning models (Random Forest, Gradient Boosting, k-Nearest Neighbours and Kernel Ridge Regression). We use the incidence data to adjust classic population models (Gompertz, Logistic, Richards, Bertalanffy) in order to better capture the trend of the data. We then ensemble these two families of models in order to obtain a more robust and accurate prediction. Furthermore, we have observed an improvement in the predictions obtained with machine learning models as we add new features (vaccines, mobility, climatic conditions), analyzing the importance of each of them using Shapley Additive Explanation values. As in any other modelling work, the quality of data and predictions has several limitations, and therefore they must be seen from a critical standpoint, as we discuss in the text. Our work concludes that the ensemble use of these models improves the individual predictions (using only machine learning models or only population models) and can be applied, with caution, in cases when compartmental models cannot be utilized due to the lack of relevant data.
    Universal expressiveness of variational quantum classifiers and quantum kernels for support vector machines. (arXiv:2207.05865v1 [quant-ph])
    Machine learning is considered to be one of the most promising applications of quantum computing. Therefore, the search for quantum advantage of the quantum analogues of machine learning models is a key research goal. Here, we show that variational quantum classifiers (VQC) and support vector machines with quantum kernels (QSVM) can solve a classification problem based on the k-Forrelation problem, which is known to be PromiseBQP-complete. Because the PromiseBQP complexity class includes all Bounded-Error Quantum Polynomial-Time (BQP) decision problems, our results imply that there exists a feature map and a quantum kernel that make VQC and QSVM efficient solvers for any BQP problem. This means that the feature map of VQC or the quantum kernel of QSVM can be designed to have quantum advantage for any classification problem that cannot be solved classically in polynomial time but can be solved in polynomial time by a quantum computer.
    On Merging Feature Engineering and Deep Learning for Diagnosis, Risk-Prediction and Age Estimation Based on the 12-Lead ECG. (arXiv:2207.06096v1 [cs.LG])
    Objective: Machine learning techniques have been used extensively for 12-lead electrocardiogram (ECG) analysis. For physiological time series, the superiority of deep learning (DL) over feature engineering (FE) approaches based on domain knowledge is still an open question. Moreover, it remains unclear whether combining DL with FE may improve performance. Methods: We considered three tasks intending to address these research gaps: cardiac arrhythmia diagnosis (multiclass-multilabel classification), atrial fibrillation risk prediction (binary classification), and age estimation (regression). We used an overall dataset of 2.3M 12-lead ECG recordings to train the following models for each task: i) a random forest taking the FE as input, as a classical machine learning approach; ii) an end-to-end DL model; and iii) a merged model of FE+DL. Results: FE yielded comparable results to DL while necessitating significantly less data for the two classification tasks, and it was outperformed by DL for the regression task. For all tasks, merging FE with DL did not improve performance over DL alone. Conclusion: We found that for traditional 12-lead ECG based diagnosis tasks DL did not yield a meaningful improvement over FE, while it significantly improved the nontraditional regression task. We also found that combining FE with DL did not improve over DL alone, which suggests that the FE were redundant with the features learned by DL. Significance: Our findings provide important recommendations on what machine learning strategy and data regime to choose with respect to the task at hand for the development of new machine learning models based on the 12-lead ECG.
    Normalized gradient flow optimization in the training of ReLU artificial neural networks. (arXiv:2207.06246v1 [math.OC])
    The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions between affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional so-called activation function. The most popular choice of such a one-dimensional activation function is the rectified linear unit (ReLU) activation function which maps a real number to its positive part $ \mathbb{R} \ni x \mapsto \max\{ x, 0 \} \in \mathbb{R} $. In this article we propose and analyze a modified variant of the standard training procedure of such ReLU ANNs in the sense that we propose to restrict the negative gradient flow dynamics to a large submanifold of the ANN parameter space, which is a strict $ C^{ \infty } $-submanifold of the entire ANN parameter space that seems to enjoy better regularity properties than the entire ANN parameter space but which is also sufficiently large and sufficiently high dimensional so that it can represent all ANN realization functions that can be represented through the entire ANN parameter space. In the special situation of shallow ANNs with just one-dimensional ANN layers we also prove for every Lipschitz continuous target function that every gradient flow trajectory on this large submanifold of the ANN parameter space is globally bounded. For the standard gradient flow on the entire ANN parameter space with Lipschitz continuous target functions it remains an open problem of research to prove or disprove the global boundedness of gradient flow trajectories even in the situation of shallow ANNs with just one-dimensional ANN layers.
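    The description of ANNs as iterated compositions of affine maps and a componentwise ReLU can be sketched in a few lines. The function names and the example network below are illustrative assumptions; the example uses a hidden layer of width two purely to show the composition and is not the one-dimensional setting analyzed in the article, nor does it implement the proposed gradient flow.

```python
def relu(x):
    """Rectified linear unit: maps a real number to its positive part max{x, 0}."""
    return max(x, 0.0)


def affine(weights, bias, inputs):
    """Affine map W v + b, with W given as a list of rows."""
    return [
        sum(w * v for w, v in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def realization(layers, inputs):
    """Evaluate the realization function of a ReLU network.

    `layers` is a list of (weights, bias) pairs. The ReLU is applied
    componentwise after every affine map except the last one.
    """
    values = list(inputs)
    for index, (weights, bias) in enumerate(layers):
        values = affine(weights, bias, values)
        if index < len(layers) - 1:
            values = [relu(v) for v in values]
    return values


# Hypothetical shallow network computing |x| = relu(x) + relu(-x).
abs_network = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # hidden layer: x -> (x, -x)
    ([[1.0, 1.0]], [0.0]),          # output layer: sum of the two hidden units
]

print(realization(abs_network, [-3.0]))
print(realization(abs_network, [2.5]))
```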
    Learning robust marking policies for adaptive mesh refinement. (arXiv:2207.06339v1 [math.NA])
    In this work, we revisit the marking decisions made in the standard adaptive finite element method (AFEM). Experience shows that a naïve marking policy leads to inefficient use of computational resources for adaptive mesh refinement (AMR). Consequently, using AFEM in practice often involves ad-hoc or time-consuming offline parameter tuning to set appropriate parameters for the marking subroutine. To address these practical concerns, we recast AMR as a Markov decision process in which refinement parameters can be selected on-the-fly at run time, without the need for pre-tuning by expert users. In this new paradigm, the refinement parameters are also chosen adaptively via a marking policy that can be optimized using methods from reinforcement learning. We use the Poisson equation to demonstrate our techniques on $h$- and $hp$-refinement benchmark problems, and our experiments suggest that superior marking policies remain undiscovered for many classical AFEM applications. Furthermore, an unexpected observation from this work is that marking policies trained on one family of PDEs are sometimes robust enough to perform well on problems far outside the training family. For illustration, we show that a simple $hp$-refinement policy trained on 2D domains with only a single re-entrant corner can be deployed on far more complicated 2D domains, and even 3D domains, without significant performance loss. For reproduction and broader adoption, we accompany this work with an open-source implementation of our methods.
    Earthformer: Exploring Space-Time Transformers for Earth System Forecasting. (arXiv:2207.05833v1 [cs.LG])
    Conventionally, Earth system (e.g., weather and climate) forecasting relies on numerical simulation with complex physical models and is hence both computationally expensive and demanding of domain expertise. With the explosive growth of spatiotemporal Earth observation data in the past decade, data-driven models that apply Deep Learning (DL) are demonstrating impressive potential for various Earth system forecasting tasks. The Transformer as an emerging DL architecture, despite its broad success in other domains, has limited adoption in this area. In this paper, we propose Earthformer, a space-time Transformer for Earth system forecasting. Earthformer is based on a generic, flexible and efficient space-time attention block, named Cuboid Attention. The idea is to decompose the data into cuboids and apply cuboid-level self-attention in parallel. These cuboids are further connected with a collection of global vectors. We conduct experiments on the MovingMNIST dataset and a newly proposed chaotic N-body MNIST dataset to verify the effectiveness of cuboid attention and determine the best design of Earthformer. Experiments on two real-world benchmarks on precipitation nowcasting and El Niño/Southern Oscillation (ENSO) forecasting show that Earthformer achieves state-of-the-art performance.
    Implicit regularization of dropout. (arXiv:2207.05952v1 [cs.LG])
    It is important to understand how the popular regularization method dropout helps neural network training find a solution that generalizes well. In this work, we theoretically derive the implicit regularization of dropout and study the relation between the Hessian matrix of the loss function and the covariance matrix of the dropout noise, supported by a series of experiments. We then numerically study two implications of the implicit regularization of dropout, which intuitively rationalize why dropout helps generalization. First, we find in experiments that training with dropout finds a neural network with a flatter minimum compared with standard gradient descent training, and the implicit regularization is the key to finding flat solutions. Second, when trained with dropout, the input weights of hidden neurons (the input weight of a hidden neuron consists of the weight from its input layer to the hidden neuron and its bias term) tend to condense on isolated orientations. Condensation is a feature of the non-linear learning process that keeps the complexity of the neural network low. Although our theory mainly focuses on the dropout used in the last hidden layer, our experiments apply to general dropout in training neural networks. This work points out the distinct characteristics of dropout compared with stochastic gradient descent and serves as an important basis for fully understanding dropout.
    Learning to Control Local Search for Combinatorial Optimization. (arXiv:2206.13181v2 [cs.LG] UPDATED)
    Combinatorial optimization problems are encountered in many practical contexts such as logistics and production, but exact solutions are particularly difficult to find and usually NP-hard for considerable problem sizes. To compute approximate solutions, a zoo of generic as well as problem-specific variants of local search is commonly used. However, which variant to apply to which particular problem is difficult to decide even for experts. In this paper we identify three independent algorithmic aspects of such local search algorithms and formalize their sequential selection over an optimization process as a Markov Decision Process (MDP). We design a deep graph neural network as the policy model for this MDP, yielding a learned controller for local search called NeuroLS. Ample experimental evidence shows that NeuroLS is able to outperform both well-known general purpose local search controllers from Operations Research and the latest machine learning-based approaches.
    Human-AI Collaboration in Decision-Making: Beyond Learning to Defer. (arXiv:2206.13202v2 [cs.LG] UPDATED)
    Human-AI collaboration (HAIC) in decision-making aims to create synergistic teaming between human decision-makers and AI systems. Learning to defer (L2D) has been presented as a promising framework to determine who among humans and AI should make which decisions in order to optimize the performance and fairness of the combined system. Nevertheless, L2D entails several often unfeasible requirements, such as the availability of predictions from humans for every instance or ground-truth labels that are independent from said humans. Furthermore, neither L2D nor alternative approaches tackle fundamental issues of deploying HAIC systems in real-world settings, such as capacity management or dealing with dynamic environments. In this paper, we aim to identify and review these and other limitations, pointing to where opportunities for future research in HAIC may lie.
    Robust Data-Driven Predictive Control using Reachability Analysis. (arXiv:2103.14110v3 [eess.SY] UPDATED)
    We present a robust data-driven control scheme for an unknown linear system model with bounded process and measurement noise. Instead of depending on a system model in traditional predictive control, a controller utilizing data-driven reachable regions is proposed. The data-driven reachable regions are based on a matrix zonotope recursion and are computed based on only noisy input-output data of a trajectory of the system. We assume that measurement and process noise are contained in bounded sets. While we assume knowledge of these bounds, no knowledge about the statistical properties of the noise is assumed. In the noise-free case, we prove that the presented purely data-driven control scheme results in an equivalent closed-loop behavior to a nominal model predictive control scheme. In the case of measurement and process noise, our proposed scheme guarantees robust constraint satisfaction, which is essential in safety-critical applications. Numerical experiments show the effectiveness of the proposed data-driven controller in comparison to model-based control schemes.
    Towards Highly Expressive Machine Learning Models of Non-Melanoma Skin Cancer. (arXiv:2207.05749v1 [cs.LG])
    Pathologists have a rich vocabulary with which they can describe all the nuances of cellular morphology. In their world, there is a natural pairing of images and words. Recent advances demonstrate that machine learning models can now be trained to learn high-quality image features and represent them as discrete units of information. This enables natural language, which is also discrete, to be jointly modelled alongside the imaging, resulting in a description of the contents of the imaging. Here we present experiments in applying discrete modelling techniques to the problem domain of non-melanoma skin cancer, specifically, histological images of Intraepidermal Carcinoma (IEC). Implementing a VQ-GAN model to reconstruct high-resolution (256x256) images of IEC, we trained a sequence-to-sequence transformer to generate natural language descriptions using pathologist terminology. Combined with the idea of interactive concept vectors available by using continuous generative methods, we demonstrate an additional angle of interpretability. The result is a promising means of working towards highly expressive machine learning systems which are not only useful as predictive/classification tools, but also a means to further our scientific understanding of disease.
    Optimistic PAC Reinforcement Learning: the Instance-Dependent View. (arXiv:2207.05852v1 [cs.LG])
    Optimistic algorithms have been extensively studied for regret minimization in episodic tabular MDPs, both from a minimax and an instance-dependent view. However, for the PAC RL problem, where the goal is to identify a near-optimal policy with high probability, little is known about their instance-dependent sample complexity. A negative result of Wagenmaker et al. (2021) suggests that optimistic sampling rules cannot be used to attain the (still elusive) optimal instance-dependent sample complexity. On the positive side, we provide the first instance-dependent bound for an optimistic algorithm for PAC RL, BPI-UCRL, for which only minimax guarantees were available (Kaufmann et al., 2021). While our bound features some minimal visitation probabilities, it also features a refined notion of sub-optimality gap compared to the value gaps that appear in prior work. Moreover, in MDPs with deterministic transitions, we show that BPI-UCRL is actually near-optimal. On the technical side, our analysis is very simple thanks to a new "target trick" of independent interest. We complement these findings with a novel hardness result explaining why the instance-dependent complexity of PAC RL cannot be easily related to that of regret minimization, unlike in the minimax regime.
    A Conceptual Framework for Using Machine Learning to Support Child Welfare Decisions. (arXiv:2207.05855v1 [cs.CY])
    Human services systems make key decisions that impact individuals in the society. The U.S. child welfare system makes such decisions, from screening-in hotline reports of suspected abuse or neglect for child protective investigations, placing children in foster care, to returning children to permanent home settings. These complex and impactful decisions on children's lives rely on the judgment of child welfare decisionmakers. Child welfare agencies have been exploring ways to support these decisions with empirical, data-informed methods that include machine learning (ML). This paper describes a conceptual framework for ML to support child welfare decisions. The ML framework guides how child welfare agencies might conceptualize a target problem that ML can solve; vet available administrative data for building ML; formulate and develop ML specifications that mirror relevant populations and interventions the agencies are undertaking; deploy, evaluate, and monitor ML as child welfare context, policy, and practice change over time. Ethical considerations, stakeholder engagement, and avoidance of common pitfalls underpin the framework's impact and success. From abstract to concrete, we describe one application of this framework to support a child welfare decision. This ML framework, though child welfare-focused, is generalizable to solving other public policy problems.
    Slicing Aided Hyper Inference and Fine-tuning for Small Object Detection. (arXiv:2202.06934v4 [cs.CV] UPDATED)
    Detection of small objects and objects far away in the scene is a major challenge in surveillance applications. Such objects are represented by a small number of pixels in the image and lack sufficient detail, making them difficult to detect using conventional detectors. In this work, an open-source framework called Slicing Aided Hyper Inference (SAHI) is proposed that provides a generic slicing aided inference and fine-tuning pipeline for small object detection. The proposed technique is generic in the sense that it can be applied on top of any available object detector without any fine-tuning. Experimental evaluations using object detection baselines on the Visdrone and xView aerial object detection datasets show that the proposed inference method can increase object detection AP by 6.8%, 5.1% and 5.3% for FCOS, VFNet and TOOD detectors, respectively. Moreover, the detection accuracy can be further increased with slicing aided fine-tuning, resulting in a cumulative increase of 12.7%, 13.4% and 14.5% AP in the same order. The proposed technique has been integrated with Detectron2, MMDetection and YOLOv5 models and it is publicly available at https://github.com/obss/sahi.git .
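    The slicing idea can be illustrated with a short sketch that tiles an image into overlapping windows and maps a detection found inside a window back to full-image coordinates. This is not the SAHI library's API: the function names, the 512-pixel slice size, and the 0.2 overlap ratio are assumptions chosen for illustration, and the detector itself and the merging of duplicate detections are left out.

```python
def slice_boxes(image_width, image_height, slice_size=512, overlap_ratio=0.2):
    """Return (x_min, y_min, x_max, y_max) windows that cover the image with overlap.

    Windows at the right and bottom borders are shifted inwards so that
    every window keeps the full slice size when the image is large enough.
    """
    step = max(1, int(slice_size * (1.0 - overlap_ratio)))
    boxes = []
    y = 0
    while True:
        y_max = min(y + slice_size, image_height)
        y_min = max(0, y_max - slice_size)
        x = 0
        while True:
            x_max = min(x + slice_size, image_width)
            x_min = max(0, x_max - slice_size)
            boxes.append((x_min, y_min, x_max, y_max))
            if x_max >= image_width:
                break
            x += step
        if y_max >= image_height:
            break
        y += step
    return boxes


def to_full_image(detection, window):
    """Shift a detection box from window coordinates to full-image coordinates."""
    dx, dy = window[0], window[1]
    x_min, y_min, x_max, y_max = detection
    return (x_min + dx, y_min + dy, x_max + dx, y_max + dy)


# Hypothetical 1000 x 600 image; each window would be passed to a detector.
windows = slice_boxes(1000, 600)
print(len(windows))
print(to_full_image((10, 20, 30, 40), windows[-1]))
```

    Running a detector on each window enlarges small objects relative to the network input, which is the effect the abstract attributes to sliced inference.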
    RelaxLoss: Defending Membership Inference Attacks without Losing Utility. (arXiv:2207.05801v1 [cs.LG])
    As a long-term threat to the privacy of training data, membership inference attacks (MIAs) emerge ubiquitously in machine learning models. Existing works evidence a strong connection between the distinguishability of the training and testing loss distributions and the model's vulnerability to MIAs. Motivated by existing results, we propose a novel training framework based on a relaxed loss with a more achievable learning target, which leads to a narrowed generalization gap and reduced privacy leakage. RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead. Through extensive evaluations on five datasets with diverse modalities (images, medical data, transaction records), our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against MIAs as well as model utility. Our defense is the first that can withstand a wide range of attacks while preserving (or even improving) the target model's utility. Source code is available at https://github.com/DingfanChen/RelaxLoss
    Unsupervised Recognition of Informative Features via Tensor Network Machine Learning and Quantum Entanglement Variations. (arXiv:2207.06031v1 [quant-ph])
    Given an image of a white shoe drawn on a blackboard, how are the white pixels deemed (say by human minds) to be informative for recognizing the shoe without any labeling information on the pixels? Here we investigate such a "white shoe" recognition problem from the perspective of tensor network (TN) machine learning and quantum entanglement. Utilizing a generative TN that captures the probability distribution of the features as quantum amplitudes, we propose an unsupervised recognition scheme of informative features with the variations of entanglement entropy (EE) caused by designed measurements. In this way, a given sample, where the values of its features are statistically meaningless, is mapped to the variations of EE that are statistically meaningful. We show that the EE variations identify the features that are critical to recognize this specific sample, and the EE itself reveals the information distribution from the TN model. The signs of the variations further reveal the entanglement structures among the features. We test the validity of our scheme on a toy dataset of strip images, the MNIST dataset of hand-drawn digits, and the fashion-MNIST dataset of the pictures of fashion articles. Our scheme opens the avenue to the quantum-inspired and interpreted unsupervised learning and could be applied to, e.g., image segmentation and object detection.
    Online Decision Transformer. (arXiv:2202.05607v2 [cs.LG] UPDATED)
    Recent work has shown that offline reinforcement learning (RL) can be formulated as a sequence modeling problem (Chen et al., 2021; Janner et al., 2021) and solved via approaches similar to large-scale language modeling. However, any practical instantiation of RL also involves an online component, where policies pretrained on passive offline datasets are finetuned via task-specific interactions with the environment. We propose Online Decision Transformers (ODT), an RL algorithm based on sequence modeling that blends offline pretraining with online finetuning in a unified framework. Our framework uses sequence-level entropy regularizers in conjunction with autoregressive modeling objectives for sample-efficient exploration and finetuning. Empirically, we show that ODT is competitive with the state-of-the-art in absolute performance on the D4RL benchmark but shows much more significant gains during the finetuning procedure.
    ConvGeN: Convex space learning improves deep-generative oversampling for tabular imbalanced classification on smaller datasets. (arXiv:2206.09812v2 [cs.LG] UPDATED)
    Data is commonly stored in tabular format. Several fields of research are prone to small imbalanced tabular data. Supervised Machine Learning on such data is often difficult due to class imbalance. Synthetic data generation, i.e., oversampling, is a common remedy used to improve classifier performance. State-of-the-art linear interpolation approaches, such as LoRAS and ProWRAS can be used to generate synthetic samples from the convex space of the minority class to improve classifier performance in such cases. Deep generative networks are common deep learning approaches for synthetic sample generation, widely used for synthetic image generation. However, their scope on synthetic tabular data generation in the context of imbalanced classification is not adequately explored. In this article, we show that existing deep generative models perform poorly compared to linear interpolation based approaches for imbalanced classification problems on smaller tabular datasets. To overcome this, we propose a deep generative model, ConvGeN that combines the idea of convex space learning with deep generative models. ConvGeN learns the coefficients for the convex combinations of the minority class samples, such that the synthetic data is distinct enough from the majority class. Our benchmarking experiments demonstrate that our proposed model ConvGeN improves imbalanced classification on such small datasets, as compared to existing deep generative models, while being at-par with the existing linear interpolation approaches. Moreover, we discuss how our model can be used for synthetic tabular data generation in general, even outside the scope of data imbalance and thus, improves the overall applicability of convex space learning.
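    The convex-space idea in the abstract above can be illustrated with a minimal sketch. It draws a synthetic point as a random convex combination of a few minority-class samples. Note the assumptions: ConvGeN learns the combination coefficients, whereas this sketch simply samples them uniformly from the simplex, and the names `convex_combination`, `random_simplex_weights`, and `oversample` are hypothetical helpers, not taken from the paper or its code.

```python
import random

def convex_combination(points, weights):
    """Weighted sum of points; weights are non-negative and sum to 1."""
    dim = len(points[0])
    return [sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim)]

def random_simplex_weights(k, rng):
    """Sample k non-negative weights summing to 1 (normalized exponential draws)."""
    raw = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(raw)
    return [r / total for r in raw]

def oversample(minority, n_new, k, rng):
    """Create n_new synthetic samples, each a convex combination of k minority samples.

    Assumption: coefficients are random here; ConvGeN learns them instead.
    """
    synthetic = []
    for _ in range(n_new):
        chosen = rng.sample(minority, k)          # k distinct minority samples
        weights = random_simplex_weights(k, rng)  # point on the probability simplex
        synthetic.append(convex_combination(chosen, weights))
    return synthetic

# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
rng = random.Random(0)
synthetic = oversample(minority, n_new=5, k=3, rng=rng)
```

    Because every synthetic point is a convex combination, it always lies inside the convex hull of the chosen minority samples (here, inside the unit square).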
    Collaboration-Aware Graph Convolutional Networks for Recommendation Systems. (arXiv:2207.06221v1 [cs.IR])
    By virtue of the message-passing that implicitly injects collaborative effect into the embedding process, Graph Neural Networks (GNNs) have been successfully adopted in recommendation systems. Nevertheless, most of existing message-passing mechanisms in recommendation are directly inherited from GNNs without any recommendation-tailored modification. Although some efforts have been made towards simplifying GNNs to improve the performance/efficiency of recommendation, no study has comprehensively scrutinized how message-passing captures collaborative effect and whether the captured effect would benefit the prediction of user preferences over items. Therefore, in this work we aim to demystify the collaborative effect captured by message-passing in GNNs and develop new insights towards customizing message-passing for recommendation. First, we theoretically analyze how message-passing captures and leverages the collaborative effect in predicting user preferences. Then, to determine whether the captured collaborative effect would benefit the prediction of user preferences, we propose a recommendation-oriented topological metric, Common Interacted Ratio (CIR), which measures the level of interaction between a specific neighbor of a node with the rest of its neighborhood set. Inspired by our theoretical and empirical analysis, we propose a recommendation-tailored GNN, Augmented Collaboration-Aware Graph Convolutional Network (CAGCN*), that extends upon the LightGCN framework and is able to selectively pass information of neighbors based on their CIR via the Collaboration-Aware Graph Convolution. Experimental results on six benchmark datasets show that CAGCN* outperforms the most representative GNN-based recommendation model, LightGCN, by 9% in Recall@20 and also achieves more than 79% speedup. Our code is publicly available at https://github.com/YuWVandy/CAGCN.
    A new hope for network model generalization. (arXiv:2207.05843v1 [cs.NI])
    Generalizing machine learning (ML) models for network traffic dynamics tends to be considered a lost cause. Hence, for every new task, we often resolve to design new models and train them on model-specific datasets collected, whenever possible, in an environment mimicking the model's deployment. This approach essentially gives up on generalization. Yet, an ML architecture called Transformer has enabled previously unimaginable generalization in other domains. Nowadays, one can download a model pre-trained on massive datasets and only fine-tune it for a specific task and context with comparatively little time and data. These fine-tuned models are now state-of-the-art for many benchmarks. We believe this progress could translate to networking and propose a Network Traffic Transformer (NTT), a transformer adapted to learn network dynamics from packet traces. Our initial results are promising: NTT seems able to generalize to new prediction tasks and contexts. This study suggests there is still hope for generalization, though it calls for a lot of future research.
    Experiments on Anomaly Detection in Autonomous Driving by Forward-Backward Style Transfers. (arXiv:2207.06055v1 [cs.CV])
    Great progress has been achieved in the community of autonomous driving in the past few years. As a safety-critical problem, however, anomaly detection is a huge hurdle towards a large-scale deployment of autonomous vehicles in the real world. While many approaches, such as uncertainty estimation or segmentation-based image resynthesis, are extremely promising, there is more to be explored. Especially inspired by works on anomaly detection based on image resynthesis, we propose a novel approach for anomaly detection through style transfer. We leverage generative models to map an image from its original style domain of road traffic to an arbitrary one and back to generate pixelwise anomaly scores. However, our experiments have proven our hypothesis wrong, and we were unable to produce significant results. Nevertheless, we want to share our findings, so that others can learn from our experiments.
    OSLAT: Open Set Label Attention Transformer for Medical Entity Span Extraction. (arXiv:2207.05817v1 [cs.CL])
    Identifying spans in medical texts that correspond to medical entities is one of the core steps for many healthcare NLP tasks such as ICD coding, medical finding extraction, medical note contextualization, to name a few. Existing entity extraction methods rely on a fixed and limited vocabulary of medical entities and have difficulty with extracting entities represented by disjoint spans. In this paper, we present a new transformer-based architecture called OSLAT, Open Set Label Attention Transformer, that addresses many of the limitations of the previous methods. Our approach uses the label-attention mechanism to implicitly learn spans associated with entities of interest. These entities can be provided as free text, including entities not seen during OSLAT's training, and the model can extract spans even when they are disjoint. To test the generalizability of our method, we train two separate models on two different datasets, which have very low entity overlap: (1) a public discharge notes dataset from hNLP, and (2) a much more challenging proprietary patient text dataset "Reasons for Encounter" (RFE). We find that OSLAT models trained on either dataset outperform rule-based and fuzzy string matching baselines when applied to the RFE dataset as well as to the portion of hNLP dataset where entities are represented by disjoint spans. Our code can be found at https://github.com/curai/curai-research/tree/main/OSLAT.
    Towards Knowledge-based Mining of Mental Disorder Patterns from Textual Data. (arXiv:2207.06254v1 [cs.IR])
    Mental health disorders may cause severe consequences on all the countries' economies and health. For example, the impacts of the COVID-19 pandemic, such as isolation and travel ban, can make us feel depressed. Identifying early signs of mental health disorders is vital. For example, depression may increase an individual's risk of suicide. The state-of-the-art research in identifying mental disorder patterns from textual data uses hand-labelled training sets, especially when a domain expert's knowledge is required to analyse various symptoms. This task could be time-consuming and expensive. To address this challenge, in this paper, we study and analyse the various clinical and non-clinical approaches to identifying mental health disorders. We leverage the domain knowledge and expertise in cognitive science to build a domain-specific Knowledge Base (KB) for the mental health disorder concepts and patterns. We present a weaker form of supervision by facilitating the generation of training data from a domain-specific Knowledge Base (KB). We adopt a typical scenario for analysing social media to identify major depressive disorder symptoms from the textual content generated by social users. We use this scenario to evaluate how our knowledge-based approach significantly improves the quality of results.
    Non-Myopic Multifidelity Bayesian Optimization. (arXiv:2207.06325v1 [cs.LG])
    Bayesian optimization is a popular framework for the optimization of black box functions. Multifidelity methods accelerate Bayesian optimization by exploiting low-fidelity representations of expensive objective functions. Popular multifidelity Bayesian strategies rely on sampling policies that account for the immediate reward obtained evaluating the objective function at a specific input, precluding greater informative gains that might be obtained looking ahead more steps. This paper proposes a non-myopic multifidelity Bayesian framework to grasp the long-term reward from future steps of the optimization. Our computational strategy comes with a two-step lookahead multifidelity acquisition function that maximizes the cumulative reward obtained measuring the improvement in the solution over two steps ahead. We demonstrate that the proposed algorithm outperforms a standard multifidelity Bayesian framework on popular benchmark optimization problems.
    Efficient Adaptive Regret Minimization. (arXiv:2207.00646v2 [cs.LG] UPDATED)
    In online convex optimization the player aims to minimize her regret against a fixed comparator over the entire repeated game. Algorithms that minimize standard regret may converge to a fixed decision, which is undesirable in changing or dynamic environments. This motivates the stronger metric of adaptive regret, or the maximum regret over any continuous sub-interval in time. Existing adaptive regret algorithms suffer from a computational penalty - typically on the order of a multiplicative factor that grows logarithmically in the number of game iterations. In this paper we show how to reduce this computational penalty to be doubly logarithmic in the number of game iterations, and with minimal degradation to the optimal attainable adaptive regret bounds.
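    The two regret notions contrasted in the abstract above can be written out as follows. The notation is an assumption, since the abstract fixes none: $f_t$ is the convex loss revealed at round $t$, $x_t$ is the player's decision, $\mathcal{K}$ is the convex decision set, and $T$ is the number of game iterations.

```latex
\[
\mathrm{Regret}_T
  = \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x),
\]
\[
\mathrm{AdaptiveRegret}_T
  = \max_{1 \le s \le e \le T}
    \left( \sum_{t=s}^{e} f_t(x_t) \;-\; \min_{x \in \mathcal{K}} \sum_{t=s}^{e} f_t(x) \right).
\]
```

    Taking $s = 1$ and $e = T$ recovers standard regret, so adaptive regret is always at least as large; this is the sense in which it is the stronger metric.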
    Efficient and Scalable Recommendation via Item-Item Graph Partitioning. (arXiv:2207.05959v1 [cs.IR])
    Collaborative filtering (CF) is a widely researched problem in recommender systems. Linear autoencoders are a well-established method for CF, which estimate item-item relations through encoding user-item interactions. Despite the excellent performance of linear autoencoders, the rapidly increasing computational and storage costs caused by the growing number of items limit their scalability in large-scale real-world scenarios. Recently, graph-based approaches have achieved success on CF with high scalability, and have been shown to have commonalities with linear autoencoders in user-item interaction modeling. Motivated by this, we propose an efficient and scalable recommendation via item-item graph partitioning (ERGP), aiming to address the limitations of linear autoencoders. In particular, a recursive graph partitioning strategy is proposed to ensure that the item set is divided into several partitions of finite size. Linear autoencoders encode user-item interactions within partitions while preserving global information across the entire item set. This allows ERGP to have guaranteed efficiency and high scalability when the number of items increases. Experiments conducted on 3 public datasets and 3 open benchmarking datasets demonstrate the effectiveness of ERGP, which outperforms state-of-the-art models with lower training time and storage costs.
    N-Grammer: Augmenting Transformers with latent n-grams. (arXiv:2207.06366v1 [cs.CL])
    Transformer models have recently emerged as one of the foundational models in natural language processing, and as a byproduct, there is significant recent interest and investment in scaling these models. However, the training and inference costs of these large Transformer language models are prohibitive, thus necessitating more research in identifying more efficient variants. In this work, we propose a simple yet effective modification to the Transformer architecture inspired by the literature in statistical language modeling, by augmenting the model with n-grams that are constructed from a discrete latent representation of the text sequence. We evaluate our model, the N-Grammer on language modeling on the C4 data-set as well as text classification on the SuperGLUE data-set, and find that it outperforms several strong baselines such as the Transformer and the Primer. We open-source our model for reproducibility purposes in Jax.
    Domain adaptation strategies for cancer-independent detection of lymph node metastases. (arXiv:2207.06193v1 [eess.IV])
    Recently, large, high-quality public datasets have led to the development of convolutional neural networks that can detect lymph node metastases of breast cancer at the level of expert pathologists. Many cancers, regardless of the site of origin, can metastasize to lymph nodes. However, collecting and annotating high-volume, high-quality datasets for every cancer type is challenging. In this paper we investigate how to leverage existing high-quality datasets most efficiently in multi-task settings for closely related tasks. Specifically, we will explore different training and domain adaptation strategies, including prevention of catastrophic forgetting, for colon and head-and-neck cancer metastasis detection in lymph nodes. Our results show state-of-the-art performance on both cancer metastasis detection tasks. Furthermore, we show the effectiveness of repeated adaptation of networks from one cancer type to another to obtain multi-task metastasis detection networks. Last, we show that leveraging existing high-quality datasets can significantly boost performance on new target tasks and that catastrophic forgetting can be effectively mitigated using regularization.
    BR-SNIS: Bias Reduced Self-Normalized Importance Sampling. (arXiv:2207.06364v1 [stat.ML])
    Importance Sampling (IS) is a method for approximating expectations under a target distribution using independent samples from a proposal distribution and the associated importance weights. In many applications, the target distribution is known only up to a normalization constant, in which case self-normalized IS (SNIS) can be used. While the use of self-normalization can have a positive effect on the dispersion of the estimator, it introduces bias. In this work, we propose a new method, BR-SNIS, whose complexity is essentially the same as that of SNIS and which significantly reduces bias without increasing the variance. This method is a wrapper in the sense that it uses the same proposal samples and importance weights as SNIS, but makes clever use of iterated sampling--importance resampling (ISIR) to form a bias-reduced version of the estimator. We furnish the proposed algorithm with rigorous theoretical results, including new bias, variance and high-probability bounds, and these are illustrated by numerical examples.
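    As a point of reference for the abstract above, here is a minimal sketch of plain self-normalized importance sampling (SNIS), the baseline estimator, not the BR-SNIS method itself. The toy target and proposal are assumptions chosen for illustration: the target is a Gaussian N(1, 1) known only up to its normalizing constant, and the proposal is N(0, 2^2). The function name `snis_estimate` is hypothetical.

```python
import math
import random

def snis_estimate(f, log_unnorm_target, sample_proposal, log_proposal, n, rng):
    """Self-normalized importance sampling estimate of E_target[f(X)].

    The target density only needs to be known up to a constant, because
    the constant cancels when the weights are normalized.
    """
    xs = [sample_proposal(rng) for _ in range(n)]
    # Log importance weights: log(unnormalized target) - log(proposal).
    log_w = [log_unnorm_target(x) - log_proposal(x) for x in xs]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return sum(wi * f(x) for wi, x in zip(w, xs)) / total

# Toy setup (assumed for illustration).
def log_unnorm_target(x):
    return -0.5 * (x - 1.0) ** 2  # N(1, 1) without its normalizing constant

def sample_proposal(rng):
    return rng.gauss(0.0, 2.0)    # N(0, 2^2)

def log_proposal(x):
    return -0.5 * (x / 2.0) ** 2 - math.log(2.0 * math.sqrt(2.0 * math.pi))

rng = random.Random(0)
estimate = snis_estimate(lambda x: x, log_unnorm_target,
                         sample_proposal, log_proposal, 20000, rng)
```

    The estimate of the target mean should be close to 1. The ratio form of the estimator is what introduces the bias that the abstract refers to.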
    Data-driven Control of Agent-based Models: an Equation/Variable-free Machine Learning Approach. (arXiv:2207.05779v1 [math.DS])
    We present an Equation/Variable free machine learning (EVFML) framework for the control of the collective dynamics of complex/multiscale systems modelled via microscopic/agent-based simulators. The approach obviates the need for construction of surrogate, reduced-order models. The proposed implementation consists of three steps: (A) from high-dimensional agent-based simulations, machine learning (in particular, non-linear manifold learning with Diffusion Maps (DMs)) helps identify a set of coarse-grained variables that parametrize the low-dimensional manifold on which the emergent/collective dynamics evolve. The out-of-sample extension and pre-image problems, i.e. the construction of non-linear mappings from the high-dimensional input space to the low-dimensional manifold and back, are solved by coupling DMs with the Nystrom extension and Geometric Harmonics, respectively; (B) having identified the manifold and its coordinates, we exploit the Equation-free approach to perform numerical bifurcation analysis of the emergent dynamics; then (C) based on the previous steps, we design data-driven embedded wash-out controllers that drive the agent-based simulators to their intrinsic, imprecisely known, emergent open-loop unstable steady-states, thus demonstrating that the scheme is robust against numerical approximation errors and modelling uncertainty. The efficiency of the framework is illustrated by controlling emergent unstable (i) traveling waves of a deterministic agent-based model of traffic dynamics, and (ii) equilibria of a stochastic financial market agent model with mimesis.
    Exploring Sequence Feature Alignment for Domain Adaptive Detection Transformers. (arXiv:2107.12636v3 [cs.CV] UPDATED)
    Detection transformers have recently shown promising object detection results and attracted increasing attention. However, how to develop effective domain adaptation techniques to improve its cross-domain performance remains unexplored and unclear. In this paper, we delve into this topic and empirically find that direct feature distribution alignment on the CNN backbone only brings limited improvements, as it does not guarantee domain-invariant sequence features in the transformer for prediction. To address this issue, we propose a novel Sequence Feature Alignment (SFA) method that is specially designed for the adaptation of detection transformers. Technically, SFA consists of a domain query-based feature alignment (DQFA) module and a token-wise feature alignment (TDA) module. In DQFA, a novel domain query is used to aggregate and align global context from the token sequence of both domains. DQFA reduces the domain discrepancy in global feature representations and object relations when deploying in the transformer encoder and decoder, respectively. Meanwhile, TDA aligns token features in the sequence from both domains, which reduces the domain gaps in local and instance-level feature representations in the transformer encoder and decoder, respectively. Besides, a novel bipartite matching consistency loss is proposed to enhance the feature discriminability for robust object detection. Experiments on three challenging benchmarks show that SFA outperforms state-of-the-art domain adaptive object detection methods. Code has been made available at: https://github.com/encounter1997/SFA.
    Interactive Machine Learning: A State of the Art Review. (arXiv:2207.06196v1 [cs.LG])
    Machine learning has proved useful in many software disciplines, including computer vision, speech and audio processing, natural language processing, robotics and some other fields. However, its applicability has been significantly hampered due to its black-box nature and significant resource consumption. Performance is achieved at the expense of enormous computational resources and usually by compromising the robustness and trustworthiness of the model. Recent research has identified a lack of interactivity as the prime source of these machine learning problems. Consequently, interactive machine learning (iML) has attracted increased attention from researchers on account of its human-in-the-loop modality and relatively efficient resource utilization. Thereby, a state-of-the-art review of interactive machine learning plays a vital role in easing the effort toward building human-centred models. In this paper, we provide a comprehensive analysis of the state-of-the-art of iML. We analyze salient research works using merit-oriented and application/task oriented mixed taxonomy. We use a bottom-up clustering approach to generate a taxonomy of iML research works. Research works on adversarial black-box attacks and corresponding iML based defense system, exploratory machine learning, resource constrained learning, and iML performance evaluation are analyzed under their corresponding theme in our merit-oriented taxonomy. We have further classified these research works into technical and sectoral categories. Finally, research opportunities that we believe are inspiring for future work in iML are discussed thoroughly.
    Multiple Kernel Clustering with Dual Noise Minimization. (arXiv:2207.06041v1 [cs.LG])
    Clustering is a representative unsupervised method widely applied in multi-modal and multi-view scenarios. Multiple kernel clustering (MKC) aims to group data by integrating complementary information from base kernels. As a representative, late fusion MKC first decomposes the kernels into orthogonal partition matrices, then learns a consensus one from them, achieving promising performance recently. However, these methods fail to consider the noise inside the partition matrix, preventing further improvement of clustering performance. We discover that the noise can be disassembled into separable dual parts, i.e. N-noise and C-noise (Null space noise and Column space noise). In this paper, we rigorously define dual noise and propose a novel parameter-free MKC algorithm by minimizing them. To solve the resultant optimization problem, we design an efficient two-step iterative strategy. To the best of our knowledge, this is the first investigation of dual noise within the partition in the kernel space. We observe that dual noise will pollute the block diagonal structures and incur the degeneration of clustering performance, and C-noise exhibits stronger destruction than N-noise. Owing to our efficient mechanism to minimize dual noise, the proposed algorithm surpasses the recent methods by large margins.
    Long Term Fairness for Minority Groups via Performative Distributionally Robust Optimization. (arXiv:2207.05777v1 [cs.LG])
    Fairness researchers in machine learning (ML) have coalesced around several fairness criteria which provide formal definitions of what it means for an ML model to be fair. However, these criteria have some serious limitations. We identify four key shortcomings of these formal fairness criteria, and aim to help to address them by extending performative prediction to include a distributionally robust objective.
    Estimating Test Performance for AI Medical Devices under Distribution Shift with Conformal Prediction. (arXiv:2207.05796v1 [cs.LG])
    Estimating the test performance of software AI-based medical devices under distribution shifts is crucial for evaluating the safety, efficiency, and usability prior to clinical deployment. Due to the nature of regulated medical device software and the difficulty in acquiring large amounts of labeled medical datasets, we consider the task of predicting the test accuracy of an arbitrary black-box model on an unlabeled target domain without modification to the original training process or any distributional assumptions of the original source data (i.e. we treat the model as a "black-box" and only use the predicted output responses). We propose a "black-box" test estimation technique based on conformal prediction and evaluate it against other methods on three medical imaging datasets (mammography, dermatology, and histopathology) under several clinically relevant types of distribution shift (institution, hardware scanner, atlas, hospital). We hope that by promoting practical and effective estimation techniques for black-box models, manufacturers of medical devices will develop more standardized and realistic evaluation procedures to improve the robustness and trustworthiness of clinical AI tools.
    Radar Image Reconstruction from Raw ADC Data using Parametric Variational Autoencoder with Domain Adaptation. (arXiv:2207.06379v1 [cs.CV])
    This paper presents a parametric variational autoencoder-based human target detection and localization framework working directly with the raw analog-to-digital converter data from the frequency modulated continuous wave radar. We propose a parametrically constrained variational autoencoder, with residual and skip connections, capable of generating the clustered and localized target detections on the range-angle image. Furthermore, to circumvent the problem of training the proposed neural network on all possible scenarios using real radar data, we propose domain adaptation strategies whereby we first train the neural network using ray tracing based model data and then adapt the network to work on real sensor data. This strategy ensures better generalization and scalability of the proposed neural network even though it is trained with limited radar data. We demonstrate the superior detection and localization performance of our proposed solution compared to the conventional signal processing pipeline and earlier state-of-the-art deep U-Net architecture with range-doppler images as inputs.
    Revealing Unfair Models by Mining Interpretable Evidence. (arXiv:2207.05811v1 [cs.LG])
    The popularity of machine learning has increased the risk of unfair models getting deployed in high-stakes applications, such as the justice system, drug/vaccination design, and medical diagnosis. Although there are effective methods to train fair models from scratch, how to automatically reveal and explain the unfairness of a trained model remains a challenging task. Revealing unfairness of machine learning models in an interpretable fashion is a critical step towards fair and trustworthy AI. In this paper, we systematically tackle the novel task of revealing unfair models by mining interpretable evidence (RUMIE). The key idea is to find solid evidence in the form of a group of data instances discriminated most by the model. To make the evidence interpretable, we also find a set of human-understandable key attributes and decision rules that characterize the discriminated data instances and distinguish them from the other non-discriminated data. As demonstrated by extensive experiments on many real-world data sets, our method finds highly interpretable and solid evidence to effectively reveal the unfairness of trained models. Moreover, it is much more scalable than all of the baseline methods.
    Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS. (arXiv:2207.06000v1 [cs.CL])
    Expressive text-to-speech has shown improved performance in recent years. However, the style control of synthetic speech is often restricted to discrete emotion categories and requires training data recorded by the target speaker in the target style. In many practical situations, users may not have reference speech recorded in target emotion but still be interested in controlling speech style just by typing text description of desired emotional style. In this paper, we propose a text-based interface for emotional style control and cross-speaker style transfer in multi-speaker TTS. We propose the bi-modal style encoder which models the semantic relationship between text description embedding and speech style embedding with a pretrained language model. To further improve cross-speaker style transfer on disjoint, multi-style datasets, we propose the novel style loss. The experimental results show that our model can generate high-quality expressive speech even in unseen style.
    Shape-Aware Masking for Inpainting in Medical Imaging. (arXiv:2207.05787v1 [eess.IV])
    Inpainting has recently been proposed as a successful deep learning technique for unsupervised medical image model discovery. The masks used for inpainting are generally independent of the dataset and are not tailored to perform on different given classes of anatomy. In this work, we introduce a method for generating shape-aware masks for inpainting, which aims at learning the statistical shape prior. We hypothesize that although the variation of masks improves the generalizability of inpainting models, the shape of the masks should follow the topology of the organs of interest. Hence, we propose an unsupervised guided masking approach based on an off-the-shelf inpainting model and a superpixel over-segmentation algorithm to generate a wide range of shape-dependent masks. Experimental results on abdominal MR image reconstruction show the superiority of our proposed masking method over standard methods using square-shaped or dataset of irregular shape masks.
    Probing the Robustness of Independent Mechanism Analysis for Representation Learning. (arXiv:2207.06137v1 [stat.ML])
    One aim of representation learning is to recover the original latent code that generated the data, a task which requires additional information or inductive biases. A recently proposed approach termed Independent Mechanism Analysis (IMA) postulates that each latent source should influence the observed mixtures independently, complementing standard nonlinear independent component analysis, and taking inspiration from the principle of independent causal mechanisms. While it was shown in theory and experiments that IMA helps recover the true latents, the method's performance was so far only characterized when the modeling assumptions are exactly satisfied. Here, we test the method's robustness to violations of the underlying assumptions. We find that the benefits of IMA-based regularization for recovering the true sources extend to mixing functions with various degrees of violation of the IMA principle, while standard regularizers do not provide the same merits. Moreover, we show that unregularized maximum likelihood recovers mixing functions which systematically deviate from the IMA principle, and provide an argument elucidating the benefits of IMA-based regularization.
    Logistics, Graphs, and Transformers: Towards improving Travel Time Estimation. (arXiv:2207.05835v1 [cs.LG])
    The problem of travel time estimation is widely considered the fundamental challenge of modern logistics. The complex interconnections between the spatial aspects of roads and the temporal dynamics of ground transport still leave room for experimentation. Moreover, the total volume of currently accumulated data encourages the construction of learning models that have the potential to significantly outperform earlier solutions. To address the problems of travel time estimation, we propose TransTTE, a new method based on the transformer architecture.
    On the Robustness of Bayesian Neural Networks to Adversarial Attacks. (arXiv:2207.06154v1 [cs.LG])
    Vulnerability to adversarial attacks is one of the principal hurdles to the adoption of deep learning in safety-critical applications. Despite significant efforts, both practical and theoretical, training deep learning models robust to adversarial attacks is still an open problem. In this paper, we analyse the geometry of adversarial attacks in the large-data, overparameterized limit for Bayesian Neural Networks (BNNs). We show that, in the limit, vulnerability to gradient-based attacks arises as a result of degeneracy in the data distribution, i.e., when the data lies on a lower-dimensional submanifold of the ambient space. As a direct consequence, we demonstrate that in this limit BNN posteriors are robust to gradient-based adversarial attacks. Crucially, we prove that the expected gradient of the loss with respect to the BNN posterior distribution is vanishing, even when each neural network sampled from the posterior is vulnerable to gradient-based attacks. Experimental results on the MNIST, Fashion MNIST, and half moons datasets, representing the finite data regime, with BNNs trained with Hamiltonian Monte Carlo and Variational Inference, support this line of argument, showing that BNNs can display both high accuracy on clean data and robustness to both gradient-based and gradient-free adversarial attacks.
    Understanding Unfairness in Fraud Detection through Model and Data Bias Interactions. (arXiv:2207.06273v1 [cs.LG])
    In recent years, machine learning algorithms have become ubiquitous in a multitude of high-stakes decision-making applications. The unparalleled ability of machine learning algorithms to learn patterns from data also enables them to incorporate biases embedded within. A biased model can then make decisions that disproportionately harm certain groups in society -- limiting their access to financial services, for example. The awareness of this problem has given rise to the field of Fair ML, which focuses on studying, measuring, and mitigating unfairness in algorithmic prediction, with respect to a set of protected groups (e.g., race or gender). However, the underlying causes for algorithmic unfairness still remain elusive, with researchers divided between blaming either the ML algorithms or the data they are trained on. In this work, we maintain that algorithmic unfairness stems from interactions between models and biases in the data, rather than from isolated contributions of either of them. To this end, we propose a taxonomy to characterize data bias and we study a set of hypotheses regarding the fairness-accuracy trade-offs that fairness-blind ML algorithms exhibit under different data bias settings. On our real-world account-opening fraud use case, we find that each setting entails specific trade-offs, affecting fairness in expected value and variance -- the latter often going unnoticed. Moreover, we show how algorithms compare differently in terms of accuracy and fairness, depending on the biases affecting the data. Finally, we note that under specific data bias conditions, simple pre-processing interventions can successfully balance group-wise error rates, while the same techniques fail in more complex settings.
    QT-Routenet: Improved GNN generalization to larger 5G networks by fine-tuning predictions from queueing theory. (arXiv:2207.06336v1 [cs.NI])
    In order to promote the use of machine learning in 5G, the International Telecommunication Union (ITU) proposed in 2021 the second edition of the ITU AI/ML in 5G challenge, with over 1600 participants from 82 countries. This work details the second place solution overall, which is also the winning solution of the Graph Neural Networking Challenge 2021. We tackle the problem of generalization when applying a model to a 5G network that may have longer paths and larger link capacities than the ones observed in training. To achieve this, we propose to first extract robust features related to Queueing Theory (QT), and then fine-tune the analytical baseline prediction using a modification of the Routenet Graph Neural Network (GNN) model. The proposed solution generalizes much better than simply using Routenet, and manages to reduce the analytical baseline's 10.42 mean absolute percent error to 1.45 (1.27 with an ensemble). This suggests that making small changes to an approximate model that is known to be robust can be an effective way to improve accuracy without compromising generalization.
    Open set learning with augmented category by exploiting unlabelled data (open-LACU). (arXiv:2002.01368v4 [stat.ML] UPDATED)
    Considering the nature of unlabelled data, it is common for partially labelled training datasets to contain samples that belong to novel categories. Although these so-called observed novel categories exist in the training data, they do not belong to any of the training labels. In contrast, open-sets define novel categories as those unobserved during training, but present during testing. This research is the first to generalize between observed and unobserved novel categories within a new learning policy called open-set learning with augmented category by exploiting unlabelled data, or open-LACU. This study conducts a high-level review of novelty detection so as to differentiate between research fields that concern observed novel categories and research fields that concern unobserved novel categories. Open-LACU is then introduced as a synthesis of the relevant fields to maintain the advantages of each within a single learning policy. Currently, we are finalising the first open-LACU network, which will be combined with this pre-print to be sent for publication.
    Machine Learning Assisted Approach for Security-Constrained Unit Commitment. (arXiv:2111.09824v2 [eess.SY] UPDATED)
    Security-constrained unit commitment (SCUC) is solved for power system day-ahead generation scheduling; it is a large-scale mixed-integer linear programming problem and is very computationally intensive. Model reduction of SCUC may bring significant time savings. In this work, a novel approach is proposed to effectively utilize machine learning (ML) to reduce the problem size of SCUC. An ML model using the logistic regression (LR) algorithm is proposed and trained with historical nodal demand profiles and the respective commitment schedules. The ML outputs are processed and analyzed to reduce variables and constraints in SCUC. The proposed approach is validated on several standard test systems, including the IEEE 24-bus system, IEEE 73-bus system, IEEE 118-bus system, synthetic South Carolina 500-bus system and Polish 2383-bus system. Simulation results demonstrate that using the predictions of the proposed LR model in SCUC model reduction can substantially reduce the computing time while maintaining solution quality.
    Policy Optimization with Sparse Global Contrastive Explanations. (arXiv:2207.06269v1 [cs.LG])
    We develop a Reinforcement Learning (RL) framework for improving an existing behavior policy via sparse, user-interpretable changes. Our goal is to make minimal changes while gaining as much benefit as possible. We define a minimal change as having a sparse, global contrastive explanation between the original and proposed policy. We improve the current policy with the constraint of keeping that global contrastive explanation short. We demonstrate our framework with a discrete MDP and a continuous 2D navigation domain.
    Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces. (arXiv:2207.05849v1 [cs.LG])
    Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control. While obtaining standard regret guarantees can be hopeless, alternative regret notions have been proposed to tackle the large action setting. We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives. We design a statistically and computationally efficient algorithm -- for the proposed smooth regret -- that works with general function approximation under standard supervised oracles. We also present an adaptive algorithm that automatically adapts to any smoothness level. Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret, e.g., in bandit problems with multiple best arms and Lipschitz/Hölder bandits. We conduct large-scale empirical evaluations demonstrating the efficacy of our proposed algorithms.
  • Open

    How Faithful is your Synthetic Data? Sample-level Metrics for Evaluating and Auditing Generative Models. (arXiv:2102.08921v2 [cs.LG] UPDATED)
    Devising domain- and model-agnostic evaluation metrics for generative models is an important and as yet unresolved problem. Most existing metrics, which were tailored solely to the image synthesis setup, exhibit a limited capacity for diagnosing the different modes of failure of generative models across broader application domains. In this paper, we introduce a 3-dimensional evaluation metric, ($\alpha$-Precision, $\beta$-Recall, Authenticity), that characterizes the fidelity, diversity and generalization performance of any generative model in a domain-agnostic fashion. Our metric unifies statistical divergence measures with precision-recall analysis, enabling sample- and distribution-level diagnoses of model fidelity and diversity. We introduce generalization as an additional, independent dimension (to the fidelity-diversity trade-off) that quantifies the extent to which a model copies training data -- a crucial performance indicator when modeling sensitive data with requirements on privacy. The three metric components correspond to (interpretable) probabilistic quantities, and are estimated via sample-level binary classification. The sample-level nature of our metric inspires a novel use case which we call model auditing, wherein we judge the quality of individual samples generated by a (black-box) model, discarding low-quality samples and hence improving the overall model performance in a post-hoc manner.
    Rotting Infinitely Many-armed Bandits. (arXiv:2201.12975v2 [cs.LG] UPDATED)
    We consider the infinitely many-armed bandit problem with rotting rewards, where the mean reward of an arm decreases at each pull of the arm according to an arbitrary trend with maximum rotting rate $\varrho=o(1)$. We show that this learning problem has an $\Omega(\max\{\varrho^{1/3}T,\sqrt{T}\})$ worst-case regret lower bound where $T$ is the horizon time. We show that a matching upper bound $\tilde{O}(\max\{\varrho^{1/3}T,\sqrt{T}\})$, up to a poly-logarithmic factor, can be achieved by an algorithm that uses a UCB index for each arm and a threshold value to decide whether to continue pulling an arm or remove the arm from further consideration, when the algorithm knows the value of the maximum rotting rate $\varrho$. We also show that an $\tilde{O}(\max\{\varrho^{1/3}T,T^{3/4}\})$ regret upper bound can be achieved by an algorithm that does not know the value of $\varrho$, by using an adaptive UCB index along with an adaptive threshold value.
    D-CBRS: Accounting For Intra-Class Diversity in Continual Learning. (arXiv:2207.05897v1 [cs.LG])
    Continual learning -- accumulating knowledge from a sequence of learning experiences -- is an important yet challenging problem. In this paradigm, the model's performance for previously encountered instances may substantially drop as additional data are seen. When dealing with class-imbalanced data, forgetting is further exacerbated. Prior work has proposed replay-based approaches which aim at reducing forgetting by intelligently storing instances for future replay. Although Class-Balancing Reservoir Sampling (CBRS) has been successful in dealing with imbalanced data, the intra-class diversity has not been accounted for, implicitly assuming that each instance of a class is equally informative. We present Diverse-CBRS (D-CBRS), an algorithm that allows us to consider within class diversity when storing instances in the memory. Our results show that D-CBRS outperforms state-of-the-art memory management continual learning algorithms on data sets with considerable intra-class diversity.
    Hindsight Learning for MDPs with Exogenous Inputs. (arXiv:2207.06272v1 [cs.LG])
    We develop a reinforcement learning (RL) framework for applications that deal with sequential decisions and exogenous uncertainty, such as resource allocation and inventory management. In these applications, the uncertainty is only due to exogenous variables like future demands. A popular approach is to predict the exogenous variables using historical data and then plan with the predictions. However, this indirect approach requires high-fidelity modeling of the exogenous process to guarantee good downstream decision-making, which can be impractical when the exogenous process is complex. In this work we propose an alternative approach based on hindsight learning which sidesteps modeling the exogenous process. Our key insight is that, unlike Sim2Real RL, we can revisit past decisions in the historical data and derive counterfactual consequences for other actions in these applications. Our framework uses hindsight-optimal actions as the policy training signal and has strong theoretical guarantees on decision-making performance. We develop an algorithm using our framework to allocate compute resources for real-world Microsoft Azure workloads. The results show our approach learns better policies than domain-specific heuristics and Sim2Real RL baselines.
    Stochastic Functional Analysis and Multilevel Vector Field Anomaly Detection. (arXiv:2207.06229v1 [stat.ML])
    Massive vector field datasets are common in multi-spectral optical and radar sensors and modern multimodal MRI data, among many other areas of application. In this paper we develop a novel stochastic functional analysis approach for detecting anomalies based on the covariance structure of nominal stochastic behavior across a domain with multi-band vector field data. An optimal vector field Karhunen-Loeve (KL) expansion is applied to such random field data. A series of multilevel orthogonal functional subspaces is constructed from the geometry of the domain, adapted from the KL expansion. Detection is achieved by examining the projection of the random field on the multilevel basis. The anomalies can be quantified in suitable normed spaces based on local and global information. In addition, reliable hypothesis tests are formed with controllable distributions that do not require prior assumptions on probability distributions of the data. Only the covariance function is needed, which makes for significantly simpler estimates. Furthermore this approach allows stochastic vector-based fusion of anomalies without any loss of information. The method is applied to the important problem of deforestation and degradation in the Amazon forest. This is a complex non-monotonic process, as forests can degrade and recover. This particular problem is further compounded by the presence of clouds that are hard to remove with current masking algorithms. Using multi-spectral satellite data from Sentinel 2, the multilevel filter is constructed and anomalies are treated as deviations from the initial state of the forest. Forest anomalies are quantified with robust hypothesis tests and distinguished from false variations such as cloud cover. Our approach shows the advantage of using multiple bands of data in a vectorized complex, leading to better anomaly detection beyond the capabilities of scalar-based methods.
    (Nearly) Optimal Private Linear Regression via Adaptive Clipping. (arXiv:2207.04686v2 [cs.LG] UPDATED)
    We study the problem of differentially private linear regression where each data point is sampled from a fixed sub-Gaussian style distribution. We propose and analyze a one-pass mini-batch stochastic gradient descent method (DP-AMBSSGD) where points in each iteration are sampled without replacement. Noise is added for DP but the noise standard deviation is estimated online. Compared to existing $(\epsilon, \delta)$-DP techniques which have sub-optimal error bounds, DP-AMBSSGD is able to provide nearly optimal error bounds in terms of key parameters like dimensionality $d$, number of points $N$, and the standard deviation $\sigma$ of the noise in observations. For example, when the $d$-dimensional covariates are sampled i.i.d. from the normal distribution, then the excess error of DP-AMBSSGD due to privacy is $\frac{\sigma^2 d}{N}(1+\frac{d}{\epsilon^2 N})$, i.e., the error is meaningful when number of samples $N= \Omega(d \log d)$ which is the standard operative regime for linear regression. In contrast, error bounds for existing efficient methods in this setting are: $\mathcal{O}\big(\frac{d^3}{\epsilon^2 N^2}\big)$, even for $\sigma=0$. That is, for constant $\epsilon$, the existing techniques require $N=\Omega(d\sqrt{d})$ to provide a non-trivial result.
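    As a worked step, the sample-size requirement quoted for existing methods follows directly from their stated bound. The constant $c$ below is an illustrative threshold for a "non-trivial" error and does not appear in the abstract.

```latex
\frac{d^3}{\epsilon^2 N^2} \le c
\;\iff\; N^2 \ge \frac{d^3}{c\,\epsilon^2}
\;\iff\; N \ge \frac{d^{3/2}}{\epsilon\sqrt{c}}
```

    For constant $\epsilon$ and $c$ this reads $N=\Omega(d\sqrt{d})$, matching the requirement stated for the existing techniques.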
    Contextual Bandits with Large Action Spaces: Made Practical. (arXiv:2207.05836v1 [cs.LG])
    A central problem in sequential decision making is to develop algorithms that are practical and computationally efficient, yet support the use of flexible, general-purpose models. Focusing on the contextual bandit problem, recent progress provides provably efficient algorithms with strong empirical performance when the number of possible alternatives ("actions") is small, but guarantees for decision making in large, continuous action spaces have remained elusive, leading to a significant gap between theory and practice. We present the first efficient, general-purpose algorithm for contextual bandits with continuous, linearly structured action spaces. Our algorithm makes use of computational oracles for (i) supervised learning, and (ii) optimization over the action space, and achieves sample complexity, runtime, and memory independent of the size of the action space. In addition, it is simple and practical. We perform a large-scale empirical evaluation, and show that our approach typically enjoys superior performance and efficiency compared to standard baselines.
    Conformal prediction for time series. (arXiv:2010.09107v13 [stat.ME] UPDATED)
    We develop a general framework for constructing distribution-free prediction intervals for time series. Theoretically, we establish explicit bounds on conditional and marginal coverage gaps of estimated prediction intervals, which asymptotically converge to zero under additional assumptions. We obtain similar bounds on the size of set differences between oracle and estimated prediction intervals. Methodologically, we introduce a computationally efficient algorithm called EnbPI that wraps around ensemble predictors, which is closely related to conformal prediction (CP) but does not require data exchangeability. EnbPI avoids data-splitting and is computationally efficient by avoiding retraining and thus scalable to sequentially producing prediction intervals. We perform extensive simulation and real-data analyses to demonstrate its effectiveness compared with existing methods.
    TCT: Convexifying Federated Learning using Bootstrapped Neural Tangent Kernels. (arXiv:2207.06343v1 [cs.LG])
    State-of-the-art federated learning methods can perform far worse than their centralized counterparts when clients have dissimilar data distributions. For neural networks, even when centralized SGD easily finds a solution that is simultaneously performant for all clients, current federated optimization methods fail to converge to a comparable solution. We show that this performance disparity can largely be attributed to optimization challenges presented by nonconvexity. Specifically, we find that the early layers of the network do learn useful features, but the final layers fail to make use of them. That is, federated optimization applied to this non-convex problem distorts the learning of the final layers. Leveraging this observation, we propose a Train-Convexify-Train (TCT) procedure to sidestep this issue: first, learn features using off-the-shelf methods (e.g., FedAvg); then, optimize a convexified problem obtained from the network's empirical neural tangent kernel approximation. Our technique yields accuracy improvements of up to +36% on FMNIST and +37% on CIFAR10 when clients have dissimilar data.  ( 2 min )
    Cost-Effective Online Contextual Model Selection. (arXiv:2207.06030v1 [cs.LG])
    How can we collect the most useful labels to learn a model selection policy, when presented with arbitrary heterogeneous data streams? In this paper, we formulate this task as an online contextual active model selection problem, where at each round the learner receives an unlabeled data point along with a context. The goal is to output the best model for any given context without obtaining an excessive amount of labels. In particular, we focus on the task of selecting pre-trained classifiers, and propose a contextual active model selection algorithm (CAMS), which relies on a novel uncertainty sampling query criterion defined on a given policy class for adaptive model selection. In comparison to prior art, our algorithm does not assume a globally optimal model. We provide rigorous theoretical analysis for the regret and query complexity under both adversarial and stochastic settings. Our experiments on several benchmark classification datasets demonstrate the algorithm's effectiveness in terms of both regret and query complexity. Notably, to achieve the same accuracy, CAMS incurs less than 10% of the label cost when compared to the best online model selection baselines on CIFAR10.  ( 2 min )
    Towards understanding how momentum improves generalization in deep learning. (arXiv:2207.05931v1 [cs.LG])
    Stochastic gradient descent (SGD) with momentum is widely used for training modern deep learning architectures. While it is well-understood that using momentum can lead to a faster convergence rate in various settings, it has also been observed that momentum yields higher generalization. Prior work argues that momentum stabilizes the SGD noise during training and that this leads to higher generalization. In this paper, we adopt another perspective and first empirically show that gradient descent with momentum (GD+M) significantly improves generalization compared to gradient descent (GD) in some deep learning problems. From this observation, we formally study how momentum improves generalization. We devise a binary classification setting where a one-hidden-layer (over-parameterized) convolutional neural network trained with GD+M provably generalizes better than the same network trained with GD, when both algorithms are similarly initialized. The key insight in our analysis is that momentum is beneficial in datasets where the examples share some feature but differ in their margin. Unlike GD, which memorizes the small-margin data, GD+M still learns the feature in these data thanks to its historical gradients. Lastly, we empirically validate our theoretical findings.  ( 2 min )
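    For orientation, the two update rules being compared can be sketched as below. This is a minimal illustration using the standard heavy-ball form of momentum on a one-dimensional quadratic; the objective, step size, and momentum coefficient are assumptions chosen for the example and are not taken from the paper. It shows only how past gradients enter the update and says nothing about generalization.

```python
def gd(grad, w0, lr, steps):
    """Plain gradient descent: each step uses only the current gradient."""
    w = w0
    for _ in range(steps):
        w = w - lr * grad(w)
    return w


def gd_momentum(grad, w0, lr, beta, steps):
    """Heavy-ball momentum: the step uses a decayed sum of past gradients."""
    w, v = w0, 0.0
    for _ in range(steps):
        v = beta * v + grad(w)  # accumulate historical gradients
        w = w - lr * v
    return w


# Hypothetical toy objective f(w) = 0.5 * a * w**2 with a small curvature a.
a = 0.01


def grad(w):
    return a * w


w_gd = gd(grad, w0=1.0, lr=1.0, steps=100)
w_mom = gd_momentum(grad, w0=1.0, lr=1.0, beta=0.9, steps=100)
print(abs(w_gd), abs(w_mom))
```

    On this toy problem the momentum iterate ends closer to the minimizer at 0 after the same number of steps, which reflects the faster-convergence effect mentioned at the start of the abstract.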
    BR-SNIS: Bias Reduced Self-Normalized Importance Sampling. (arXiv:2207.06364v1 [stat.ML])
    Importance Sampling (IS) is a method for approximating expectations under a target distribution using independent samples from a proposal distribution and the associated importance weights. In many applications, the target distribution is known only up to a normalization constant, in which case self-normalized IS (SNIS) can be used. While the use of self-normalization can have a positive effect on the dispersion of the estimator, it introduces bias. In this work, we propose a new method, BR-SNIS, whose complexity is essentially the same as that of SNIS and which significantly reduces bias without increasing the variance. This method is a wrapper in the sense that it uses the same proposal samples and importance weights as SNIS, but makes clever use of iterated sampling--importance resampling (ISIR) to form a bias-reduced version of the estimator. We furnish the proposed algorithm with rigorous theoretical results, including new bias, variance and high-probability bounds, and these are illustrated by numerical examples.  ( 2 min )
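    The baseline that BR-SNIS builds on, plain self-normalized importance sampling, can be sketched as follows. This is a minimal illustration of standard SNIS only, not the bias-reduced method; the Gaussian target and proposal, the function names, and the sample size are assumptions made for the example.

```python
import math
import random


def snis_estimate(f, log_unnorm_target, sample_proposal, log_proposal, n, rng):
    """Self-normalized importance sampling estimate of E_target[f(X)]."""
    xs = [sample_proposal(rng) for _ in range(n)]
    # Log importance weights; the unknown normalizing constant of the
    # target only shifts them by an additive constant.
    log_w = [log_unnorm_target(x) - log_proposal(x) for x in xs]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    # Dividing by the weight sum cancels the unknown constant.
    return sum(wi * f(x) for wi, x in zip(w, xs)) / sum(w)


# Hypothetical setup: target N(1, 1) known up to a constant, proposal N(0, 2^2).
def log_unnorm_target(x):
    return -0.5 * (x - 1.0) ** 2


def sample_proposal(rng):
    return rng.gauss(0.0, 2.0)


def log_proposal(x):
    return -0.5 * (x / 2.0) ** 2 - math.log(2.0 * math.sqrt(2.0 * math.pi))


rng = random.Random(0)
estimate = snis_estimate(lambda x: x, log_unnorm_target,
                         sample_proposal, log_proposal, 20000, rng)
print(estimate)  # should be close to the target mean of 1.0
```

    Because the estimator is a ratio of two sample averages, it is biased for finite sample sizes; that bias is what the abstract's method aims to reduce.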
    Information-theoretic Inducing Point Placement for High-throughput Bayesian Optimisation. (arXiv:2206.02437v2 [cs.LG] UPDATED)
    Sparse Gaussian Processes are a key component of high-throughput Bayesian optimisation (BO) loops -- an increasingly common setting where evaluation budgets are large and highly parallelised. By using representative subsets of the available data to build approximate posteriors, sparse models dramatically reduce the computational costs of surrogate modelling by relying on a small set of pseudo-observations, the so-called inducing points, in lieu of the full data set. However, current approaches to design inducing points are not appropriate within BO loops as they seek to reduce global uncertainty in the objective function. Thus, the high-fidelity modelling of promising and data-dense regions required for precise optimisation is sacrificed and computational resources are instead wasted on modelling areas of the space already known to be sub-optimal. Inspired by entropy-based BO methods, we propose a novel inducing point design that uses a principled information-theoretic criterion to select inducing points. By choosing inducing points to maximally reduce both global uncertainty and uncertainty in the maximum value of the objective function, we build surrogate models able to support high-precision high-throughput BO.  ( 2 min )
    Unsupervised tree boosting for learning probability distributions. (arXiv:2101.11083v6 [stat.ME] UPDATED)
    We propose an unsupervised tree boosting algorithm for inferring the underlying sampling distribution of an i.i.d. sample based on fitting additive tree ensembles in a fashion analogous to supervised tree boosting. Integral to the algorithm is a new notion of "addition" on probability distributions that leads to a coherent notion of "residualization", i.e., subtracting a probability distribution from an observation to remove the distributional structure from the sampling distribution of the latter. We show that these notions arise naturally for univariate distributions through cumulative distribution function (CDF) transforms and compositions due to several "group-like" properties of univariate CDFs. While the traditional multivariate CDF does not preserve these properties, a new definition of multivariate CDF can restore these properties, thereby allowing the notions of "addition" and "residualization" to be formulated for multivariate settings as well. This then gives rise to the unsupervised boosting algorithm based on forward-stagewise fitting of an additive tree ensemble, which sequentially reduces the Kullback-Leibler divergence from the truth. The algorithm allows analytic evaluation of the fitted density and outputs a generative model that can be readily sampled from. We enhance the algorithm with scale-dependent shrinkage and a two-stage strategy that separately fits the marginals and the copula. The algorithm then performs competitively with state-of-the-art deep-learning approaches in multivariate density estimation on multiple benchmark datasets.  ( 3 min )
    Video Coding Using Learned Latent GAN Compression. (arXiv:2207.04324v2 [eess.IV] UPDATED)
    We propose in this paper a new paradigm for facial video compression. We leverage the generative capacity of GANs such as StyleGAN to represent and compress a video, including intra and inter compression. Each frame is inverted in the latent space of StyleGAN, from which the optimal compression is learned. To do so, a diffeomorphic latent representation is learned using a normalizing flows model, where an entropy model can be optimized for image coding. In addition, we propose a new perceptual loss that is more efficient than other counterparts. Finally, an entropy model for video inter coding with residual is also learned in the previously constructed latent representation. Our method (SGANC) is simple, faster to train, and achieves better results for image and video coding compared to state-of-the-art codecs such as VTM, AV1, and recent deep learning techniques. In particular, it drastically minimizes perceptual distortion at low bit rates.
    FedShuffle: Recipes for Better Use of Local Work in Federated Learning. (arXiv:2204.13169v2 [cs.LG] UPDATED)
    The practice of applying several local updates before aggregation across clients has been empirically shown to be a successful approach to overcoming the communication bottleneck in Federated Learning (FL). In this work, we propose a general recipe, FedShuffle, that better utilizes the local updates in FL, especially in the heterogeneous regime. Unlike many prior works, FedShuffle does not assume any uniformity in the number of updates per device. Our FedShuffle recipe comprises four simple-yet-powerful ingredients: 1) local shuffling of the data, 2) adjustment of the local learning rates, 3) update weighting, and 4) momentum variance reduction (Cutkosky and Orabona, 2019). We present a comprehensive theoretical analysis of FedShuffle and show that both theoretically and empirically, our approach does not suffer from the objective function mismatch that is present in FL methods which assume homogeneous updates in heterogeneous FL setups, e.g., FedAvg (McMahan et al., 2017). In addition, by combining the ingredients above, FedShuffle improves upon FedNova (Wang et al., 2020), which was previously proposed to solve this mismatch. We also show that FedShuffle with momentum variance reduction can improve upon non-local methods under a Hessian similarity assumption. Finally, through experiments on synthetic and real-world datasets, we illustrate how each of the four ingredients used in FedShuffle helps improve the use of local updates in FL.
    How to Train Your Wide Neural Network Without Backprop: An Input-Weight Alignment Perspective. (arXiv:2106.08453v2 [cs.LG] UPDATED)
    Recent works have examined theoretical and empirical properties of wide neural networks trained in the Neural Tangent Kernel (NTK) regime. Given that biological neural networks are much wider than their artificial counterparts, we consider NTK regime wide neural networks as a possible model of biological neural networks. Leveraging NTK theory, we show theoretically that gradient descent drives layerwise weight updates that are aligned with their input activity correlations weighted by error, and demonstrate empirically that the result also holds in finite-width wide networks. The alignment result allows us to formulate a family of biologically-motivated, backpropagation-free learning rules that are theoretically equivalent to backpropagation in infinite-width networks. We test these learning rules on benchmark problems in feedforward and recurrent neural networks and demonstrate, in wide networks, comparable performance to backpropagation. The proposed rules are particularly effective in low data regimes, which are common in biological learning settings.  ( 2 min )
    Online Active Regression. (arXiv:2207.05945v1 [cs.LG])
    Active regression considers a linear regression problem where the learner receives a large number of data points but can only observe a small number of labels. Since online algorithms can deal with incremental training data and take advantage of low computational cost, we consider an online extension of the active regression problem: the learner receives data points one by one and immediately decides whether it should collect the corresponding labels. The goal is to efficiently maintain the regression of received data points with a small budget of label queries. We propose novel algorithms for this problem under $\ell_p$ loss where $p\in[1,2]$. To achieve a $(1+\epsilon)$-approximate solution, our proposed algorithms only require $\tilde{\mathcal{O}}(\epsilon^{-2} d \log(n\kappa))$ queries of labels, where $n$ is the number of data points and $\kappa$ is a quantity, called the condition number, of the data points. The numerical results verify our theoretical results and show that our methods have comparable performance with offline active regression algorithms.  ( 2 min )
    Goal-Oriented Sensitivity Analysis of Hyperparameters in Deep Learning. (arXiv:2207.06216v1 [stat.ML])
    Tackling new machine learning problems with neural networks always means optimizing numerous hyperparameters that define their structure and strongly impact their performances. In this work, we study the use of goal-oriented sensitivity analysis, based on the Hilbert-Schmidt Independence Criterion (HSIC), for hyperparameter analysis and optimization. Hyperparameters live in spaces that are often complex and awkward. They can be of different natures (categorical, discrete, boolean, continuous), interact, and have inter-dependencies. All this makes it non-trivial to perform classical sensitivity analysis. We alleviate these difficulties to obtain a robust analysis index that is able to quantify hyperparameters' relative impact on a neural network's final error. This valuable tool allows us to better understand hyperparameters and to make hyperparameter optimization more interpretable. We illustrate the benefits of this knowledge in the context of hyperparameter optimization and derive an HSIC-based optimization algorithm that we apply on MNIST and Cifar, classical machine learning data sets, but also on the approximation of Runge function and Bateman equations solution, of interest for scientific machine learning. This method yields neural networks that are both competitive and cost-effective.  ( 2 min )
    Shrinkage Estimation of Higher Order Bochner Integrals. (arXiv:2207.06357v1 [math.ST])
    We consider shrinkage estimation of higher order Hilbert space valued Bochner integrals in a non-parametric setting. We propose estimators that shrink the $U$-statistic estimator of the Bochner integral towards a pre-specified target element in the Hilbert space. Depending on the degeneracy of the kernel of the $U$-statistic, we construct consistent shrinkage estimators with fast rates of convergence, and develop oracle inequalities comparing the risks of the $U$-statistic estimator and its shrinkage version. Surprisingly, we show that the shrinkage estimator designed by assuming complete degeneracy of the kernel of the $U$-statistic is a consistent estimator even when the kernel is not completely degenerate. This work subsumes and improves upon Krikamol et al., 2016, JMLR and Zhou et al., 2019, JMVA, which only handle mean element and covariance operator estimation in a reproducing kernel Hilbert space. We also specialize our results to normal mean estimation and show that for $d\ge 3$, the proposed estimator strictly improves upon the sample mean in terms of the mean squared error.  ( 2 min )
    Long Term Fairness for Minority Groups via Performative Distributionally Robust Optimization. (arXiv:2207.05777v1 [cs.LG])
    Fairness researchers in machine learning (ML) have coalesced around several fairness criteria which provide formal definitions of what it means for an ML model to be fair. However, these criteria have some serious limitations. We identify four key shortcomings of these formal fairness criteria, and aim to help to address them by extending performative prediction to include a distributionally robust objective.  ( 2 min )
    Contextual Decision Trees. (arXiv:2207.06355v1 [stat.ML])
    Focusing on Random Forests, we propose a multi-armed contextual bandit recommendation framework for feature-based selection of a single shallow tree of the learned ensemble. The trained system, which works on top of the Random Forest, dynamically identifies a base predictor that is responsible for providing the final output. In this way, we obtain local interpretations by observing the rules of the recommended tree. The carried out experiments reveal that our dynamic method is superior to an independent fitted CART decision tree and comparable to the whole black-box Random Forest in terms of predictive performances.  ( 2 min )
    Constraint-Based Causal Structure Learning from Undersampled Graphs. (arXiv:2205.09235v2 [stat.ML] UPDATED)
    Graphical structures estimated by causal learning algorithms from time series data can provide highly misleading causal information if the causal timescale of the generating process fails to match the measurement timescale of the data. Although this problem has been recently recognized, practitioners have limited resources to respond to it, and so must continue using models that they know are likely misleading. Existing methods either (a) require that the difference between causal and measurement timescales is known; or (b) can handle only a very small number of random variables when the timescale difference is unknown; or (c) apply to only pairs of variables, though with fewer assumptions about prior knowledge; or (d) return impractically many solutions. This paper addresses all four challenges. We combine constraint programming with both theoretical insights into the problem structure and prior information about admissible causal interactions. The resulting system provides a practical approach that scales to significantly larger sets (>100) of random variables, does not require precise knowledge of the timescale difference, supports edge misidentification and parametric connection strengths, and can provide the optimum choice among many possible solutions. The cumulative impact of these improvements is a gain of multiple orders of magnitude in speed and informativeness.  ( 3 min )
    Learning Bellman Complete Representations for Offline Policy Evaluation. (arXiv:2207.05837v1 [cs.LG])
    We study representation learning for Offline Reinforcement Learning (RL), focusing on the important task of Offline Policy Evaluation (OPE). Recent work shows that, in contrast to supervised learning, realizability of the Q-function is not enough for learning it. Two sufficient conditions for sample-efficient OPE are Bellman completeness and coverage. Prior work often assumes that representations satisfying these conditions are given, with results being mostly theoretical in nature. In this work, we propose BCRL, which directly learns from data an approximately linear Bellman complete representation with good coverage. With this learned representation, we perform OPE using Least Square Policy Evaluation (LSPE) with linear functions in our learned representation. We present an end-to-end theoretical analysis, showing that our two-stage algorithm enjoys polynomial sample complexity provided some representation in the rich class considered is linear Bellman complete. Empirically, we extensively evaluate our algorithm on challenging, image-based continuous control tasks from the Deepmind Control Suite. We show our representation enables better OPE compared to previous representation learning methods developed for off-policy RL (e.g., CURL, SPR). BCRL achieves competitive OPE error with the state-of-the-art method Fitted Q-Evaluation (FQE), and beats FQE when evaluating beyond the initial state distribution. Our ablations show that both linear Bellman complete and coverage components of our method are crucial.  ( 3 min )
    A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP. (arXiv:2207.06147v1 [cs.LG])
    As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, an essential understanding of offline CMDP problems is still lacking, in terms of both the algorithm design and the information theoretic sample complexity lower bound. In this paper, we focus on solving the CMDP problems where only offline data are available. By adopting the concept of the single-policy concentrability coefficient $C^*$, we establish an $\Omega\left(\frac{\min\left\{|\mathcal{S}||\mathcal{A}|,|\mathcal{S}|+I\right\} C^*}{(1-\gamma)^3\epsilon^2}\right)$ sample complexity lower bound for the offline CMDP problem, where $I$ stands for the number of constraints. By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation and its sample complexity matches the above lower bound except for an $\tilde{\mathcal{O}}((1-\gamma)^{-1})$ factor. A comprehensive discussion of how to deal with the unknown constant $C^*$ and the potential asynchronous structure of the offline dataset is also included.  ( 2 min )
    Surrogate Likelihoods for Variational Annealed Importance Sampling. (arXiv:2112.12194v2 [stat.ML] UPDATED)
    Variational inference is a powerful paradigm for approximate Bayesian inference with a number of appealing properties, including support for model learning and data subsampling. By contrast MCMC methods like Hamiltonian Monte Carlo do not share these properties but remain attractive since, contrary to parametric methods, MCMC is asymptotically unbiased. For these reasons researchers have sought to combine the strengths of both classes of algorithms, with recent approaches coming closer to realizing this vision in practice. However, supporting data subsampling in these hybrid methods can be a challenge, a shortcoming that we address by introducing a surrogate likelihood that can be learned jointly with other variational parameters. We argue theoretically that the resulting algorithm permits the user to make an intuitive trade-off between inference fidelity and computational cost. In an extensive empirical comparison we show that our method performs well in practice and that it is well-suited for black-box inference in probabilistic programming frameworks.  ( 2 min )
    Probing the Robustness of Independent Mechanism Analysis for Representation Learning. (arXiv:2207.06137v1 [stat.ML])
    One aim of representation learning is to recover the original latent code that generated the data, a task which requires additional information or inductive biases. A recently proposed approach termed Independent Mechanism Analysis (IMA) postulates that each latent source should influence the observed mixtures independently, complementing standard nonlinear independent component analysis, and taking inspiration from the principle of independent causal mechanisms. While it was shown in theory and experiments that IMA helps recover the true latents, the method's performance was so far only characterized when the modeling assumptions are exactly satisfied. Here, we test the method's robustness to violations of the underlying assumptions. We find that the benefits of IMA-based regularization for recovering the true sources extend to mixing functions with various degrees of violation of the IMA principle, while standard regularizers do not provide the same merits. Moreover, we show that unregularized maximum likelihood recovers mixing functions which systematically deviate from the IMA principle, and provide an argument elucidating the benefits of IMA-based regularization.  ( 2 min )
    Learning Approximately Optimal Contracts. (arXiv:1811.06736v2 [cs.GT] UPDATED)
    In principal-agent models, a principal offers a contract to an agent to perform a certain task. The agent exerts a level of effort that maximizes her utility. The principal is oblivious to the agent's chosen level of effort, and conditions her wage only on possible outcomes. In this work, we consider a model in which the principal is unaware of the agent's utility and action space: she sequentially offers contracts to identical agents, and observes the resulting outcomes. We present an algorithm for learning the optimal contract under mild assumptions. We bound the number of samples needed for the principal to obtain a contract that is within $\epsilon$ of her optimal net profit for every $\epsilon>0$. Our results are robust even when considering risk-averse agents. Furthermore, we show that when there are only two possible outcomes or the agent is risk-neutral, the algorithm's outcome approximates the optimal contract described in the classical theory.  ( 2 min )
    Multi-Study Boosting: Theoretical Considerations for Merging vs. Ensembling. (arXiv:2207.04588v2 [stat.ML] UPDATED)
    Cross-study replicability is a powerful model evaluation criterion that emphasizes generalizability of predictions. When training cross-study replicable prediction models, it is critical to decide between merging and treating the studies separately. We study boosting algorithms in the presence of potential heterogeneity in predictor-outcome relationships across studies and compare two multi-study learning strategies: 1) merging all the studies and training a single model, and 2) multi-study ensembling, which involves training a separate model on each study and ensembling the resulting predictions. In the regression setting, we provide theoretical guidelines based on an analytical transition point to determine whether it is more beneficial to merge or to ensemble for boosting with linear learners. In addition, we characterize a bias-variance decomposition of estimation error for boosting with component-wise linear learners. We verify the theoretical transition point result in simulation and illustrate how it can guide the decision on merging vs. ensembling in an application to breast cancer gene expression data.  ( 2 min )
    Jackknife Variability Estimation For Randomized Matrix Computations. (arXiv:2207.06342v1 [math.NA])
    Randomized algorithms based on sketching have become a workhorse tool in low-rank matrix approximation. To use these algorithms safely in applications, they should be coupled with diagnostics to assess the quality of approximation. To meet this need, this paper proposes a jackknife resampling method to estimate the variability of the output of a randomized matrix computation. The variability estimate can recognize that a computation requires additional data or that the computation is intrinsically unstable. As examples, the paper studies jackknife estimates for two randomized low-rank matrix approximation algorithms. In each case, the operation count for the jackknife estimate is independent of the dimensions of the target matrix. In numerical experiments, the estimator accurately assesses variability and also provides an order-of-magnitude estimate of the mean-square error.  ( 2 min )
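    The abstract above does not spell out its estimator, so the following is only a minimal sketch of the textbook leave-one-out jackknife for a scalar statistic, not the paper's method for sketched matrix computations. The function name `jackknife_variance` and the toy data are illustrative assumptions.

```python
from statistics import mean


def jackknife_variance(samples, estimator):
    """Textbook leave-one-out jackknife estimate of the variance of `estimator`.

    `samples` is a list and `estimator` maps a list to a float.
    This is a generic illustration, not the estimator from the paper.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    # Recompute the estimator n times, each time leaving one sample out.
    replicates = [estimator(samples[:i] + samples[i + 1:]) for i in range(n)]
    center = mean(replicates)
    # Classical jackknife scaling factor (n - 1) / n.
    return (n - 1) / n * sum((r - center) ** 2 for r in replicates)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(jackknife_variance(data, mean))
```

    For the sample mean, this quantity coincides with the usual estimate s^2 / n, which makes the mean a convenient sanity check before applying the same recipe to a more complicated statistic.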
    Employing Feature Selection Algorithms to Determine the Immune State of Mice with Rheumatoid Arthritis. (arXiv:2207.05882v1 [stat.ML])
    The immune response is a dynamic process by which the body determines whether an antigen is self or nonself. The state of this dynamic process is defined by the relative balance and population of inflammatory and regulatory actors which comprise this decision making process. The goal of immunotherapy as applied to, e.g. Rheumatoid Arthritis (RA), then, is to bias the immune state in favor of the regulatory actors - thereby shutting down autoimmune pathways in the response. While there are several known approaches to immunotherapy, the effectiveness of the therapy will depend on how this intervention alters the evolution of this state. Unfortunately, this process is determined not only by the dynamics of the process, but the state of the system at the time of intervention - a state which is difficult if not impossible to determine prior to application of the therapy.  ( 2 min )
    Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces. (arXiv:2207.05849v1 [cs.LG])
    Designing efficient general-purpose contextual bandit algorithms that work with large -- or even continuous -- action spaces would facilitate application to important scenarios such as information retrieval, recommendation systems, and continuous control. While obtaining standard regret guarantees can be hopeless, alternative regret notions have been proposed to tackle the large action setting. We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives. We design a statistically and computationally efficient algorithm -- for the proposed smooth regret -- that works with general function approximation under standard supervised oracles. We also present an adaptive algorithm that automatically adapts to any smoothness level. Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret, e.g., in bandit problems with multiple best arms and Lipschitz/Hölder bandits. We conduct large-scale empirical evaluations demonstrating the efficacy of our proposed algorithms.  ( 2 min )

  • Open

    [R] How to learn imbalanced data arising from multiple domains?
    Hello everyone! Happy to share our new work on learning from multi-domain imbalanced data. This work was recently accepted at ECCV 2022. Data imbalance is ubiquitous and inherent in the real world. Existing methods for dealing with imbalanced data/long-tailed distributions are only for a single domain, that is, the data originates from the same domain; however, natural data can originate from distinct domains, where a minority class in one domain could have abundant instances from other domains. Effectively utilizing data from different domains is likely to improve the performance of long-tail learning over all domains. This paper promotes the paradigm of the traditional imbalanced classification problem and generalizes it from single domain to multiple domains. We formulate the problem of …  ( 89 min )
    [P] Introducing BentoML 1.0 - A faster way to ship your models to production
    Hi everyone! I'm excited to share some news from the BentoML team. When we first open sourced the BentoML project in 2019 and shared it with the community, our vision was to create an open platform that simplifies machine learning model serving and provides a solid foundation for ML teams to operate ML at production scale. And after years of working together with our community towards that goal, we’re thrilled to announce the general availability of BentoML 1.0! What's new in BentoML 1.0? Simplified model packaging and management, both locally and in a centralized model repository for teams. A Python-first architecture that scales with powerful optimizations, including parallel inference, adaptive batching, and support for accelerated runtimes. Introducing Yatai for BentoML: Production-first ML platform on Kubernetes. To learn more: Introducing BentoML 1.0 Blog post: https://modelserving.com/blog/introducing-bentoml-10 BentoML Tutorial: https://docs.bentoml.org/en/latest/tutorial.html Github Page: https://github.com/bentoml/BentoML Documentation: https://docs.bentoml.org/ submitted by /u/chaoyu [link] [comments]  ( 88 min )
    [N] Andrej Karpathy is leaving Tesla
    Twitter thread: https://twitter.com/karpathy/status/1547332300186066944 submitted by /u/EffectSizeQueen [link] [comments]  ( 92 min )
    [D] I made a site for collaborative image labeling
    I recently launched https://mekabytes.com. The idea is to treat datasets like subreddits where users can come together to build the stuff they want to see. For the datasets there is a github-style landing page with a README to help give guidance on the goals, what images the dataset wants, and any labeling guidelines. There is also a reddit-style comment system where you can reference specific annotations. The idea with that is to provide feedback to help people learn. The coolest part (IMO) is the versioning system. All annotations are versioned and approved by a moderator, gating data quality kind of like a code review. This versioning allows the dataset to be rolled back to any point in time which will help reproduce research even as the dataset continues to evolve. The dataset releases will be open under a creative commons license (BY-NC-SA). To help cover hosting the releases are downloadable for $5 + $1/GB. Basically you can use it for research, personal projects, and share freely once you have it. There is still a ton of stuff to do and I don't even have my first user yet! I've been using it for the last week or so and cleaning up the UX. You can actually annotate decently on mobile. Right now it supports classification and object detection (bounding boxes). I hope to add a free text field in the near future after some niceties like pagination and comment notifications. I would love some feedback if you have any! submitted by /u/tacixat [link] [comments]  ( 88 min )
    30% of Google's Reddit Emotions Dataset is Mislabeled [D]
    Last year, Google released their Reddit Emotions dataset: a collection of 58K Reddit comments human-labeled according to 27 emotions. I analyzed the dataset... and found that 30% is mislabeled! Some of the errors: "*aggressively tells friend I love them*" – mislabeled as ANGER; "Yay, cold McDonald's. My favorite." – mislabeled as LOVE; "Hard to be sad these days when I got this guy with me" – mislabeled as SADNESS; "Nobody has the money to. What a joke" – mislabeled as JOY. I wrote a blog about it here, with more examples and my main two suggestions for how to fix Google's data annotation methodology. submitted by /u/BB4evaTB12 [link] [comments]  ( 92 min )
    [D] How are People Doing “Fair” Few-Shot Training/Evaluation
    After reading through a lot of the non-Meta Learning popular few-shot literature (Prototypical Nets, Matching Nets, etc.) and then looking at other papers/GitHub repos, I’m not totally sure how to build a “fair” training and evaluation setup. Let’s take CIFAR-100 (ignoring CIFAR-FS for now). To set up a few-shot dataset split, I’d take the 100 classes and split up into train/val/test 60/20/40 such that each split has non-overlapping classes - pretty straightforward. But now, I still have 600 examples per class in all splits. Before generating random 5-way-5-shot episodes during training, what’s the fair way to generate Support and Query Sets? Are people first creating another split of the trainset so that the Support set only contains 5 examples per class (60*5=300 total examples) and the rest is in the Query set? If it's not something like that, then the support set is going to contain a lot of examples to learn from rather than a few. Some methods also directly classify the trainset’s support images for pre-training, assuming that the number of classes overall is known beforehand. But then to do the same on the validation and support sets I guess that they replace the FC layer. Finally, when choosing a pre-trained model to start with, it seems absolutely necessary to choose a significantly different domain for evaluation (ex. ImageNet pre-trained ResNet evaluated on CIFAR-FS is bad). tldr; it seems like there’s a lot of small differences in experimental setups for few-shot settings, what’s the best way to be fair for training/evaluation? Also maybe I’m just totally missing something :) submitted by /u/rivew [link] [comments]  ( 89 min )
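    One convention that may help frame the question above (an assumption here, not something the post confirms) is to draw the support and query sets fresh for every episode from the same class pool, keeping them disjoint within the episode rather than fixing a global support split. A minimal sketch follows; `sample_episode`, `examples_by_class`, and the default of 15 query examples per class are illustrative choices.

```python
import random


def sample_episode(examples_by_class, n_way=5, k_shot=5, n_query=15, rng=None):
    """Draw one N-way K-shot episode.

    `examples_by_class` maps a class label to a list of examples.
    Returns (support, query) as lists of (example, label) pairs that
    share classes but never share an example within the episode.
    """
    rng = rng or random.Random()
    # Pick the classes for this episode.
    classes = rng.sample(sorted(examples_by_class), n_way)
    support, query = [], []
    for label in classes:
        # Sample without replacement so support and query stay disjoint.
        picked = rng.sample(examples_by_class[label], k_shot + n_query)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query


if __name__ == "__main__":
    pool = {c: [f"img_{c}_{i}" for i in range(30)] for c in range(10)}
    s, q = sample_episode(pool, rng=random.Random(0))
    print(len(s), len(q))
```

    Under this scheme the model only ever sees `k_shot` labeled examples per class inside an episode, even though each class has many examples overall, and the same sampler can be pointed at the train, validation, or test class split.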
    [N] [CFP] Order Up! A workshop on higher-order optimization in ML
    Hello all! Since NeurIPS 2022 workshop decisions were recently released, we are proud to announce our 2022 workshop focused on higher-order optimization in machine learning! An (under construction) homepage can be found here. Topics include: Higher-order optimizers, Adaptive gradient methods, Quasi-Newton techniques, and many more! The workshop will run for one day in-person at NeurIPS 2022. There will be dedicated poster and spotlight sessions, including a dedicated junior researcher poster session with an aim to connect junior researchers to more senior ones. We also feature 5 plenary talks from researchers, namely Amir Gholami, Coralia Cartis, Donald Goldfarb, Frank E. Curtis, and Madeleine Udell. We aim to provide submissions 3 reviews each. Paper submission will open soon, and can be found at this link. I am happy to answer any questions, so feel free to DM or comment! Thanks. submitted by /u/order-up-workshop [link] [comments]  ( 88 min )
    [P] Build a Machine Translation System with Forte
    TLDR: This tutorial allows you to build a machine translation system with no glue code using Forte, an open source ML workflow builder. Forte makes it easy to compose any NLP pipeline, regardless of heterogeneity of data and processes, as a modular and easily editable system. It allows users to break down complex problems into composable pipelines and enables inter-operations across tasks through a unified data format. This tutorial includes: 1 — How to read data from source How to create a simple NLP pipeline How to maintain and store the input data 2 — How to process data in pipeline How to perform sentence segmentation How to annotate and query the data How to translate the input text with a pre-trained model How to manage multiple data objects 3 — How to handle ne…  ( 106 min )
    [D] Ensemble regression model - based on models trained on different feature spaces
    What is the best method for constructing an ensemble regression model from numerous KNN regression models that were trained on slightly different feature spaces? I can't only use the features that they have in common. submitted by /u/Rafaelkoll [link] [comments]  ( 87 min )
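    One simple baseline for the question above, offered as an assumption rather than the best method, is to let each KNN model read only its own features from the sample and then average the predictions. The sketch below uses a tiny pure-Python KNN; `KNNRegressor` and `ensemble_predict` are hypothetical names, and a weighted average or a stacked meta-model could replace the plain mean.

```python
import math


class KNNRegressor:
    """Tiny KNN regressor that only looks at its own feature subset."""

    def __init__(self, features, k=3):
        self.features = features  # names of the features this model uses
        self.k = k
        self.rows = []

    def fit(self, samples, targets):
        # Each sample is a dict; keep only this model's features.
        self.rows = [([s[f] for f in self.features], y)
                     for s, y in zip(samples, targets)]
        return self

    def predict(self, sample):
        point = [sample[f] for f in self.features]
        nearest = sorted(self.rows, key=lambda row: math.dist(row[0], point))
        nearest = nearest[: self.k]
        return sum(y for _, y in nearest) / len(nearest)


def ensemble_predict(models, sample):
    """Average the predictions of models trained on different feature sets."""
    preds = [m.predict(sample) for m in models]
    return sum(preds) / len(preds)


if __name__ == "__main__":
    train = [{"a": 0, "b": 0, "c": 0}, {"a": 10, "b": 10, "c": 10}]
    y = [1.0, 3.0]
    models = [KNNRegressor(["a", "b"], k=1).fit(train, y),
              KNNRegressor(["b", "c"], k=1).fit(train, y)]
    print(ensemble_predict(models, {"a": 0, "b": 6, "c": 10}))
```

    Because every model selects its own columns at prediction time, the ensemble never needs to fall back to the features the models have in common.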
    [D] When will NeurIPS 2022 reviews be released?
    I can't recall what day the reviews were released in the last couple of years. I know that the review period is closed, so it's only a matter of time; just wondering if anyone has any idea? submitted by /u/AbjectDrink3276 [link] [comments]  ( 88 min )
    [News] Jupyter Notebook competition - 2 weeks left to enter!
    Are you passionate about #coding, #DataScience or #EarthObservation? Don't miss out on the chance to showcase your skills and develop new Jupyter Notebooks using #Copernicus data, whilst also being in with a chance of winning cash prizes! Sign up before 31 July at: https://notebook.wekeo.eu/ https://preview.redd.it/1uwo4ccv4bb91.png?width=1920&format=png&auto=webp&s=18af6de36526d30585d0027d8445f56ed4302516 submitted by /u/EUMETSAT [link] [comments]  ( 87 min )
    [R] Inner Monologue: Embodied Reasoning through Planning with Language Models
    submitted by /u/red75prime [link] [comments]  ( 87 min )
    [D] How best to handle a column that can hold multiple, unbounded number of values?
    Say I have an email dataset. Two of its columns are "sender" and "recipients". Now, the "sender" column will only hold one value in each row. However, "recipients" can be anything in number from 1 to 100, or even more theoretically. In such a scenario, one hot encoding is not a tractable solution. And neither is creating a new row for each unique recipient. So, how best to handle this situation? submitted by /u/ResearcherNo4728 [link] [comments]  ( 89 min )
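    One option for the question above (a suggestion, not something the post itself proposes) is the hashing trick: map each recipient into one of a fixed number of buckets and build a multi-hot vector, so the feature width stays constant no matter how many recipients a row has. The sketch below is a minimal stdlib version; `hash_recipients` and the default of 32 buckets are illustrative assumptions, and distinct addresses can collide in the same bucket.

```python
import hashlib


def hash_recipients(recipients, n_buckets=32):
    """Encode a variable-length recipient list as a fixed-size multi-hot vector."""
    vec = [0] * n_buckets
    for address in set(recipients):
        # A stable hash keeps the encoding reproducible across runs.
        digest = hashlib.sha256(address.lower().encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % n_buckets
        vec[bucket] = 1
    return vec


if __name__ == "__main__":
    row = ["alice@example.com", "bob@example.com", "carol@example.com"]
    print(hash_recipients(row))
```

    A larger `n_buckets` lowers the collision rate at the cost of a wider vector; learned embeddings pooled over the recipient set are another route when a model can be trained end-to-end.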
    [R] So someone actually peer-reviewed this and thought "yeah, looks good"?
    It looks like chronic kidney disease diagnosis has been solved in this paper: https://ieeexplore.ieee.org/document/8693581 I mean no disrespect to the authors, but this publication makes me slightly doubt the peer-review system. Or I am just such an amateur, that I am not seeing the brilliance behind this paper, which is also possible. Have a read through it yourselves submitted by /u/fanconic [link] [comments]  ( 97 min )
    [D] Labeling novel view synthesis for object detection
    Hey all, I've been following the exciting progress of NeRFs, and it led me to wonder whether there is research on generating novel 2D views from a 3D representation, and labeling those examples. I find works for image classification under Novel View Synthesis topics, but for object detection I just can't find anything. Wouldn't it be possible to label 2D training images, construct a 3D representation, and use it for generating novel 2D views with corresponding labelings? I see this as highly useful for the object detection domain, where labeling often requires a lot of manual work leading to small datasets and non-robust object representations. Please note if I'm missing something out here. submitted by /u/TemppaHemppa [link] [comments]  ( 88 min )
    [D] transfer learning with freezing vs unfreezing
    Hi, I have been trying to test self-supervised representation learning on a vision task. In more detail, testing BYOL on CIFAR-10. I found the trick that they threw away the last layer and put a new layer for the output shape, and the backbone network is frozen during finetuning. I know that a bad last layer can harm the backbone network during finetuning because the network is highly sensitive to even small changes in parameter space. But I tried to finetune without freezing, and it shows better final performance (accuracy 82% -> 90% at test). So why did they freeze the backbone network and show the results of the experiment? How can I explain this phenomenon? Thank you for reading. submitted by /u/Spiritual_Fig3632 [link] [comments]  ( 88 min )
    Why do Transformers scale so well? [D]
    When you hear people talk about large models, they're usually talking about transformers. What about this architecture has allowed it to be scaled? Have people tried making really large CNNs or RNNs (or just regular MLPs) before? submitted by /u/Adolphins [link] [comments]  ( 92 min )
  • Open

    How does SimSwap (1 image Face Swap tech) work without training?
    SimSwap (https://github.com/neuralchen/SimSwap) is basically a framework that carries out face-swapping in a similar way deepfake technology does with a source and a target video. However, for the source, only one image is required. Not sure how this would work since 1 image isn't enough for actual training. Is this simply face mapping? I feel like the output is a bit too sophisticated for that. submitted by /u/thr0away89 [link] [comments]  ( 86 min )
    Not of This World | Cinematic 4K 24 FPS (FILM)
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 86 min )
    Live developer workshop on how to generate and use synthetic text
    submitted by /u/Repeat-or [link] [comments]  ( 86 min )
    Hello fellow researchers, I’m in a bit of a pickle and require your help. Can one of you get in contact with me for a quick interview at any time regarding the risk of value destruction through the use of artificial intelligence and machine learning. You will have complete anonymity. Thank you
    submitted by /u/Normal-Opportunity33 [link] [comments]  ( 86 min )
    AI Dream 45 - Exploring the Perfect endless Garden
    submitted by /u/LordPewPew777 [link] [comments]  ( 86 min )
    is there any "image to text" ai?
    for example that writes a description of an image or an article based on it. submitted by /u/jose3001 [link] [comments]  ( 86 min )
    Deepmind PLATO: Disappointed expectations and their relevance for physics
    submitted by /u/much_successes [link] [comments]  ( 85 min )
    Colossal-AI, A Unified Deep Learning System for Big Models, Seamlessly Accelerates Large Models at Low Costs with Hugging Face
    In recent years, the outstanding performance of model scaling has led to an escalation in the size of pre-trained models. Unfortunately, training and even simply fine-tuning large AI models is usually unaffordable, requiring tens or hundreds of GPUs. Existing deep learning frameworks like PyTorch and TensorFlow may not offer a satisfactory solution for very large AI models. Furthermore, advanced knowledge of AI systems is typically required for sophisticated configurations and optimization of specific models. Therefore, many AI users, such as engineers from small and medium-sized enterprises, can’t help but feel overwhelmed by the emergence of large AI models. Accelerate Large Model OPT with Low Cost. About Open Pretrained Transformer (OPT): Meta recently released Open Pretrained Transformer (OPT), a 175-billion-parameter AI language model. To encourage AI democratization in the community, Meta has released both the code and trained model weights, which encourages AI programmers to perform various downstream tasks and application deployments. We will now demonstrate fine-tuning for causal language modeling with the pre-trained weights of the OPT model provided by the Hugging Face Hub. Configure with Colossal-AI: It is very simple to use the powerful features of Colossal-AI. Users only need a simple configuration file, and are not required to alter their training logic to equip models with their desired features (e.g. mixed-precision training, gradient accumulation, multi-dimensional parallel training, and memory redundancy elimination). submitted by /u/ai-lover [link] [comments]  ( 87 min )
    Looking for an AI that i can feed stories to and it will produce a similar story.
    submitted by /u/lxknvlk [link] [comments]  ( 86 min )
    AI vs Datascience
    Are AI and data science booming fields in the US? I'd like to pursue my master's in the US in either AI or data science. Which stream is better, and why? submitted by /u/Consistent-Flow-414 [link] [comments]  ( 86 min )
    Made a synopsis of my idea of anime "Alter Ego"
    submitted by /u/Mrhelloistaken [link] [comments]  ( 85 min )
    CVPR 2022 Best Paper Honorable Mention: Dual-Shutter Optical Vibration Sensing
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 86 min )
    Artificial Intelligence companies honoured at World’s Top 50 Innovators 2022 in London
    submitted by /u/chelsea_bear [link] [comments]  ( 86 min )
    Using AI to create recipes from the only food in the house (copy.ai GPT3 powered)
    I wanted to see how creative GPT3 could get in creating a recipe for a meal with all I can find in the house. I was.. kind of disappointed.. The prompt: a recipe for a meal with lemons, hot dog sausages, ice cream and coconut. The AI generated recipe: This recipe is so easy, it's almost too good to be true. What you'll need: - 1 lemon - 5 hot dog sausages (If you use vegetarian sausages, the meal will be completely vegan!) - 1 scoop of ice cream (We recommend coconut flavor; it goes so well with the lemons and sausages!) - a pinch of salt (optional) Preparation: Peel the lemon and cut it into slices. Put the hot dog sausages in a pan and fry them until they are browned on both sides. You can add some salt to taste if you like! Add the lemon slices to the pan and cook them until they are golden brown as well. Make sure that they don't burn! This would ruin your meal! You can add some salt again if needed (but try not to add too much). Finally, take out everything from the pan, put it on a plate and top it with ice cream! submitted by /u/No_Condition4115 [link] [comments]  ( 88 min )
    Are all my efforts in vain
    I’ve spent thousands of hours and many years building up my skills for the sole purpose of getting a job as a concept artist. Was all that in vain? With AI we have the tools to create artwork in a fraction of the time it takes a human and they will only get better. I am extremely excited for the future of the industry, but I need to know how much of my life I’ve wasted. I’m having a bit of an existential crisis. submitted by /u/giantpokimanestatue [link] [comments]  ( 90 min )
    Realistic Synthetic Video Avatars (text to video)
    I've been looking into Synthetic Media, specifically AI spokes people or AI Generated video avatars, which whilst maybe not as exciting as Dalle still has some powerful applications. I've found the below examples. Wondering if anybody has come across any useful GIT pages or Colab notebooks in this domain.. I can't seem to find detail on specific models being used, assuming they're GAN models.. I'd like to be able to explore further without having to pay $3 per minute of generated video and being capped at 10 minutes a month https://www.colossyan.com/ Movio - AI Spokesperson Video Creator https://talkingavatar.la/ https://www.rephrase.ai/ https://aistudios.com/ https://synthesys.io/ Create - adam2eve.ai https://www.deepword.co/ submitted by /u/No_Condition4115 [link] [comments]  ( 86 min )
    What Does Artificial Intelligence Means ? How AI Works ?
    submitted by /u/Maruf2014 [link] [comments]  ( 86 min )
    9 Best Artificial Intelligence books for beginners to expert to read in 2022 -
    submitted by /u/Lakshmireddys [link] [comments]  ( 84 min )
  • Open

    Full Lecture Now Available on YouTube - Stanford CS25 l Transformers United - Decision Transformer: Reinforcement Learning via Sequence Modeling: Aditya Grover of UCLA
    In this seminar Aditya introduces a framework that abstracts Reinforcement Learning (RL) as a sequence modeling problem. Watch on YouTube. submitted by /u/Stanford_Online [link] [comments]  ( 86 min )
    "Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents", Huang et al 2022 {G}
    submitted by /u/gwern [link] [comments]  ( 86 min )
    "Inner Monologue: Embodied Reasoning through Planning with Language Models", Huang et al 2022 {G} (extending SayCan PaLM robotics with feedback)
    submitted by /u/gwern [link] [comments]  ( 86 min )
    Hebbian learning is enough for AGI
    Impact Maximization via Hebbian Learning is an approach for AGI and ASI. The approach posits 3 main points: 1) Making an impact is the objective function of all life forms, and so the objective function of AGI is to maximise impact. Living things have more impact potential than non-living things, and highly intelligent beings have even more potential for impact. Anything we do is a kind of impact, whether it is self-preservation, procreation, meme propagation, DDAO and other derived objectives, and so maximising impact is what an AGI system should do. 2) Impact maximization can happen this way: if an agent relaxes (suspends output action) while perceiving something impactful and acts when it perceives a lack of impact/novelty/interesting things, so as to bring about a change in the environment, it…  ( 89 min )
    Hindsight experience replay in Vectorised environments
    Hi there, I've been using StableBaselines3 and multiprocessing (SubprocVecEnv), however when I add my HER replay buffer it all breaks down. I get the error: ValueError: could not broadcast input array from shape (4,5) into shape (5,) where the 4 is the number of environments and 5 is the action size. Thanks for any advice :) submitted by /u/SuperDuperDooken [link] [comments]  ( 87 min )
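    The error message itself can be reproduced with plain NumPy, which may help narrow down the cause. This is a minimal sketch, assuming (not confirmed from the StableBaselines3 source) that a buffer slot was allocated for one environment's action of size 5 while the vectorised environment hands back a batch of shape (4, 5). The names `n_envs`, `action_dim`, and `buffer_size` are illustrative only.

```python
import numpy as np

# Hypothetical sizes chosen to match the shapes in the error message.
n_envs, action_dim, buffer_size = 4, 5, 8

# Buffer laid out for ONE environment: each slot holds a single action.
single_env_buffer = np.zeros((buffer_size, action_dim))
# What a vectorised environment produces per step: one action per env.
batched_actions = np.ones((n_envs, action_dim))

try:
    single_env_buffer[0] = batched_actions  # (4, 5) does not fit into (5,)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)

# Buffer with an explicit environment axis: each slot holds n_envs actions.
vec_buffer = np.zeros((buffer_size, n_envs, action_dim))
vec_buffer[0] = batched_actions  # shapes agree, so the write succeeds
```

    If this is what is happening, it would be worth checking in the documentation of the installed version whether its HER buffer supports more than one environment at all.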
  • Open

    The Business Impact of Robotic Process Automation
    In this interview, I spoke with Husan Mahey, author of “Robotic Process Automation with Automation Anywhere,” where he outlines step-by-step the process for setting up automation in a business setting. Robotic Process Automation is a tool that allows users to automate repetitive tasks that would normally be done by a human. These sorts of tedious… The post The Business Impact of Robotic Process Automation appeared first on Data Science Central.  ( 20 min )
  • Open

    Rewriting Image Captions for Visual Question Answering Data Creation
    Posted by Soravit Beer Changpinyo and Doron Kukliansky‎, Senior Software Engineers, Google Research Visual Question Answering (VQA) is a useful machine learning (ML) task that requires a model to answer a visual question about an image. What makes it challenging is its multi-task and open-ended nature; it involves solving multiple technical research questions in computer vision and natural language understanding simultaneously. Yet, progress on this task would enable a wide range of applications, from assisting the blind and the visually-impaired or communicating with robots to enhancing the user’s visual experience with external knowledge. Effective and robust VQA systems cannot exist without high-quality, semantically and stylistically diverse large-scale training data of image-questio…  ( 21 min )
  • Open

    Reality check !
    Hello experts, I am trying to make a small-scale neural cryptography application. I would like to know (a) whether it is feasible to demonstrate this (as a proof of concept) on my home system, and (b) whether it will require professional coding standards; I am an intermediate coder. Thanks in anticipation. submitted by /u/Ashamed-Association3 [link] [comments]  ( 86 min )
    Master thesis Neural Networks
    I need to write a MSc thesis for Faculty of Computer Science related to Neural Networks. I am interested in Finance/Economics. At the beginning I started to read about stock return prediction/portfolio selection, but because everyone is doing it, I would like to research something different. What else Economics/Finance related can I write a thesis about? submitted by /u/AnyJello605 [link] [comments]  ( 87 min )
  • Open

    Artificial intelligence in medical diagnosis: methods, algorithms and applications
    Artificial intelligence (AI) has become synonymous with assistance and efficiency. From a technology that was looked at with mistrust as…  ( 10 min )
  • Open

    Coupling streaming AI and HPC ensembles to achieve 100-1000x faster biomolecular simulations. (arXiv:2104.04797v5 [cs.DC] UPDATED)
    Machine learning (ML)-based steering can improve the performance of ensemble-based simulations by allowing for online selection of more scientifically meaningful computations. We present DeepDriveMD, a framework for ML-driven steering of scientific simulations that we have used to achieve orders-of-magnitude improvements in molecular dynamics (MD) performance via effective coupling of ML and HPC on large parallel computers. We discuss the design of DeepDriveMD and characterize its performance. We demonstrate that DeepDriveMD can achieve between 100-1000x acceleration for protein folding simulations relative to other methods, as measured by the amount of simulated time performed, while covering the same conformational landscape as quantified by the states sampled during a simulation. Experiments are performed on leadership-class platforms on up to 1020 nodes. The results establish DeepDriveMD as a high-performance framework for ML-driven HPC simulation scenarios, that supports diverse MD simulation and ML back-ends, and which enables new scientific insights by improving the length and time scales accessible with current computing capacity.  ( 3 min )
    Autoencoding Conditional GAN for Portfolio Allocation Diversification. (arXiv:2207.05701v1 [q-fin.PM])
    Over the decades, the Markowitz framework has been used extensively in portfolio analysis, though it puts too much emphasis on the analysis of market uncertainty rather than on trend prediction. Generative adversarial networks (GAN) and conditional GANs (CGAN) have been explored to generate financial time series and extract features that can help portfolio analysis, but the CGAN framework is limited by putting too much emphasis on generating series rather than on keeping features that can help the generator. In this paper, we introduce an autoencoding CGAN (ACGAN) based on deep generative models that learns the internal trend of historical data while modeling market uncertainty and future trends. We evaluate the model on several real-world datasets from both the US and European markets, and show that the proposed ACGAN model leads to better portfolio allocation and generates series that are closer to true data compared to the existing Markowitz and CGAN approaches.  ( 2 min )
    AGBoost: Attention-based Modification of Gradient Boosting Machine. (arXiv:2207.05724v1 [cs.LG])
    A new attention-based model for the gradient boosting machine (GBM) called AGBoost (the attention-based gradient boosting) is proposed for solving regression problems. The main idea behind the proposed AGBoost model is to assign attention weights with trainable parameters to iterations of GBM under the condition that decision trees are the base learners in GBM. Attention weights are determined by applying properties of decision trees and by using Huber's contamination model, which provides a linear dependence between the trainable parameters of the attention and the attention weights. This property allows us to train the attention weights by solving a standard quadratic optimization problem with linear constraints. The attention weights also depend on the discount factor as a tuning parameter, which determines how much the impact of the weight is decreased with the number of iterations. Numerical experiments performed for two types of base learners, original decision trees and extremely randomized trees, with various regression datasets illustrate the proposed model.  ( 2 min )
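    For reference, Huber's ε-contamination model in its general form mixes a base distribution with an arbitrary one. The sketch below assumes (the abstract does not spell this out) that the attention weights take the same mixture form, with $s_i$ a fixed softmax-style score for iteration $i$ and $w$ a trainable vector on the unit simplex. All symbols here are illustrative.

```latex
\alpha_i \;=\; (1-\epsilon)\, s_i \;+\; \epsilon\, w_i,
\qquad w_i \ge 0,\quad \sum_{i} w_i = 1,\quad \epsilon \in [0,1].
```

    Under that assumption each $\alpha_i$ is linear in $w$, so a squared-error loss on the weighted prediction is quadratic in $w$ with linear constraints, which is consistent with the quadratic program the abstract mentions.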
    PAC Reinforcement Learning for Predictive State Representations. (arXiv:2207.05738v1 [cs.LG])
    In this paper we study online Reinforcement Learning (RL) in partially observable dynamical systems. We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models such as Partially Observable Markov Decision Processes (POMDP). PSR represents the states using a set of predictions of future observations and is defined entirely using observable quantities. We develop a novel model-based algorithm for PSRs that can learn a near optimal policy in sample complexity scaling polynomially with respect to all the relevant parameters of the systems. Our algorithm naturally works with function approximation to extend to systems with potentially large state and observation spaces. We show that given a realizable model class, the sample complexity of learning the near optimal policy only scales polynomially with respect to the statistical complexity of the model class, without any explicit polynomial dependence on the size of the state and observation spaces. Notably, our work is the first work that shows polynomial sample complexities to compete with the globally optimal policy in PSRs. Finally, we demonstrate how our general theorem can be directly used to derive sample complexity bounds for special models including $m$-step weakly revealing and $m$-step decodable tabular POMDPs, POMDPs with low-rank latent transition, and POMDPs with linear emission and latent transition.  ( 2 min )
    Improved Batching Strategy For Irregular Time-Series ODE. (arXiv:2207.05708v1 [cs.LG])
    Irregular time series data are prevalent in the real world and are challenging to model with a simple recurrent neural network (RNN). Hence, a model that combines the use of ordinary differential equations (ODE) and RNN was proposed (ODE-RNN) to model irregular time series with higher accuracy, but it suffers from high computational costs. In this paper, we propose an improvement in the runtime on ODE-RNNs by using a different efficient batching strategy. Our experiments show that the new models reduce the runtime of ODE-RNN significantly ranging from 2 times up to 49 times depending on the irregularity of the data while maintaining comparable accuracy. Hence, our model can scale favorably for modeling larger irregular data sets.  ( 2 min )
    Machine Learning model for gas-liquid interface reconstruction in CFD numerical simulations. (arXiv:2207.05684v1 [physics.flu-dyn])
    The volume of fluid (VoF) method is widely used in multi-phase flow simulations to track and locate the interface between two immiscible fluids. A major bottleneck of the VoF method is the interface reconstruction step due to its high computational cost and low accuracy on unstructured grids. We propose a machine learning enhanced VoF method based on Graph Neural Networks (GNN) to accelerate the interface reconstruction on general unstructured meshes. We first develop a methodology to generate a synthetic dataset based on paraboloid surfaces discretized on unstructured meshes. We then train a GNN based model and perform generalization tests. Our results demonstrate the efficiency of a GNN based approach for interface reconstruction in multi-phase flow simulations in the industrial context.  ( 2 min )
    Bayesian Experimental Design for Computed Tomography with the Linearised Deep Image Prior. (arXiv:2207.05714v1 [cs.CV])
    We investigate adaptive design based on a single sparse pilot scan for generating effective scanning strategies for computed tomography reconstruction. We propose a novel approach using the linearised deep image prior. It allows incorporating information from the pilot measurements into the angle selection criteria, while maintaining the tractability of a conjugate Gaussian-linear model. On a synthetically generated dataset with preferential directions, linearised DIP design allows reducing the number of scans by up to 30% relative to an equidistant angle baseline.  ( 2 min )
    HelixFold: An Efficient Implementation of AlphaFold2 using PaddlePaddle. (arXiv:2207.05477v1 [cs.DC])
    Accurate protein structure prediction can significantly accelerate the development of life science. The accuracy of AlphaFold2, a frontier end-to-end structure prediction system, is already close to that of experimental determination techniques. Due to the complex model architecture and large memory consumption, training and running inference with AlphaFold2 from scratch requires substantial computational resources and time. Running the original AlphaFold2 is too expensive for most individuals and institutions. Therefore, reducing this cost could accelerate the development of life science. We implement AlphaFold2 using PaddlePaddle, namely HelixFold, to improve training and inference speed and reduce memory consumption. The performance is improved by operator fusion, tensor fusion, and hybrid parallelism computation, while the memory is optimized through Recompute, BFloat16, and in-place memory read/write. Compared with the original AlphaFold2 (implemented in JAX) and OpenFold (implemented in PyTorch), HelixFold needs only 7.5 days to complete the full end-to-end training and only 5.3 days when using hybrid parallelism, while both AlphaFold2 and OpenFold take about 11 days. That is, with hybrid parallelism HelixFold trains roughly twice as fast (11 days versus 5.3 days). We verified that HelixFold's accuracy could be on par with AlphaFold2 on the CASP14 and CAMEO datasets. HelixFold's code is available on GitHub for free download: https://github.com/PaddlePaddle/PaddleHelix/tree/dev/apps/protein_folding/helixfold, and we also provide stable web services on https://paddlehelix.baidu.com/app/drug/protein/forecast.  ( 3 min )
    A Machine Learning Data Fusion Model for Soil Moisture Retrieval. (arXiv:2206.09649v2 [physics.ao-ph] UPDATED)
    We develop a deep learning based convolutional-regression model that estimates the volumetric soil moisture content in the top ~5 cm of soil. Input predictors include Sentinel-1 (active radar), Sentinel-2 (optical imagery), and SMAP (passive radar) as well as geophysical variables from SoilGrids and modelled soil moisture fields from GLDAS. The model was trained and evaluated on data from ~1300 in-situ sensors globally over the period 2015 - 2021 and obtained an average per-sensor correlation of 0.727 and ubRMSE of 0.054, and can be used to produce a soil moisture map at a nominal 320m resolution. These results are benchmarked against 13 other soil moisture works at different locations, and an ablation study was used to identify important predictors.  ( 2 min )
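    The two reported scores can be sketched in a few lines. This is a minimal illustration using the standard definitions of Pearson correlation and unbiased RMSE (RMSE after removing each series' mean); the sensor readings are invented for the example, and how the paper aggregates across sensors is not shown here.

```python
import math

def pearson_r(pred, obs):
    """Pearson correlation between predicted and observed series."""
    n = len(pred)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)

def ubrmse(pred, obs):
    """Unbiased RMSE: RMSE computed on mean-removed (anomaly) series."""
    n = len(pred)
    mean_p, mean_o = sum(pred) / n, sum(obs) / n
    sq = sum(((p - mean_p) - (o - mean_o)) ** 2 for p, o in zip(pred, obs))
    return math.sqrt(sq / n)

# Hypothetical volumetric soil moisture values for one sensor.
observed = [0.10, 0.20, 0.30, 0.25]
# A prediction that is perfect apart from a constant +0.05 bias.
predicted = [o + 0.05 for o in observed]

r_value = pearson_r(predicted, observed)
ub_value = ubrmse(predicted, observed)
```

    Because the constant bias is removed before the error is computed, `ub_value` is zero up to rounding here even though the plain RMSE would be 0.05.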
    Using Interpretable Machine Learning to Predict Maternal and Fetal Outcomes. (arXiv:2207.05322v1 [cs.LG])
    Most pregnancies and births result in a good outcome, but complications are not uncommon and when they do occur, they can be associated with serious implications for mothers and babies. Predictive modeling has the potential to improve outcomes through better understanding of risk factors, heightened surveillance, and more timely and appropriate interventions, thereby helping obstetricians deliver better care. For three types of complications we identify and study the most important risk factors using Explainable Boosting Machine (EBM), a glass box model, in order to gain intelligibility: (i) Severe Maternal Morbidity (SMM), (ii) shoulder dystocia, and (iii) preterm preeclampsia. While using the interpretability of EBMs to reveal surprising insights into the features contributing to risk, our experiments show EBMs match the accuracy of black-box ML methods such as deep neural nets and random forests.  ( 2 min )
    RE-Tagger: A light-weight Real-Estate Image Classifier. (arXiv:2207.05696v1 [cs.CV])
    Real-estate image tagging is one of the essential use-cases to save the effort involved in manual annotation and enhance the user experience. This paper proposes an end-to-end pipeline (referred to as RE-Tagger) for the real-estate image classification problem. We present a two-stage transfer learning approach using a custom InceptionV3 architecture to classify images into different categories (i.e., bedroom, bathroom, kitchen, balcony, hall, and others). Finally, we released the application as a REST API hosted as a web application running on a 2-core machine with 2 GB RAM. The demo video is available here.  ( 2 min )
    Latent Variable Models for Bayesian Causal Discovery. (arXiv:2207.05723v1 [cs.LG])
    Learning predictors that do not rely on spurious correlations involves building causal representations. However, learning such a representation is very challenging. We, therefore, formulate the problem of learning a causal representation from high dimensional data and study causal recovery with synthetic data. This work introduces a latent variable decoder model, Decoder BCD, for Bayesian causal discovery and performs experiments in mildly supervised and unsupervised settings. We present a series of synthetic experiments to characterize important factors for causal discovery and show that using known intervention targets as labels helps in unsupervised Bayesian inference over structure and parameters of linear Gaussian additive noise latent structural causal models.  ( 2 min )
    EfficientLEAF: A Faster LEarnable Audio Frontend of Questionable Use. (arXiv:2207.05508v1 [cs.SD])
    In audio classification, differentiable auditory filterbanks with few parameters cover the middle ground between hard-coded spectrograms and raw audio. LEAF (arXiv:2101.08596), a Gabor-based filterbank combined with Per-Channel Energy Normalization (PCEN), has shown promising results, but is computationally expensive. With inhomogeneous convolution kernel sizes and strides, and by replacing PCEN with better parallelizable operations, we can reach similar results more efficiently. In experiments on six audio classification tasks, our frontend matches the accuracy of LEAF at 3% of the cost, but both fail to consistently outperform a fixed mel filterbank. The quest for learnable audio frontends is not solved.  ( 2 min )
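    To make the cost argument concrete, here is a minimal single-channel sketch of Per-Channel Energy Normalization in its usual form, PCEN(t) = (E(t) / (eps + M(t))^alpha + delta)^r - delta^r with the smoother M(t) = (1 - s) M(t-1) + s E(t). The parameter defaults and the choice to initialise the smoother with the first frame are illustrative assumptions, not values taken from this paper.

```python
def pcen_channel(energy, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Apply PCEN to one filterbank channel given as a list of energies."""
    out = []
    smoothed = energy[0]  # assumption: start the smoother at the first frame
    for e in energy:
        # First-order IIR smoother; each step depends on the previous one.
        smoothed = (1.0 - s) * smoothed + s * e
        # Adaptive gain control followed by root compression.
        out.append((e / (eps + smoothed) ** alpha + delta) ** r - delta ** r)
    return out

# With alpha = 1 the gain control cancels a constant loudness factor.
quiet = pcen_channel([1.0] * 10, alpha=1.0)
loud = pcen_channel([100.0] * 10, alpha=1.0)
silent = pcen_channel([0.0] * 10)
```

    The loop carries `smoothed` from one frame to the next, which is the sequential dependency that makes PCEN awkward to parallelise along the time axis.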
    Investigating the Impact of Independent Rule Fitnesses in a Learning Classifier System. (arXiv:2207.05582v1 [cs.LG])
    Achieving at least some level of explainability requires complex analyses for many machine learning systems, such as common black-box models. We recently proposed a new rule-based learning system, SupRB, to construct compact, interpretable and transparent models by utilizing separate optimizers for the model selection tasks concerning rule discovery and rule set composition. This allows users to specifically tailor their model structure to fulfil use-case specific explainability requirements. From an optimization perspective, this allows us to define clearer goals and we find that -- in contrast to many state of the art systems -- this allows us to keep rule fitnesses independent. In this paper we investigate this system's performance thoroughly on a set of regression problems and compare it against XCSF, a prominent rule-based learning system. We find the overall results of SupRB's evaluation comparable to XCSF's while allowing easier control of model structure and showing a substantially smaller sensitivity to random seeds and data splits. This increased control can aid in subsequently providing explanations for both training and final structure of the model.  ( 2 min )
    Utilizing Excess Resources in Training Neural Networks. (arXiv:2207.05532v1 [cs.LG])
    In this work, we suggest Kernel Filtering Linear Overparameterization (KFLO), where a linear cascade of filtering layers is used during training to improve network performance at test time. We implement this cascade in a kernel filtering fashion, which prevents the trained architecture from becoming unnecessarily deeper. This also allows using our approach with almost any network architecture and lets us combine the filtering layers into a single layer at test time. Thus, our approach does not add computational complexity during inference. We demonstrate the advantage of KFLO on various network models and datasets in supervised learning.  ( 2 min )
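    The reason a linear cascade adds no inference cost can be shown with ordinary matrices. This is a sketch of the general principle only (two stacked linear maps with no nonlinearity between them equal one merged map), not of KFLO's kernel-filtering construction; the weights are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(w, x):
    """Apply matrix w to vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

# Hypothetical weights: a 2 -> 3 layer followed by a 3 -> 2 layer.
w1 = [[1.0, 2.0], [0.0, 1.0], [-1.0, 0.5]]
w2 = [[0.5, -1.0, 2.0], [1.0, 0.0, 1.0]]
x = [3.0, -2.0]

# Training-time view: pass the input through both linear layers.
two_layer_output = matvec(w2, matvec(w1, x))

# Test-time view: fold the cascade into one 2 x 2 matrix once, then apply it.
merged = matmul(w2, w1)
one_layer_output = matvec(merged, x)
```

    The merged layer has fewer parameters than the cascade it replaces, so the extra capacity exists only during training.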
    Long Short-Term Memory to predict 3D Amino acids Positions in GPCR Molecular Dynamics. (arXiv:2207.05682v1 [q-bio.BM])
    G-Protein Coupled Receptors (GPCRs) are a big family of eukaryotic cell transmembrane proteins, responsible for numerous biological processes. From a practical viewpoint, around 34% of the drugs approved by the US Food and Drug Administration target these receptors. They can be analyzed from their simulated molecular dynamics, including the prediction of their behavior in the presence of drugs. In this paper, the capability of Long Short-Term Memory Networks (LSTMs) to learn and predict the molecular dynamics trajectories of a receptor is evaluated. Several models were trained with the 3D positions of the amino acids of the receptor, considering different representations of each amino acid's position, such as the center of mass, the geometric center and the position of the α-carbon. The prediction error of the position was evaluated by the mean absolute error (MAE) and root-mean-square deviation (RMSD). The LSTM models show a robust performance, with results comparable to the state-of-the-art in non-dynamic 3D predictions. The best MAE and RMSD values were found for the mass center of the amino acids, with 0.078 Å and 0.156 Å respectively. This work shows the potential of LSTMs to predict the molecular dynamics of GPCRs.  ( 2 min )
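    The two error measures can be sketched for a handful of 3D positions. The averaging conventions below (MAE over every coordinate, RMSD over per-residue Euclidean distances) are one common choice and are an assumption here, since the abstract does not state which convention the authors use; the coordinates are invented.

```python
import math

def mae_3d(pred, true):
    """Mean absolute error over all x, y, z coordinates."""
    diffs = [abs(p - t) for pr, tr in zip(pred, true) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)

def rmsd_3d(pred, true):
    """Root-mean-square deviation over per-point Euclidean distances."""
    sq_dists = [sum((p - t) ** 2 for p, t in zip(pr, tr))
                for pr, tr in zip(pred, true)]
    return math.sqrt(sum(sq_dists) / len(sq_dists))

# Hypothetical positions (in angstroms) for two residues.
true_positions = [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
pred_positions = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]

mae_value = mae_3d(pred_positions, true_positions)
rmsd_value = rmsd_3d(pred_positions, true_positions)
```

    RMSD squares the deviations before averaging, so a single badly placed residue raises it more than it raises MAE.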
    Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets. (arXiv:2202.01671v2 [stat.ML] UPDATED)
    The need for efficiently comparing and representing datasets with unknown alignment spans various fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, symmetric positive-definite (SPD) matrices that relate to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically assume known data alignment and compare such operators in a pointwise manner. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.  ( 2 min )
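    For orientation, the plain log-Euclidean distance between two SPD matrices of the same size is the Frobenius norm of the difference of their matrix logarithms. The sketch below implements only that baseline metric with NumPy; the LES distance described above is based on a lower bound of it and handles differing sizes, which this example does not attempt.

```python
import numpy as np

def spd_log(matrix):
    """Matrix logarithm of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(matrix)
    # V diag(log(lambda)) V^T, using column-wise scaling of the eigenvectors.
    return (eigvecs * np.log(eigvals)) @ eigvecs.T

def log_euclidean_distance(a, b):
    """Frobenius norm of log(a) - log(b) for same-sized SPD matrices."""
    return float(np.linalg.norm(spd_log(a) - spd_log(b), ord="fro"))

# Illustrative matrices: the identity and a version stretched along one axis.
identity = np.eye(2)
stretched = np.diag([np.e, 1.0])

d_same = log_euclidean_distance(stretched, stretched)
d_diff = log_euclidean_distance(stretched, identity)
```

    Here `log(stretched)` is `diag(1, 0)` and `log(identity)` is the zero matrix, so `d_diff` comes out as 1 up to floating-point error.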
    An Introduction to Lifelong Supervised Learning. (arXiv:2207.04354v2 [cs.LG] UPDATED)
    This primer is an attempt to provide a detailed summary of the different facets of lifelong learning. We start with Chapter 2, which provides a high-level overview of lifelong learning systems. In this chapter, we discuss prominent scenarios in lifelong learning (Section 2.4), provide a high-level organization of different lifelong learning approaches (Section 2.5), enumerate the desiderata for an ideal lifelong learning system (Section 2.6), discuss how lifelong learning is related to other learning paradigms (Section 2.7), and describe common metrics used to evaluate lifelong learning systems (Section 2.8). This chapter is more useful for readers who are new to lifelong learning and want to get introduced to the field without focusing on specific approaches or benchmarks. The remaining chapters focus on specific aspects (either learning algorithms or benchmarks) and are more useful for readers who are looking for specific approaches or benchmarks. Chapter 3 focuses on regularization-based approaches that do not assume access to any data from previous tasks. Chapter 4 discusses memory-based approaches that typically use a replay buffer or an episodic memory to save a subset of data across different tasks. Chapter 5 focuses on different architecture families (and their instantiations) that have been proposed for training lifelong learning systems. Following these different classes of learning algorithms, we discuss the commonly used evaluation benchmarks and metrics for lifelong learning (Chapter 6) and wrap up with a discussion of future challenges and important research directions in Chapter 7.
    Tracking Objects as Pixel-wise Distributions. (arXiv:2207.05518v1 [cs.CV])
    Multi-object tracking (MOT) requires detecting and associating objects through frames. Unlike tracking via detected bounding boxes or tracking objects as points, we propose tracking objects as pixel-wise distributions. We instantiate this idea on a transformer-based architecture, P3AFormer, with pixel-wise propagation, prediction, and association. P3AFormer propagates pixel-wise features guided by flow information to pass messages between frames. Furthermore, P3AFormer adopts a meta-architecture to produce multi-scale object feature maps. During inference, a pixel-wise association procedure is proposed to recover object connections through frames based on the pixel-wise prediction. P3AFormer yields 81.2% in terms of MOTA on the MOT17 benchmark -- the first among all transformer networks to reach 80% MOTA in the literature. P3AFormer also outperforms state-of-the-art methods on the MOT20 and KITTI benchmarks.
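    For readers unfamiliar with the headline metric, MOTA is conventionally defined as one minus the ratio of all tracking errors (misses, false positives, identity switches) to the number of ground-truth objects, summed over frames. A minimal sketch with invented counts, not figures from the paper:

```python
def mota(false_negatives, false_positives, id_switches, num_gt_objects):
    """Multi-Object Tracking Accuracy from error counts summed over frames."""
    if num_gt_objects <= 0:
        raise ValueError("num_gt_objects must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt_objects

# Hypothetical sequence: 1000 ground-truth boxes and 160 errors in total.
example_mota = mota(false_negatives=100, false_positives=50,
                    id_switches=10, num_gt_objects=1000)
```

    Because errors are not capped by the number of ground-truth objects, MOTA can be negative for a tracker that produces many false positives.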
    Robustness and Personalization in Federated Learning: A Unified Approach via Regularization. (arXiv:2009.06303v3 [cs.LG] UPDATED)
    We present a class of methods for robust, personalized federated learning, called Fed+, that unifies many federated learning algorithms. The principal advantage of this class of methods is to better accommodate the real-world characteristics found in federated training, such as the lack of IID data across parties, the need for robustness to outliers or stragglers, and the requirement to perform well on party-specific datasets. We achieve this through a problem formulation that allows the central server to employ robust ways of aggregating the local models while keeping the structure of local computation intact. Without making any statistical assumption on the degree of heterogeneity of local data across parties, we provide convergence guarantees for Fed+ for convex and non-convex loss functions under different (robust) aggregation methods. The Fed+ theory is also equipped to handle heterogeneous computing environments including stragglers without additional assumptions; specifically, the convergence results cover the general setting where the number of local update steps across parties can vary. We demonstrate the benefits of Fed+ through extensive experiments across standard benchmark datasets.
    Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey. (arXiv:2012.09830v7 [cs.LG] UPDATED)
    Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental approaches argue that this can only be achieved by autotelic agents: intrinsically motivated learning agents that can learn to represent, generate, select and solve their own problems. In recent years, the convergence of developmental approaches with deep reinforcement learning (RL) methods has been leading to the emergence of a new field: developmental reinforcement learning. Developmental RL is concerned with the use of deep RL algorithms to tackle a developmental problem -- the intrinsically motivated acquisition of open-ended repertoires of skills. The self-generation of goals requires the learning of compact goal encodings as well as their associated goal-achievement functions. This raises new challenges compared to standard RL algorithms originally designed to tackle pre-defined sets of goals using external reward signals. The present paper introduces developmental RL and proposes a computational framework based on goal-conditioned RL to tackle the intrinsically motivated skills acquisition problem. It proceeds to present a typology of the various goal representations used in the literature, before reviewing existing methods to learn to represent and prioritize goals in autonomous systems. We finally close the paper by discussing some open challenges in the quest for intrinsically motivated skills acquisition.
    Wasserstein multivariate auto-regressive models for modeling distributional time series and its application in graph learning. (arXiv:2207.05442v1 [stat.ML])
    We propose a new auto-regressive model for the statistical analysis of multivariate distributional time series. The data of interest consist of a collection of multiple series of probability measures supported over a bounded interval of the real line and indexed by distinct time instants. The probability measures are modelled as random objects in the Wasserstein space. We establish the auto-regressive model in the tangent space at the Lebesgue measure by first centering all the raw measures so that their Fréchet means become the Lebesgue measure. Using the theory of iterated random function systems, results on the existence, uniqueness and stationarity of the solution of such a model are provided. We also propose a consistent estimator for the model coefficient. In addition to the analysis of simulated data, the proposed model is illustrated with two real data sets consisting of observations of age distributions in different countries and of the bike sharing network in Paris. Finally, due to the positivity and boundedness constraints that we impose on the model coefficients, the proposed estimator, which is learned under these constraints, naturally has a sparse structure. This sparsity furthermore allows the proposed model to be applied to learning a graph of temporal dependency from the multivariate distributional time series.
    Zero-Shot Machine Unlearning. (arXiv:2201.05629v2 [cs.LG] UPDATED)
    Modern privacy regulations grant citizens the right to be forgotten by products, services and companies. In case of machine learning (ML) applications, this necessitates deletion of data not only from storage archives but also from ML models. Due to an increasing need for regulatory compliance required for ML applications, machine unlearning is becoming an emerging research problem. The right to be forgotten requests come in the form of removal of a certain set or class of data from the already trained ML model. Practical considerations preclude retraining of the model from scratch minus the deleted data. The few existing studies use either the whole training data, or a subset of training data, or some metadata stored during training to update the model weights for unlearning. However, strict regulatory compliance requires time-bound deletion of data. Thus, in many cases, no data related to the training process or training samples may be accessible even for the unlearning purpose. We therefore ask the question: is it possible to achieve unlearning with zero training samples? In this paper, we introduce the novel problem of zero-shot machine unlearning that caters for the extreme but practical scenario where zero original data samples are available for use. We then propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer. These methods remove the information of the forget data from the model while maintaining the model efficacy on the retain data. The zero-shot approach offers good protection against the model inversion attacks and membership inference attacks. We introduce a new evaluation metric, Anamnesis Index (AIN) to effectively measure the quality of the unlearning method. The experiments show promising results for unlearning in deep learning models on benchmark vision data-sets.
    CGMN: A Contrastive Graph Matching Network for Self-Supervised Graph Similarity Learning. (arXiv:2205.15083v2 [cs.LG] UPDATED)
    Graph similarity learning refers to calculating the similarity score between two graphs, which is required in many realistic applications, such as visual tracking, graph classification, and collaborative filtering. Although most existing graph neural networks yield effective representations of a single graph, little effort has been made to jointly learn two graph representations and calculate their similarity score. In addition, existing unsupervised graph similarity learning methods are mainly clustering-based and ignore the valuable information embodied in graph pairs. To this end, we propose a contrastive graph matching network (CGMN) for self-supervised graph similarity learning in order to calculate the similarity between any two input graph objects. Specifically, we generate two augmented views for each graph in a pair. Then, we employ two strategies, namely cross-view interaction and cross-graph interaction, for effective node representation learning. The former is used to strengthen the consistency of node representations in the two views. The latter is used to identify node differences between different graphs. Finally, we transform node representations into graph-level representations via pooling operations for graph similarity computation. We have evaluated CGMN on eight real-world datasets, and the experimental results show that the proposed approach is superior to the state-of-the-art methods in graph similarity learning downstream tasks.
    Physical Passive Patch Adversarial Attacks on Visual Odometry Systems. (arXiv:2207.05729v1 [cs.CV])
    Deep neural networks are known to be susceptible to adversarial perturbations -- small perturbations that alter the output of the network and exist under strict norm limitations. While such perturbations are usually discussed as tailored to a specific input, a universal perturbation can be constructed to alter the model's output on a set of inputs. Universal perturbations present a more realistic case of adversarial attacks, as awareness of the model's exact input is not required. In addition, the universal attack setting raises the subject of generalization to unseen data, where given a set of inputs, the universal perturbations aim to alter the model's output on out-of-sample data. In this work, we study physical passive patch adversarial attacks on visual odometry-based autonomous navigation systems. A visual odometry system aims to infer the relative camera motion between two corresponding viewpoints, and is frequently used by vision-based autonomous navigation systems to estimate their state. For such navigation systems, a patch adversarial perturbation poses a severe security issue, as it can be used to mislead a system onto some collision course. To the best of our knowledge, we show for the first time that the error margin of a visual odometry model can be significantly increased by deploying patch adversarial attacks in the scene. We provide evaluation on synthetic closed-loop drone navigation data and demonstrate that a comparable vulnerability exists in real data. A reference implementation of the proposed method and the reported experiments is provided at https://github.com/patchadversarialattacks/patchadversarialattacks.
    Asteroid Flyby Cycler Trajectory Design Using Deep Neural Networks. (arXiv:2111.11858v3 [astro-ph.IM] UPDATED)
    Asteroid exploration has been attracting more attention in recent years. Nevertheless, we have visited only tens of asteroids while we have discovered more than one million bodies. As our current observations and knowledge are likely to be biased, it is essential to explore multiple asteroids directly to better understand the remains of planetary building materials. One of the mission design solutions is utilizing asteroid flyby cycler trajectories with multiple Earth gravity assists. An asteroid flyby cycler trajectory design problem is a subclass of global trajectory optimization problems with multiple flybys, involving a trajectory optimization problem for a given flyby sequence and a combinatorial optimization problem to decide the sequence of the flybys. As the number of flyby bodies grows, the computation time of this optimization problem grows prohibitively. This paper presents a new method to design asteroid flyby cycler trajectories utilizing a surrogate model constructed by deep neural networks approximating trajectory optimization results. Since one of the bottlenecks of machine learning approaches is the computation time to generate massive trajectory databases, we propose an efficient database generation strategy by introducing pseudo-asteroids satisfying the Karush-Kuhn-Tucker conditions. The numerical result applied to JAXA's DESTINY+ mission shows that the proposed method is practically applicable to space mission design and can significantly reduce the computational time for searching asteroid flyby sequences.
    Deep Metric Learning-Based Semi-Supervised Regression With Alternate Learning. (arXiv:2202.11388v2 [cs.CV] UPDATED)
    This paper introduces a novel deep metric learning-based semi-supervised regression (DML-S2R) method for parameter estimation problems. The proposed DML-S2R method aims to mitigate the problem of an insufficient number of labeled samples without collecting any additional samples with target values. To this end, it is made up of two main steps: i) pairwise similarity modeling with scarce labeled data; and ii) triplet-based metric learning with abundant unlabeled data. The first step aims to model pairwise sample similarities by using a small number of labeled samples. This is achieved by estimating the target value differences of labeled samples with a Siamese neural network (SNN). The second step aims to learn a triplet-based metric space (in which similar samples are close to each other and dissimilar samples are far apart from each other) when the number of labeled samples is insufficient. This is achieved by employing the SNN of the first step for triplet-based deep metric learning that exploits not only labeled samples but also unlabeled samples. For the end-to-end training of DML-S2R, we investigate an alternate learning strategy for the two steps. With this strategy, the information encoded in each step guides the learning phase of the other step. The experimental results confirm the success of DML-S2R compared to the state-of-the-art semi-supervised regression methods. The code of the proposed method is publicly available at https://git.tu-berlin.de/rsim/DML-S2R.
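    The triplet-based metric space described above (similar samples close, dissimilar samples far apart) is commonly trained with a triplet margin loss. The abstract does not state the exact loss used by DML-S2R, so the following is only a sketch of the standard formulation with Euclidean distance. The names `euclidean` and `triplet_loss` and the margin value are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed, not taken from the paper).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Hypothetical 2-D embeddings.
well_separated = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
print(well_separated, badly_separated)
```

    In a semi-supervised setting such as the one described, which sample plays the positive and which the negative role would have to be decided from estimated target differences; that selection step is not shown here.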
    Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders. (arXiv:2203.12742v2 [cs.LG] UPDATED)
    Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing in silico and in vitro properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.
    Docent: A content-based recommendation system to discover contemporary art. (arXiv:2207.05648v1 [cs.LG])
    Recommendation systems have been widely used in various domains such as music, films, e-shopping etc. After mostly avoiding digitization, the art world has recently reached a technological turning point due to the pandemic, making online sales grow significantly as well as providing quantitative online data about artists and artworks. In this work, we present a content-based recommendation system on contemporary art relying on images of artworks and contextual metadata of artists. We gathered and annotated artworks with advanced and art-specific information to create a completely unique database that was used to train our models. With this information, we built a proximity graph between artworks. Similarly, we used NLP techniques to characterize the practices of the artists and we extracted information from exhibitions and other event history to create a proximity graph between artists. The power of graph analysis enables us to provide an artwork recommendation system based on a combination of visual and contextual information from artworks and artists. After an assessment by a team of art specialists, we get an average final rating of 75% of meaningful artworks when compared to their professional evaluations.
    From Spectral Graph Convolutions to Large Scale Graph Convolutional Networks. (arXiv:2207.05669v1 [cs.LG])
    Graph Convolutional Networks (GCNs) have been shown to be a powerful concept that has been successfully applied to a large variety of tasks across many domains over the past years. In this work we study the theory that paved the way to the definition of GCNs, including related parts of classical graph theory. We also discuss and experimentally demonstrate key properties and limitations of GCNs, such as those caused by the statistical dependency of samples introduced by the edges of the graph, which causes the estimates of the full gradient to be biased. Another limitation we discuss is the negative impact of minibatch sampling on model performance. As a consequence, during parameter updates, gradients are computed on the whole dataset, undermining scalability to large graphs. To account for this, we investigate alternative methods that allow good parameters to be learned safely while sampling only a subset of the data per iteration. We reproduce the results reported in the work of Kipf et al. and propose an implementation inspired by SIGN, which is a sampling-free minibatch method. Finally, we compare the two implementations on a benchmark dataset, showing that they are comparable in terms of prediction accuracy for the task of semi-supervised node classification.
    CompoundE: Knowledge Graph Embedding with Translation, Rotation and Scaling Compound Operations. (arXiv:2207.05324v1 [cs.AI])
    Translation, rotation, and scaling are three commonly used geometric manipulation operations in image processing. Besides, some of them are successfully used in developing effective knowledge graph embedding (KGE) models such as TransE and RotatE. Inspired by the synergy, we propose a new KGE model by leveraging all three operations in this work. Since translation, rotation, and scaling operations are cascaded to form a compound one, the new model is named CompoundE. By casting CompoundE in the framework of group theory, we show that quite a few scoring-function-based KGE models are special cases of CompoundE. CompoundE extends the simple distance-based relation to relation-dependent compound operations on head and/or tail entities. To demonstrate the effectiveness of CompoundE, we conduct experiments on three popular KG completion datasets. Experimental results show that CompoundE consistently achieves state-of-the-art performance.
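    To make the idea of cascading scaling, rotation, and translation concrete, here is a minimal 2-D sketch of a distance-based score in which a relation transforms the head embedding before it is compared with the tail. The abstract does not give the operator order, the dimensionality, or the exact scoring function, so the order (scale, then rotate, then translate), the 2-D restriction, and the names `compound` and `score` are all assumptions.

```python
import math


def compound(h, scale, theta, shift):
    """Apply scaling, then rotation, then translation to a 2-D point.

    The order of the three operations is an assumption made for this sketch.
    """
    x, y = h[0] * scale[0], h[1] * scale[1]      # scaling
    c, s = math.cos(theta), math.sin(theta)
    x, y = c * x - s * y, s * x + c * y          # rotation by theta
    return (x + shift[0], y + shift[1])          # translation


def score(h, t, scale, theta, shift):
    """Negative distance between the transformed head and the tail.

    Higher (closer to zero) means the triple is judged more plausible.
    """
    p = compound(h, scale, theta, shift)
    return -math.hypot(p[0] - t[0], p[1] - t[1])


# Hypothetical embeddings and relation parameters.
full = score((1.0, 0.0), (1.0, 3.0), scale=(2.0, 2.0), theta=math.pi / 2, shift=(1.0, 1.0))

# With unit scale and zero rotation only the translation remains,
# which is the TransE-style special case h + r compared with t.
translation_only = score((1.0, 0.0), (2.0, 1.0), scale=(1.0, 1.0), theta=0.0, shift=(1.0, 1.0))
print(full, translation_only)
```

    Similarly, unit scale with zero shift leaves a pure rotation, which in two real dimensions matches multiplication by a unit-modulus complex number as used in RotatE.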
    Modern Views of Machine Learning for Precision Psychiatry. (arXiv:2204.01607v2 [cs.LG] UPDATED)
    In light of the NIMH's Research Domain Criteria (RDoC), the advent of functional neuroimaging, novel technologies and methods provide new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for the new role of ML/AI for digital phenotyping in mobile mental health. In this review, we provide a comprehensive review of the ML methodologies and applications by combining neuroimaging, neuromodulation, and advanced mobile technologies in psychiatry practice. Additionally, we review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We further discuss explainable AI (XAI) and causality testing in a closed-human-in-the-loop manner, and highlight the ML potential in multimedia information extraction and multimodal data fusion. Finally, we discuss conceptual and practical challenges in precision psychiatry and highlight ML opportunities in future research.
    Uniform Manifold Approximation with Two-phase Optimization. (arXiv:2205.00420v2 [cs.LG] UPDATED)
    We introduce Uniform Manifold Approximation with Two-phase Optimization (UMATO), a dimensionality reduction (DR) technique that improves UMAP to capture the global structure of high-dimensional data more accurately. In UMATO, optimization is divided into two phases so that the resulting embeddings can depict the global structure reliably while preserving the local structure with sufficient accuracy. In the first phase, hub points are identified and projected to construct a skeletal layout for the global structure. In the second phase, the remaining points are added to the embedding preserving the regional characteristics of local areas. Through quantitative experiments, we found that UMATO (1) outperformed widely used DR techniques in preserving the global structure while (2) producing competitive accuracy in representing the local structure. We also verified that UMATO is preferable in terms of robustness over diverse initialization methods, number of epochs, and subsampling techniques.
    Horizontal Federated Learning and Secure Distributed Training for Recommendation System with Intel SGX. (arXiv:2207.05079v1 [cs.LG])
    With the advent of the big data era and the development of artificial intelligence and other technologies, data security and privacy protection have become more important. Recommendation systems have many applications in our society, but the model construction of recommendation systems is often inseparable from users' data. Especially for deep learning-based recommendation systems, due to the complexity of the model and the characteristics of deep learning itself, the training process not only requires long training time and abundant computational resources but also needs a large amount of user data, which poses a considerable challenge in terms of data security and privacy protection. How to train a distributed recommendation system while ensuring data security has become an urgent problem. In this paper, we implement two schemes, Horizontal Federated Learning and Secure Distributed Training, based on Intel SGX (Software Guard Extensions), an implementation of a trusted execution environment, and the TensorFlow framework, to achieve secure, distributed recommendation system-based learning schemes in different scenarios. We experiment on the classical Deep Learning Recommendation Model (DLRM), which is a neural network-based machine learning model designed for personalization and recommendation, and the results show that our implementation introduces almost no loss in model performance. The training speed is within acceptable limits.
    TabSynDex: A Universal Metric for Robust Evaluation of Synthetic Tabular Data. (arXiv:2207.05295v1 [cs.LG])
    Synthetic tabular data generation becomes crucial when real data is limited, expensive to collect, or simply cannot be used due to privacy concerns. However, producing good quality synthetic data is challenging. Several probabilistic, statistical, and generative adversarial network (GAN) based approaches have been presented for synthetic tabular data generation. Once generated, evaluating the quality of the synthetic data is quite challenging. Some of the traditional metrics have been used in the literature, but there is a lack of a common, robust, and single metric. This makes it difficult to properly compare the effectiveness of different synthetic tabular data generation methods. In this paper we propose a new universal metric, TabSynDex, for robust evaluation of synthetic data. TabSynDex assesses the similarity of synthetic data with real data through different component scores which evaluate the characteristics that are desirable for "high quality" synthetic data. Being a single score metric, TabSynDex can also be used to observe and evaluate the training of neural network based approaches. This would help in obtaining insights that were not possible earlier. Further, we present several baseline models for comparative analysis of the proposed evaluation metric with existing generative models.
    Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets. (arXiv:2207.05471v1 [stat.ML])
    Learning against label noise is a vital topic to guarantee a reliable performance for deep neural networks. Recent research usually refers to dynamic noise modeling with model output probabilities and loss values, and then separates clean and noisy samples. These methods have gained notable success. However, unlike cherry-picked data, existing approaches often cannot perform well when facing imbalanced datasets, a common scenario in the real world. We thoroughly investigate this phenomenon and point out two major issues that hinder the performance, i.e., inter-class loss distribution discrepancy and misleading predictions due to uncertainty. The first issue is that existing methods often perform class-agnostic noise modeling. However, loss distributions show a significant discrepancy among classes under class imbalance, and class-agnostic noise modeling can easily confuse noisy samples with samples in minority classes. The second issue is that models may output misleading predictions due to epistemic uncertainty and aleatoric uncertainty, so existing methods that rely solely on the output probabilities may fail to distinguish confident samples. Inspired by our observations, we propose an Uncertainty-aware Label Correction framework (ULC) to handle label noise on imbalanced datasets. First, we perform epistemic uncertainty-aware class-specific noise modeling to identify trustworthy clean samples and refine/discard highly confident true/corrupted labels. Then, we introduce aleatoric uncertainty in the subsequent learning process to prevent noise accumulation in the label noise modeling process. We conduct experiments on several synthetic and real-world datasets. The results demonstrate the effectiveness of the proposed method, especially on imbalanced datasets.
    Practical Attacks on Machine Learning: A Case Study on Adversarial Windows Malware. (arXiv:2207.05548v1 [cs.CR])
    While machine learning is vulnerable to adversarial examples, it still lacks systematic procedures and tools for evaluating its security in different application contexts. In this article, we discuss how to develop automated and scalable security evaluations of machine learning using practical attacks, reporting a use case on Windows malware detection.
    Efficient and Privacy Preserving Group Signature for Federated Learning. (arXiv:2207.05297v1 [cs.CR])
    Federated Learning (FL) is a Machine Learning (ML) technique that aims to reduce the threats to user data privacy. Training is done using the raw data on the users' devices, called clients, and only the training results, called gradients, are sent to the server to be aggregated and generate an updated model. However, we cannot assume that the server can be trusted with private information, such as metadata related to the owner or source of the data. So, hiding the client information from the server helps reduce privacy-related attacks. Therefore, the privacy of the client's identity, along with the privacy of the client's data, is necessary to make such attacks more difficult. This paper proposes an efficient and privacy-preserving protocol for FL based on group signature. A new group signature for federated learning, called GSFL, is designed to not only protect the privacy of the client's data and identity but also significantly reduce the computation and communication costs considering the iterative process of federated learning. We show that GSFL outperforms existing approaches in terms of computation, communication, and signaling costs. Also, we show that the proposed protocol can handle various security attacks in the federated learning environment.
    Quantum Neural Network Classifiers: A Tutorial. (arXiv:2206.02806v2 [quant-ph] UPDATED)
    Machine learning has achieved dramatic success over the past decade, with applications ranging from face recognition to natural language processing. Meanwhile, rapid progress has been made in the field of quantum computation, including the development of both powerful quantum algorithms and advanced quantum devices. The interplay between machine learning and quantum physics holds the intriguing potential for bringing practical applications to modern society. Here, we focus on quantum neural networks in the form of parameterized quantum circuits. We will mainly discuss different structures and encoding strategies of quantum neural networks for supervised learning tasks, and benchmark their performance utilizing Yao.jl, a quantum simulation package written in the Julia language. The code is efficient, aiming to provide convenience for beginners in scientific works such as developing powerful variational quantum learning models and assisting the corresponding experimental demonstrations.
    A Baseline for Detecting Out-of-Distribution Examples in Image Captioning. (arXiv:2207.05418v1 [cs.CV])
    Image captioning research achieved breakthroughs in recent years by developing neural models that can generate diverse and high-quality descriptions for images drawn from the same distribution as training images. However, when facing out-of-distribution (OOD) images, such as corrupted images or images containing unknown objects, the models fail to generate relevant captions. In this paper, we consider the problem of OOD detection in image captioning. We formulate the problem and suggest an evaluation setup for assessing the model's performance on the task. Then, we analyze and show the effectiveness of the caption's likelihood score at detecting and rejecting OOD images, which implies that the relatedness between the input image and the generated caption is encapsulated within the score.
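    A likelihood-based rejection rule of the kind described above can be sketched in a few lines. The abstract does not specify how the caption likelihood is normalised or how the threshold is chosen, so the mean per-token log-probability, the threshold value, and the names `caption_score` and `is_ood` are assumptions. The token probabilities are made-up numbers standing in for a captioning model's outputs.

```python
import math


def caption_score(token_probs):
    """Mean log-probability of the generated caption's tokens.

    Length normalisation is an assumption of this sketch.
    """
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def is_ood(token_probs, threshold):
    """Flag the image as OOD when its caption is unlikely under the model."""
    return caption_score(token_probs) < threshold


# Hypothetical per-token probabilities for two generated captions.
confident_caption = [0.9, 0.8, 0.95]
uncertain_caption = [0.2, 0.1, 0.3]

print(is_ood(confident_caption, threshold=-1.0))
print(is_ood(uncertain_caption, threshold=-1.0))
```

    In practice the threshold would be tuned on held-out in-distribution data rather than fixed by hand.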
    Cognition in Dynamical Systems, Second Edition. (arXiv:1805.00787v2 [cs.MA] UPDATED)
    Cognition is the process of knowing. As carried out by a dynamical system, it is the process by which the system absorbs information into its state. A complex network of agents cognizes knowledge about its environment, internal dynamics and initial state by forming emergent, macro-level patterns. Such patterns require each agent to find its place while partially aware of the whole pattern. Such partial awareness can be achieved by separating the system dynamics into two parts by timescale: the propagation dynamics and the pattern dynamics. The fast propagation dynamics describe the spread of signals across the network. If they converge to a fixed point for any quasi-static state of the slow pattern dynamics, that fixed point represents an aggregate of macro-level information. On longer timescales, agents coordinate via positive feedback to form patterns, which are defined using closed walks in the graph of agents. Patterns can be coherent, in that every part of the pattern depends on every other part for context. Coherent patterns are acausal, in that (a) they cannot be predicted and (b) no part of the stored knowledge can be mapped to any part of the pattern, or vice versa. A cognitive network's knowledge is encoded or embodied by the selection of patterns which emerge. The theory of cognition summarized here can model autocatalytic reaction-diffusion systems, artificial neural networks, market economies and ant colony optimization, among many other real and virtual systems. This theory suggests a new understanding of complexity as a lattice of contexts rather than a single measure.
    Prediction of Maneuvering Status for Aerial Vehicles using Supervised Learning Methods. (arXiv:2206.10303v2 [cs.RO] UPDATED)
    Aerial vehicles follow a guided approach based on latitude, longitude and altitude. This information can be used for calculating the maneuvering status of aerial vehicles along their trajectory. This is a binary classification problem, and machine learning can be leveraged to solve such problems. In this paper we present a methodology for deriving the maneuvering status and predicting it using Linear, Distance Metric, Discriminant Analysis and Boosting Ensemble supervised learning methods. In the results section we provide various metrics that give a condensed comparison of the algorithms for predicting the maneuvering status.
    WeShort: Out-of-distribution Detection With Weak Shortcut structure. (arXiv:2207.05055v1 [cs.LG])
    Neural networks have achieved impressive performance on data from the same distribution as the training set, but can produce overconfident incorrect results on data they have never seen. Therefore, it is essential to detect whether inputs are out-of-distribution (OOD) in order to guarantee the safety of neural networks deployed in the real world. In this paper, we propose a simple and effective post-hoc technique, WeShort, to reduce the overconfidence of neural networks on OOD data. Our method is inspired by the observation of the internal residual structure, which shows the separation of the OOD and in-distribution (ID) data in the shortcut layer. Our method is compatible with different OOD detection scores and can generalize well to different network architectures. We demonstrate our method on various OOD datasets to show its competitive performance and provide reasonable hypotheses to explain why our method works. On the ImageNet benchmark, WeShort achieves state-of-the-art performance among post-hoc methods on the false positive rate (FPR95) and the area under the receiver operating characteristic (AUROC).
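    For readers unfamiliar with the FPR95 metric mentioned above, the sketch below shows one common way to compute it: pick the score threshold at which 95% of in-distribution samples are accepted, then report the fraction of OOD samples that are also accepted. The convention that higher scores mean "more in-distribution", the quantile rule, and the name `fpr_at_95_tpr` are assumptions. The scores are toy integers, not results from the paper.

```python
def fpr_at_95_tpr(id_scores, ood_scores):
    """False positive rate on OOD data at a 95% true positive rate on ID data.

    Assumes a higher score means the detector considers the input more
    in-distribution, and that a sample is accepted when score >= threshold.
    """
    ordered = sorted(id_scores)
    k = len(ordered) * 5 // 100          # index below which 5% of ID scores lie
    threshold = ordered[k]
    accepted_ood = sum(1 for s in ood_scores if s >= threshold)
    return accepted_ood / len(ood_scores)


# Toy scores: 20 in-distribution samples and 5 OOD samples.
id_scores = list(range(1, 21))
ood_scores = [0, 1, 1, 2, 5]

fpr = fpr_at_95_tpr(id_scores, ood_scores)
print(fpr)
```

    With these toy values the threshold is 2, so 19 of the 20 in-distribution samples are accepted and two of the five OOD samples slip through.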
    BASED-XAI: Breaking Ablation Studies Down for Explainable Artificial Intelligence. (arXiv:2207.05566v1 [cs.LG])
    Explainable artificial intelligence (XAI) methods lack ground truth. In its place, method developers have relied on axioms to determine desirable properties for their explanations' behavior. For high stakes uses of machine learning that require explainability, it is not sufficient to rely on axioms as the implementation, or its usage, can fail to live up to the ideal. As a result, there exists active research on validating the performance of XAI methods. The need for validation is especially magnified in domains with a reliance on XAI. A procedure frequently used to assess their utility, and to some extent their fidelity, is an ablation study. By perturbing the input variables in rank order of importance, the goal is to assess the sensitivity of the model's performance. Perturbing important variables should correlate with larger decreases in measures of model capability than perturbing less important features. While the intent is clear, the actual implementation details have not been studied rigorously for tabular data. Using five datasets, three XAI methods, four baselines, and three perturbations, we aim to show 1) how varying perturbations and adding simple guardrails can help to avoid potentially flawed conclusions, 2) how treatment of categorical variables is an important consideration in both post-hoc explainability and ablation studies, and 3) how to identify useful baselines for XAI methods and viable perturbations for ablation studies.
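    A minimal sketch of the rank-order ablation procedure described above, in plain Python. The linear toy model, the column-mean baseline, and the names `ablation_curve`, `predict`, and `ranking` are illustrative assumptions and are not taken from the paper.

```python
def ablation_curve(predict, rows, targets, ranking, baseline):
    """Mean absolute error after ablating the top-k ranked features, k = 0..len(ranking).

    ranking  -- feature indices, most important first (from any XAI method)
    baseline -- replacement value per feature (here: the column mean, an assumption)
    """
    ablated = [list(r) for r in rows]  # copy so the caller's data is untouched
    curve = []
    for k in range(len(ranking) + 1):
        if k > 0:
            j = ranking[k - 1]
            for r in ablated:
                r[j] = baseline[j]  # perturb the k-th ranked feature
        err = sum(abs(predict(r) - t) for r, t in zip(ablated, targets)) / len(rows)
        curve.append(err)
    return curve


# Toy model: feature 0 matters most, feature 2 least.
weights = [3.0, 1.0, 0.1]

def predict(row):
    return sum(w * x for w, x in zip(weights, row))

rows = [[1.0, 2.0, 3.0], [2.0, 0.0, 1.0], [0.0, 1.0, 2.0], [3.0, 1.0, 0.0]]
targets = [predict(r) for r in rows]
baseline = [sum(col) / len(col) for col in zip(*rows)]

good = ablation_curve(predict, rows, targets, [0, 1, 2], baseline)  # correct ranking
bad = ablation_curve(predict, rows, targets, [2, 1, 0], baseline)   # reversed ranking
```

    With a faithful ranking, the error after the first ablation step (`good[1]`) should exceed the error obtained when the least important feature is ablated first (`bad[1]`). Later steps need not be monotone, which is one reason the choice of baseline and perturbation matters.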
    "That's so cute!": The CARE Dataset for Affective Response Detection. (arXiv:2201.11895v2 [cs.LG] UPDATED)
    Social media plays an increasing role in our communication with friends and family, and our consumption of information and entertainment. Hence, to design effective ranking functions for posts on social media, it would be useful to predict the affective response to a post (e.g., whether the user is likely to be humored, inspired, angered, informed). Similar to work on emotion recognition (which focuses on the affect of the publisher of the post), the traditional approach to recognizing affective response would involve an expensive investment in human annotation of training data. We introduce CARE$_{db}$, a dataset of 230k social media posts annotated according to 7 affective responses using the Common Affective Response Expression (CARE) method. The CARE method is a means of leveraging the signal that is present in comments that are posted in response to a post, providing high-precision evidence about the affective response of the readers to the post without human annotation. Unlike human annotation, the annotation process we describe here can be iterated upon to expand the coverage of the method, particularly for new affective responses. We present experiments that demonstrate that the CARE annotations compare favorably with crowd-sourced annotations. Finally, we use CARE$_{db}$ to train competitive BERT-based models for predicting affective response as well as emotion detection, demonstrating the utility of the dataset for related tasks.
    Using Machine Learning to Reduce Observational Biases When Detecting New Impacts on Mars. (arXiv:2207.05679v1 [cs.LG])
    The current inventory of recent (fresh) impacts on Mars shows a strong bias towards areas of low thermal inertia. These areas are generally visually bright, and impacts create dark scours and rays that make them easier to detect. It is expected that impacts occur at a similar rate in areas of higher thermal inertia, but those impacts are under-detected. This study investigates the use of a trained machine learning classifier to increase the detection of fresh impacts on Mars using CTX data. This approach discovered 69 new fresh impacts that have been confirmed with follow-up HiRISE images. We found that examining candidates partitioned by thermal inertia (TI) values, which is only possible due to the large number of machine learning candidates, helps reduce the observational bias and increase the number of known high-TI impacts.
    Dynamic Budget Throttling in Repeated Second-Price Auctions. (arXiv:2207.04690v2 [cs.GT] UPDATED)
    Throttling is one of the most popular budget control methods in today's online advertising markets. When a budget-constrained advertiser employs throttling, she can choose whether or not to participate in an auction after the advertising platform recommends a bid. This paper focuses on the dynamic budget throttling process in repeated second-price auctions from a theoretical view. An essential feature of the underlying problem is that the advertiser does not know the distribution of the highest competing bid upon entering the market. To model the difficulty of eliminating such uncertainty, we consider two different information structures. The advertiser could obtain the highest competing bid in each round with full-information feedback. Meanwhile, with partial information feedback, the advertiser could only have access to the highest competing bid in the auctions she participates in. We propose the OGD-CB algorithm, which involves simultaneous distribution learning and revenue optimization. In both settings, we demonstrate that this algorithm guarantees an $O(\sqrt{T\log T})$ regret with probability $1 - O(1/T)$ relative to the fluid adaptive throttling benchmark. By proving a lower bound of $\Omega(\sqrt{T})$ on the minimal regret for even the hindsight optimum, we establish the near optimality of our algorithm. Finally, we compare the fluid optimum of throttling to that of pacing, another widely adopted budget control method. The numerical relationship of these benchmarks sheds new light on the understanding of different online algorithms for revenue maximization under budget constraints.
    Shapley Computations Using Surrogate Model-Based Trees. (arXiv:2207.05214v1 [stat.ML])
    Shapley-related techniques have gained attention as both global and local interpretation tools because of their desirable properties. However, their computation using conditional expectations is computationally expensive. Approximation methods suggested in the literature have limitations. This paper proposes the use of a surrogate model-based tree to compute Shapley and SHAP values based on conditional expectation. Simulation studies show that the proposed algorithm improves accuracy and unifies global Shapley and SHAP interpretation, and that the thresholding method provides a way to trade off running time and accuracy.
    Benchmarking of eight recurrent neural network variants for breath phase and adventitious sound detection on a self-developed open-access lung sound database-HF_Lung_V1. (arXiv:2102.03049v3 [cs.SD] UPDATED)
    A reliable, remote, and continuous real-time respiratory sound monitor with automated respiratory sound analysis ability is urgently required in many clinical scenarios, such as in monitoring disease progression of coronavirus disease 2019, to replace conventional auscultation with a handheld stethoscope. However, a robust computerized respiratory sound analysis algorithm has not yet been validated in practical applications. In this study, we developed a lung sound database (HF_Lung_V1) comprising 9,765 audio files of lung sounds (duration of 15 s each), 34,095 inhalation labels, 18,349 exhalation labels, 13,883 continuous adventitious sound (CAS) labels (comprising 8,457 wheeze labels, 686 stridor labels, and 4,740 rhonchi labels), and 15,606 discontinuous adventitious sound labels (all crackles). We conducted benchmark tests for long short-term memory (LSTM), gated recurrent unit (GRU), bidirectional LSTM (BiLSTM), bidirectional GRU (BiGRU), convolutional neural network (CNN)-LSTM, CNN-GRU, CNN-BiLSTM, and CNN-BiGRU models for breath phase detection and adventitious sound detection. We also conducted a performance comparison between the LSTM-based and GRU-based models, between unidirectional and bidirectional models, and between models with and without a CNN. The results revealed that these models exhibited adequate performance in lung sound analysis. The GRU-based models outperformed, in terms of F1 scores and areas under the receiver operating characteristic curves, the LSTM-based models in most of the defined tasks. Furthermore, all bidirectional models outperformed their unidirectional counterparts. Finally, the addition of a CNN improved the accuracy of lung sound analysis, especially in the CAS detection tasks.
    Integrated multimodal artificial intelligence framework for healthcare applications. (arXiv:2202.12998v2 [cs.LG] UPDATED)
    Artificial intelligence (AI) systems hold great promise to improve healthcare over the next decades. Specifically, AI systems leveraging multiple data sources and input modalities are poised to become a viable method to deliver more accurate results and deployable pipelines across a wide range of applications. In this work, we propose and evaluate a unified Holistic AI in Medicine (HAIM) framework to facilitate the generation and testing of AI systems that leverage multimodal inputs. Our approach uses generalizable data pre-processing and machine learning modeling stages that can be readily adapted for research and deployment in healthcare environments. We evaluate our HAIM framework by training and characterizing 14,324 independent models based on MIMIC-IV-MM, a multimodal clinical database (N=34,537 samples) containing 7,279 unique hospitalizations and 6,485 patients, spanning all possible input combinations of 4 data modalities (i.e., tabular, time-series, text and images), 11 unique data sources and 12 predictive tasks. We show that this framework can consistently and robustly produce models that outperform similar single-source approaches across various healthcare demonstrations (by 6-33%), including 10 distinct chest pathology diagnoses, along with length-of-stay and 48-hour mortality predictions. We also quantify the contribution of each modality and data source using Shapley values, which demonstrates the heterogeneity in data type importance and the necessity of multimodal inputs across different healthcare-relevant tasks. The generalizable properties and flexibility of our Holistic AI in Medicine (HAIM) framework could offer a promising pathway for future multimodal predictive systems in clinical and operational healthcare settings.
    The Untold Impact of Learning Approaches on Software Fault-Proneness Predictions. (arXiv:2207.05710v1 [cs.SE])
    Software fault-proneness prediction is an active research area, with many factors affecting prediction performance extensively studied. However, the impact of the learning approach (i.e., the specifics of the data used for training and the target variable being predicted) on the prediction performance has not been studied, except for one initial work. This paper explores the effects of two learning approaches, useAllPredictAll and usePrePredictPost, on the performance of software fault-proneness prediction, both within-release and across-releases. The empirical results are based on data extracted from 64 releases of twelve open-source projects. Results show that the learning approach has a substantial, and typically unacknowledged, impact on the classification performance. Specifically, the useAllPredictAll learning approach leads to significantly better performance than the usePrePredictPost learning approach, both within-release and across-releases. Furthermore, this paper uncovers that, for within-release predictions, this difference in classification performance is due to different levels of class imbalance in the two learning approaches. When class imbalance is addressed, the performance difference between the learning approaches is eliminated. Our findings imply that the learning approach should always be explicitly identified and its impact on software fault-proneness prediction considered. The paper concludes with a discussion of potential consequences of our results for both research and practice.
    Exploring the Role of Task Transferability in Large-Scale Multi-Task Learning. (arXiv:2204.11117v2 [cs.CL] UPDATED)
    Recent work has found that multi-task training with a large number of diverse tasks can uniformly improve downstream performance on unseen target tasks. In contrast, literature on task transferability has established that the choice of intermediate tasks can heavily affect downstream task performance. In this work, we aim to disentangle the effect of scale and relatedness of tasks in multi-task representation learning. We find that, on average, increasing the scale of multi-task learning, in terms of the number of tasks, indeed results in better learned representations than smaller multi-task setups. However, if the target tasks are known ahead of time, then training on a smaller set of related tasks is competitive with large-scale multi-task training at a reduced computational cost.
    A machine-learning-based tool for last closed magnetic flux surface reconstruction on tokamak. (arXiv:2207.05695v1 [physics.plasm-ph])
    Nuclear fusion power generated by tokamak devices is one of the most promising sustainable sources of clean energy. One main challenge in tokamak research is predicting the last closed magnetic flux surface (LCFS), which is determined by the interaction of the actuator coils and the internal tokamak plasma. This task requires high-dimensional, high-frequency, high-fidelity, real-time tools, and is further complicated by the wide range of actuator coil inputs that interact with the internal tokamak plasma states. In this work, we present a new machine learning model for reconstructing the LCFS from the Experimental Advanced Superconducting Tokamak (EAST) that learns automatically from the experimental data of EAST. This architecture can check the control strategy design and integrate it with the tokamak control system for real-time magnetic prediction. In the real-time modeling test, our approach achieves over 99% average similarity in LCFS reconstruction of the entire discharge process. In the offline magnetic reconstruction, our approach reaches over 93% average similarity.
    Capturing Evolution Genes for Time Series Data. (arXiv:1905.05004v2 [cs.LG] UPDATED)
    The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of time series. In particular, we propose a uniform framework that recognizes different evolution genes of segments by learning a classifier, and adopt an adversarial generator to implement the evolution gene by estimating the segments' distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach not only achieves good prediction results (e.g., an average improvement of +10.56% in terms of F1), but is also able to provide explanations of the results.
    Equivariance versus Augmentation for Spherical Images. (arXiv:2202.03990v2 [cs.LG] UPDATED)
    We analyze the role of rotational equivariance in convolutional neural networks (CNNs) applied to spherical images. We compare the performance of the group equivariant networks known as S2CNNs and standard non-equivariant CNNs trained with an increasing amount of data augmentation. The chosen architectures can be considered baseline references for the respective design paradigms. Our models are trained and evaluated on single or multiple items from the MNIST or FashionMNIST dataset projected onto the sphere. For the task of image classification, which is inherently rotationally invariant, we find that by considerably increasing the amount of data augmentation and the size of the networks, it is possible for the standard CNNs to reach at least the same performance as the equivariant network. In contrast, for the inherently equivariant task of semantic segmentation, the non-equivariant networks are consistently outperformed by the equivariant networks with significantly fewer parameters. We also analyze and compare the inference latency and training times of the different networks, enabling detailed tradeoff considerations between equivariant architectures and data augmentation for practical problems. The equivariant spherical networks used in the experiments are available at https://github.com/JanEGerken/sem_seg_s2cnn .
    Federated Unlearning: How to Efficiently Erase a Client in FL?. (arXiv:2207.05521v1 [cs.LG])
    With privacy legislation empowering users with the right to be forgotten, it has become essential to make a model forget about some of its training data. We explore the problem of removing any client's contribution in federated learning (FL). During FL rounds, each client performs local training to learn a model that minimizes the empirical loss on their private data. We propose to perform unlearning at the client (to be erased) by reversing the learning process, i.e., training a model to maximize the local empirical loss. In particular, we formulate the unlearning problem as a constrained maximization problem by restricting to an $\ell_2$-norm ball around a suitably chosen reference model to help retain some knowledge learnt from the other clients' data. This allows the client to use projected gradient descent to perform unlearning. The method requires neither global access to the data used for training nor storage of the history of the parameter updates by the aggregator (server) or any of the clients. Experiments on the MNIST dataset show that the proposed unlearning method is efficient and effective.
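    The constrained maximization described above can be sketched as projected gradient ascent on the local loss, which is the same as projected gradient descent on its negative. The quadratic toy loss, the choice of reference model, the radius, and the names `project_l2_ball` and `unlearn` are assumptions for illustration only; the abstract does not specify them.

```python
import math


def project_l2_ball(w, center, radius):
    """Project w onto the l2 ball of the given radius around center."""
    diff = [wi - ci for wi, ci in zip(w, center)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm <= radius:
        return list(w)
    scale = radius / norm
    return [ci + scale * d for ci, d in zip(center, diff)]


def unlearn(w_ref, grad_loss, radius, lr=0.1, steps=50):
    """Maximize the local loss while staying within `radius` of the reference model."""
    w = list(w_ref)
    for _ in range(steps):
        g = grad_loss(w)
        w = [wi + lr * gi for wi, gi in zip(w, g)]  # ascent step: increase the loss
        w = project_l2_ball(w, w_ref, radius)       # stay close to the reference model
    return w


# Toy local loss: 0.5 * ||w - a||^2, where `a` stands in for the client's local optimum.
a = [0.0, 0.0]

def loss(w):
    return 0.5 * sum((wi - ai) ** 2 for wi, ai in zip(w, a))

def grad_loss(w):
    return [wi - ai for wi, ai in zip(w, a)]

w_ref = [1.0, 0.0]   # hypothetical reference model
radius = 0.5
w_unlearned = unlearn(w_ref, grad_loss, radius)
```

    In this toy setting the iterate moves away from the client's optimum until it reaches the boundary of the ball, so the local loss rises while the model stays within a bounded distance of the reference.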
    PeopleSansPeople: A Synthetic Data Generator for Human-Centric Computer Vision. (arXiv:2112.09290v2 [cs.CV] UPDATED)
    In recent years, person detection and human pose estimation have made great strides, helped by large-scale labeled datasets. However, these datasets had no guarantees or analysis of human activities, poses, or context diversity. Additionally, privacy, legal, safety, and ethical concerns may limit the ability to collect more human data. An emerging alternative to real-world data that alleviates some of these issues is synthetic data. However, creation of synthetic data generators is incredibly challenging and prevents researchers from exploring their usefulness. Therefore, we release a human-centric synthetic data generator PeopleSansPeople which contains simulation-ready 3D human assets, a parameterized lighting and camera system, and generates 2D and 3D bounding box, instance and semantic segmentation, and COCO pose labels. Using PeopleSansPeople, we performed benchmark synthetic data training using a Detectron2 Keypoint R-CNN variant [1]. We found that pre-training a network using synthetic data and fine-tuning on various sizes of real-world data resulted in a keypoint AP increase of $+38.03$ ($44.43 \pm 0.17$ vs. $6.40$) for few-shot transfer (limited subsets of COCO-person train [2]), and an increase of $+1.47$ ($63.47 \pm 0.19$ vs. $62.00$) for abundant real data regimes, outperforming models trained with the same real data alone. We also found that our models outperformed those pre-trained with ImageNet with a keypoint AP increase of $+22.53$ ($44.43 \pm 0.17$ vs. $21.90$) for few-shot transfer and $+1.07$ ($63.47 \pm 0.19$ vs. $62.40$) for abundant real data regimes. This freely-available data generator should enable a wide range of research into the emerging field of simulation to real transfer learning in the critical area of human-centric computer vision.
    Pseudo value-based Deep Neural Networks for Multi-state Survival Analysis. (arXiv:2207.05291v1 [cs.LG])
    Multi-state survival analysis (MSA) uses multi-state models for the analysis of time-to-event data. In medical applications, MSA can provide insights about the complex disease progression in patients. A key challenge in MSA is the accurate subject-specific prediction of multi-state model quantities such as transition probability and state occupation probability in the presence of censoring. Traditional multi-state methods such as Aalen-Johansen (AJ) estimators and Cox-based methods are respectively limited by Markov and proportional hazards assumptions and are infeasible for making subject-specific predictions. Neural ordinary differential equations for MSA relax these assumptions but are computationally expensive and do not directly model the transition probabilities. To address these limitations, we propose a new class of pseudo-value-based deep learning models for multi-state survival analysis, where we show that pseudo values - designed to handle censoring - can be a natural replacement for estimating the multi-state model quantities when derived from a consistent estimator. In particular, we provide an algorithm to derive pseudo values from consistent estimators to directly predict the multi-state survival quantities from the subject's covariates. Empirical results on synthetic and real-world datasets show that our proposed models achieve state-of-the-art results under various censoring settings.
    PoeticTTS -- Controllable Poetry Reading for Literary Studies. (arXiv:2207.05549v1 [eess.AS])
    Speech synthesis for poetry is challenging due to specific intonation patterns inherent to poetic speech. In this work, we propose an approach to synthesise poems with almost human-like naturalness in order to enable literary scholars to systematically examine hypotheses on the interplay between text, spoken realisation, and the listener's perception of poems. To meet these special requirements for literary studies, we resynthesise poems by cloning prosodic values from a human reference recitation, and afterwards make use of fine-grained prosody control to manipulate the synthetic speech in a human-in-the-loop setting to alter the recitation w.r.t. specific phenomena. We find that finetuning our TTS model on poetry captures poetic intonation patterns to a large extent, which is beneficial for prosody cloning and manipulation, and we verify the success of our approach both in an objective evaluation and in human studies.
    Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL. (arXiv:2207.05683v1 [cs.MA])
    Cooperative multi-agent reinforcement learning (MARL) is making rapid progress for solving tasks in a grid world and real-world scenarios, in which agents are given different attributes and goals, resulting in different behavior through the whole multi-agent task. In this study, we quantify the agent's behavior difference and build its relationship with the policy performance via Role Diversity, a metric to measure the characteristics of MARL tasks. We define role diversity from three perspectives: action-based, trajectory-based, and contribution-based to fully measure a multi-agent task. Through theoretical analysis, we find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity. The decomposed factors can significantly impact policy optimization on three popular directions including parameter sharing, communication mechanism, and credit assignment. The main experimental platforms are based on Multiagent Particle Environment (MPE) and The StarCraft Multi-Agent Challenge (SMAC). Extensive experiments clearly show that role diversity can serve as a robust measurement for the characteristics of a multi-agent cooperation task and help diagnose whether the policy fits the current multi-agent system for a better policy performance.
    Distributed Online System Identification for LTI Systems Using Reverse Experience Replay. (arXiv:2207.01062v1 [cs.LG] CROSS LISTED)
    Identification of linear time-invariant (LTI) systems plays an important role in control and reinforcement learning. Both asymptotic and finite-time offline system identification are well-studied in the literature. For online system identification, the idea of stochastic-gradient descent with reverse experience replay (SGD-RER) was recently proposed, where the data sequence is stored in several buffers and the stochastic-gradient descent (SGD) update performs backward in each buffer to break the time dependency between data points. Inspired by this work, we study distributed online system identification of LTI systems over a multi-agent network. We consider agents as identical LTI systems, and the network goal is to jointly estimate the system parameters by leveraging the communication between agents. We propose DSGD-RER, a distributed variant of the SGD-RER algorithm, and theoretically characterize the improvement of the estimation error with respect to the network size. Our numerical experiments certify the reduction of estimation error as the network size grows.
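    A simplified single-agent sketch of the reverse experience replay idea described above, for a scalar system x[t+1] = a * x[t] + noise. It omits the distributed (multi-agent) part and any gaps between buffers; the buffer size, learning rate, noise level, and the names `reverse_replay_order` and `sgd_rer_scalar` are illustrative assumptions.

```python
import random


def reverse_replay_order(n, buffer_size):
    """Indices 0..n-1 split into consecutive buffers, each visited last-to-first."""
    order = []
    for start in range(0, n, buffer_size):
        end = min(start + buffer_size, n)
        order.extend(range(end - 1, start - 1, -1))
    return order


def sgd_rer_scalar(xs, buffer_size, lr):
    """Estimate a in x[t+1] = a * x[t] + noise with SGD run backward in each buffer."""
    a_hat = 0.0
    for t in reverse_replay_order(len(xs) - 1, buffer_size):
        residual = xs[t + 1] - a_hat * xs[t]
        a_hat += lr * residual * xs[t]  # gradient step on the squared one-step error
    return a_hat


# Simulate a stable scalar LTI system driven by Gaussian noise (fixed seed).
rng = random.Random(0)
a_true = 0.8
xs = [0.0]
for _ in range(5000):
    xs.append(a_true * xs[-1] + rng.gauss(0.0, 1.0))

a_hat = sgd_rer_scalar(xs, buffer_size=50, lr=0.002)
```

    For six samples and a buffer size of three, the visiting order is 2, 1, 0, 5, 4, 3: buffers arrive in time order, but updates inside each buffer run backward.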
    Accelerated Deep Lossless Image Coding with Unified Paralleleized GPU Coding Architecture. (arXiv:2207.05152v1 [eess.IV])
    We propose Deep Lossless Image Coding (DLIC), a full resolution learned lossless image compression algorithm. Our algorithm is based on a neural network combined with an entropy encoder. The neural network performs a density estimation on each pixel of the source image. The density estimation is then used to code the target pixel, beating FLIF in terms of compression rate. Similar approaches have been attempted. However, long run times make them infeasible for real-world applications. We introduce a parallelized GPU-based implementation, allowing for encoding and decoding of grayscale, 8-bit images in less than one second. Because DLIC uses a neural network to estimate the probabilities used for the entropy coder, DLIC can be trained on domain-specific image data. We demonstrate this capability by adapting and training DLIC with Magnetic Resonance Imaging (MRI) images.
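    To illustrate why a better per-pixel density estimate yields a smaller file: an ideal entropy coder spends about -log2 p(x) bits on a symbol x that the model assigns probability p(x). The sketch below compares two stand-in predictors on a tiny 8-bit pixel sequence. Both predictors (`uniform_model`, `repeat_last_model`) are hypothetical placeholders for the neural network and are not part of DLIC.

```python
import math


def ideal_code_length_bits(pixels, predict_distribution):
    """Sum of -log2 p(pixel | previous pixels) under the given predictor."""
    total = 0.0
    for i, value in enumerate(pixels):
        probs = predict_distribution(pixels[:i])  # only already-decoded pixels are visible
        total += -math.log2(probs[value])
    return total


def uniform_model(context):
    """No knowledge: every 8-bit value is equally likely."""
    return [1.0 / 256] * 256


def repeat_last_model(context):
    """Hypothetical predictor: half the mass on the previous pixel's value."""
    if not context:
        return [1.0 / 256] * 256
    probs = [0.5 / 255] * 256
    probs[context[-1]] = 0.5
    return probs


pixels = [10, 10, 10, 10, 11, 11, 11, 11]
uniform_bits = ideal_code_length_bits(pixels, uniform_model)
adaptive_bits = ideal_code_length_bits(pixels, repeat_last_model)
```

    The uniform model costs exactly 8 bits per pixel, while the predictor that exploits the repetition needs far fewer bits in total. The decoder can reproduce the same probabilities because each prediction depends only on pixels it has already decoded.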
    On robust risk-based active-learning algorithms for enhanced decision support. (arXiv:2201.02555v2 [cs.LG] UPDATED)
    Classification models are a fundamental component of physical-asset management technologies such as structural health monitoring (SHM) systems and digital twins. Previous work introduced risk-based active learning, an online approach for the development of statistical classifiers that takes into account the decision-support context in which they are applied. Decision-making is considered by preferentially querying data labels according to expected value of perfect information (EVPI). Although several benefits are gained by adopting a risk-based active learning approach, including improved decision-making performance, the algorithms suffer from issues relating to sampling bias as a result of the guided querying process. This sampling bias ultimately manifests as a decline in decision-making performance during the later stages of active learning, which in turn corresponds to lost resource/utility. The current paper proposes two novel approaches to counteract the effects of sampling bias: semi-supervised learning, and discriminative classification models. These approaches are first visualised using a synthetic dataset, then subsequently applied to an experimental case study, specifically, the Z24 Bridge dataset. The semi-supervised learning approach is shown to have variable performance, with robustness to sampling bias dependent on the suitability of the generative distributions selected for the model with respect to each dataset. In contrast, the discriminative classifiers are shown to have excellent robustness to the effects of sampling bias. Moreover, it was found that the number of inspections made during a monitoring campaign, and therefore resource expenditure, could be reduced with the careful selection of the statistical classifiers used within a decision-supporting monitoring system.
    IMG-NILM: A Deep learning NILM approach using energy heatmaps. (arXiv:2207.05463v1 [cs.LG])
    Energy disaggregation estimates appliance-by-appliance electricity consumption from a single meter that measures the whole home's electricity demand. Compared with intrusive load monitoring, NILM (Non-intrusive load monitoring) is low cost, easy to deploy, and flexible. In this paper, we propose a new method, coined IMG-NILM, that utilises convolutional neural networks (CNN) to disaggregate electricity data represented as images. CNNs have proven efficient with images; hence, instead of the traditional representation of electricity data as time series, data is transformed into heatmaps with higher electricity readings portrayed as 'hotter' colours. The image representation is then used in a CNN to detect the signature of an appliance from aggregated data. IMG-NILM is flexible and shows consistent performance in disaggregating various types of appliances, including single-state and multi-state appliances. It attains a test accuracy of up to 93% on the UK-DALE dataset within a single house, where a substantial number of appliances are present. In more challenging settings where electricity data is collected from different houses, IMG-NILM also attains a very good average accuracy of 85%.
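    One plausible way to turn a one-dimensional series of meter readings into a heatmap-like grid is to fold it into rows of fixed width and min-max normalise the values, so that higher readings map to 'hotter' intensities. This is only a sketch; the abstract does not give the exact transformation, and the function name `to_heatmap` and the `width` parameter are assumptions.

```python
def to_heatmap(readings, width):
    """Fold a 1-D series into rows of `width` values scaled to [0, 1].

    Trailing readings that do not fill a complete row are dropped.
    """
    rows = len(readings) // width
    if rows == 0:
        raise ValueError("need at least `width` readings")
    used = readings[: rows * width]
    lo, hi = min(used), max(used)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant series
    return [
        [(used[r * width + c] - lo) / span for c in range(width)]
        for r in range(rows)
    ]


# Seven readings, rows of three: the last reading is dropped.
heatmap = to_heatmap([0, 5, 10, 10, 5, 0, 7], width=3)
```

    Each row could correspond to, say, one day or one hour of readings; the resulting grid can then be passed to any image model as a single-channel input.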
    Fast Yet Effective Machine Unlearning. (arXiv:2111.08947v4 [cs.LG] UPDATED)
    Unlearning the data observed during the training of a machine learning (ML) model is an important task that can play a pivotal role in fortifying the privacy and security of ML-based applications. This paper raises the following questions: (i) can we unlearn a single or multiple classes of data from an ML model without looking at the full training data even once? (ii) can we make the process of unlearning fast and scalable to large datasets, and generalize it to different deep networks? We introduce a novel machine unlearning framework with error-maximizing noise generation and impair-repair based weight manipulation that offers an efficient solution to the above questions. An error-maximizing noise matrix is learned for the class to be unlearned using the original model. The noise matrix is used to manipulate the model weights to unlearn the targeted class of data. We introduce impair and repair steps for a controlled manipulation of the network weights. In the impair step, the noise matrix along with a very high learning rate is used to induce sharp unlearning in the model. Thereafter, the repair step is used to regain the overall performance. With very few update steps, we show excellent unlearning while substantially retaining the overall model accuracy. Unlearning multiple classes requires a similar number of update steps as for the single class, making our approach scalable to large problems. Our method is quite efficient in comparison to the existing methods, works for multi-class unlearning, doesn't put any constraints on the original optimization mechanism or network design, and works well in both small and large-scale vision tasks. This work is an important step towards fast and easy implementation of unlearning in deep networks. We will make the source code publicly available.
    How Robust is your Fair Model? Exploring the Robustness of Diverse Fairness Strategies. (arXiv:2207.04581v2 [cs.LG] UPDATED)
    With the introduction of machine learning in high-stakes decision making, ensuring algorithmic fairness has become an increasingly important problem to solve. In response to this, many mathematical definitions of fairness have been proposed, and a variety of optimisation techniques have been developed, all designed to maximise a defined notion of fairness. However, fair solutions are reliant on the quality of the training data, and can be highly sensitive to noise. Recent studies have shown that robustness (the ability for a model to perform well on unseen data) plays a significant role in the type of strategy that should be used when approaching a new problem and, hence, measuring the robustness of these strategies has become a fundamental problem. In this work, we therefore propose a new criterion to measure the robustness of various fairness optimisation strategies - the robustness ratio. We conduct multiple extensive experiments on five benchmark fairness data sets using three of the most popular fairness strategies with respect to four of the most popular definitions of fairness. Our experiments empirically show that fairness methods that rely on threshold optimisation are very sensitive to noise in all the evaluated data sets, despite mostly outperforming other methods. This is in contrast to the other two methods, which are less fair for low noise scenarios but fairer for high noise ones. To the best of our knowledge, we are the first to quantitatively evaluate the robustness of fairness optimisation strategies. This can potentially serve as a guideline in choosing the most suitable fairness strategy for various data sets.
    Building Korean Sign Language Augmentation (KoSLA) Corpus with Data Augmentation Technique. (arXiv:2207.05261v1 [cs.CL])
    We present an efficient framework of corpus for sign language translation. Aided with a simple but dramatic data augmentation technique, our method converts text into annotated forms with minimum information loss. Sign languages are composed of manual signals, non-manual signals, and iconic features. According to professional sign language interpreters, non-manual signals such as facial expressions and gestures play an important role in conveying exact meaning. By considering the linguistic features of sign language, our proposed framework is a first and unique attempt to build a multimodal sign language augmentation corpus (hereinafter referred to as the KoSLA corpus) containing both manual and non-manual modalities. The corpus we built demonstrates confident results in the hospital context, showing improved performance with augmented datasets. To overcome data scarcity, we resorted to data augmentation techniques such as synonym replacement to boost the efficiency of our translation model and available data, while maintaining grammatical and semantic structures of sign language. For the experimental support, we verify the effectiveness of data augmentation technique and usefulness of our corpus by performing a translation task between normal sentences and sign language annotations on two tokenizers. The result was convincing, proving that the BLEU scores with the KoSLA corpus were significant.
    Online Meta-Learning in Adversarial Multi-Armed Bandits. (arXiv:2205.15921v2 [cs.LG] UPDATED)
    We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online setup, in which a player (learner) encounters a sequence of multi-armed bandit episodes. The player's performance is measured as regret against the best arm in each episode, according to the losses generated by an adversary. The difficulty of the problem depends on the empirical distribution of the per-episode best arm chosen by the adversary. We present an algorithm that can leverage the non-uniformity in this empirical distribution, and derive problem-dependent regret bounds. This solution comprises an inner learner that plays each episode separately, and an outer learner that updates the hyper-parameters of the inner algorithm between the episodes. In the case where the best arm distribution is far from uniform, it improves upon the best bound that can be achieved by any online algorithm executed on each episode individually without meta-learning.
    Root-aligned SMILES: A Tight Representation for Chemical Reaction Prediction. (arXiv:2203.11444v4 [cs.LG] UPDATED)
    Chemical reaction prediction, involving forward synthesis and retrosynthesis prediction, is a fundamental problem in organic synthesis. A popular computational paradigm formulates synthesis prediction as a sequence-to-sequence translation problem, where the typical SMILES is adopted for molecule representations. However, the general-purpose SMILES neglects the characteristics of chemical reactions, where the molecular graph topology is largely unaltered from reactants to products, resulting in the suboptimal performance of SMILES if straightforwardly applied. In this article, we propose the root-aligned SMILES (R-SMILES), which specifies a tightly aligned one-to-one mapping between the product and the reactant SMILES for more efficient synthesis prediction. Due to the strict one-to-one mapping and reduced edit distance, the computational model is largely relieved from learning the complex syntax and dedicated to learning the chemical knowledge for reactions. We compare the proposed R-SMILES with various state-of-the-art baselines and show that it significantly outperforms them all, demonstrating the superiority of the proposed method.
    Efficient NLP Inference at the Edge via Elastic Pipelining. (arXiv:2207.05022v2 [cs.LG] UPDATED)
    Natural Language Processing (NLP) inference is seeing increasing adoption by mobile applications, where on-device inference is desirable for crucially preserving user data privacy and avoiding network roundtrips. Yet, the unprecedented size of an NLP model stresses both latency and memory, the two key resources of a mobile device. To meet a target latency, holding the whole model in memory launches execution as soon as possible but increases one app's memory footprint by several times, limiting its benefits to only a few inferences before being recycled by mobile memory management. On the other hand, loading the model from storage on demand incurs IO lasting a few seconds, far exceeding the delay range satisfactory to a user; pipelining layerwise model loading and execution does not hide IO either, due to the large skewness between IO and computation delays. To this end, we propose WRX. Built on the key idea of maximizing IO/compute resource utilization on the most important parts of a model, WRX reconciles the latency/memory tension via two novel techniques. First, model sharding. WRX manages model parameters as independently tunable shards and profiles their importance to accuracy. Second, elastic pipeline planning with a preload buffer. WRX instantiates an IO/computation pipeline and uses a small buffer for preload shards to bootstrap execution without stalling in early stages; it judiciously selects, tunes, and assembles shards per their importance for resource-elastic execution, which maximizes inference accuracy. Atop two commodity SoCs, we build WRX and evaluate it against a wide range of NLP tasks, under a practical range of target latencies, and on both CPU and GPU. We demonstrate that WRX delivers high accuracies with 1--2 orders of magnitude lower memory, outperforming competitive baselines.
    Recent Developments in AI and USPTO Open Data. (arXiv:2207.05239v1 [cs.LG])
    The USPTO disseminates one of the largest publicly accessible repositories of scientific, technical, and commercial data worldwide. USPTO data has historically seen frequent use in fields such as patent analytics, economics, and prosecution & litigation tools. This article highlights an emerging class of use cases directed to the research, development, and application of artificial intelligence technology. Such use cases contemplate both the delivery of artificial intelligence capabilities for practical IP applications and the enablement of future state-of-the-art artificial intelligence research via USPTO data products. Examples from both within and beyond the USPTO are offered as case studies.
    Bi-fidelity Evolutionary Multiobjective Search for Adversarially Robust Deep Neural Architectures. (arXiv:2207.05321v1 [cs.LG])
    Deep neural networks have been found vulnerable to adversarial attacks, thus raising potential concerns in security-sensitive contexts. To address this problem, recent research has investigated the adversarial robustness of deep neural networks from the architectural point of view. However, searching for architectures of deep neural networks is computationally expensive, particularly when coupled with an adversarial training process. To meet the above challenge, this paper proposes a bi-fidelity multiobjective neural architecture search approach. First, we formulate the NAS problem for enhancing adversarial robustness of deep neural networks into a multiobjective optimization problem. Specifically, in addition to a low-fidelity performance predictor as the first objective, we leverage an auxiliary objective -- the value of which is the output of a surrogate model trained with high-fidelity evaluations. Secondly, we reduce the computational cost by combining three performance estimation methods, i.e., parameter sharing, low-fidelity evaluation, and surrogate-based predictor. The effectiveness of the proposed approach is confirmed by extensive experiments conducted on CIFAR-10, CIFAR-100 and SVHN datasets.
    A Dataset Perspective on Offline Reinforcement Learning. (arXiv:2111.04714v2 [cs.LG] UPDATED)
    The application of Reinforcement Learning (RL) in real world environments can be expensive or risky due to sub-optimal policies during training. In Offline RL, this problem is avoided since interactions with an environment are prohibited. Policies are learned from a given dataset, which solely determines their performance. Despite this fact, how dataset characteristics influence Offline RL algorithms is still hardly investigated. The dataset characteristics are determined by the behavioral policy that samples this dataset. Therefore, we define characteristics of behavioral policies as exploratory for yielding high expected information in their interaction with the Markov Decision Process (MDP) and as exploitative for having high expected return. We implement two corresponding empirical measures for the datasets sampled by the behavioral policy in deterministic MDPs. The first empirical measure SACo is defined by the normalized unique state-action pairs and captures exploration. The second empirical measure TQ is defined by the normalized average trajectory return and captures exploitation. Empirical evaluations show the effectiveness of TQ and SACo. In large-scale experiments using our proposed measures, we show that the unconstrained off-policy Deep Q-Network family requires datasets with high SACo to find a good policy. Furthermore, experiments show that policy constraint algorithms perform well on datasets with high TQ and SACo. Finally, the experiments show that purely dataset-constrained Behavioral Cloning performs competitively with the best Offline RL algorithms for datasets with high TQ.
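    The two dataset measures described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract only says that both quantities are "normalized", so the normalizers used here (a reference count of unique pairs for SACo, and a minimum and maximum reference return for TQ) are assumptions, as are all function and parameter names.

```python
from typing import Hashable, List, Sequence, Tuple

# One transition is (state, action, reward); a trajectory is a list of them.
Transition = Tuple[Hashable, Hashable, float]
Trajectory = Sequence[Transition]


def unique_state_action_pairs(trajectories: Sequence[Trajectory]) -> int:
    """Count distinct (state, action) pairs in the dataset."""
    return len({(s, a) for traj in trajectories for (s, a, _) in traj})


def average_return(trajectories: Sequence[Trajectory]) -> float:
    """Mean undiscounted return over all trajectories."""
    returns: List[float] = [sum(r for _, _, r in traj) for traj in trajectories]
    return sum(returns) / len(returns)


def saco(trajectories: Sequence[Trajectory], reference_unique_pairs: int) -> float:
    """Exploration measure: unique pairs divided by a reference count.

    The reference count is an assumed normalizer (hypothetical parameter).
    """
    return unique_state_action_pairs(trajectories) / reference_unique_pairs


def tq(trajectories: Sequence[Trajectory], min_return: float, max_return: float) -> float:
    """Exploitation measure: average return rescaled to an assumed [min, max] range."""
    return (average_return(trajectories) - min_return) / (max_return - min_return)


dataset = [
    [(0, "a", 1.0), (1, "b", 0.0)],
    [(0, "a", 1.0), (2, "a", 2.0)],
]
print(saco(dataset, reference_unique_pairs=6))    # 3 unique pairs / 6
print(tq(dataset, min_return=0.0, max_return=4.0))  # average return 2.0 rescaled
```

    Counting unique pairs with a set only makes sense for discrete or hashable states and actions, which fits the deterministic-MDP setting the abstract mentions.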
    Unsupervised learning of observation functions in state-space models by nonparametric moment methods. (arXiv:2207.05242v1 [stat.ML])
    We investigate the unsupervised learning of non-invertible observation functions in nonlinear state-space models. Assuming abundant data of the observation process along with the distribution of the state process, we introduce a nonparametric generalized moment method to estimate the observation function via constrained regression. The major challenge comes from the non-invertibility of the observation function and the lack of data pairs between the state and observation. We address the fundamental issue of identifiability from quadratic loss functionals and show that the function space of identifiability is the closure of a RKHS that is intrinsic to the state process. Numerical results show that the first two moments and temporal correlations, along with upper and lower bounds, can identify functions ranging from piecewise polynomials to smooth functions, leading to convergent estimators. The limitations of this method, such as non-identifiability due to symmetry and stationarity, are also discussed.
    IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training. (arXiv:2207.05333v1 [cs.CV])
    Vision-Language Pre-training (VLP) with large-scale image-text pairs has demonstrated superior performance in various fields. However, the image-text pairs co-occurrent on the Internet typically lack explicit alignment information, which is suboptimal for VLP. Existing methods proposed to adopt an off-the-shelf object detector to utilize additional image tag information. However, the object detector is time-consuming and can only identify the pre-defined object categories, limiting the model capacity. Inspired by the observation that the texts incorporate incomplete fine-grained image information, we introduce IDEA, which stands for increasing text diversity via online multi-label recognition for VLP. IDEA shows that multi-label learning with image tags extracted from the texts can be jointly optimized during VLP. Moreover, IDEA can identify valuable image tags online to provide more explicit textual supervision. Comprehensive experiments demonstrate that IDEA can significantly boost the performance on multiple downstream datasets with a small extra computational cost.
    The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress. (arXiv:2207.05691v1 [cs.LG])
    The Multimodal Sentiment Analysis Challenge (MuSe) 2022 is dedicated to multimodal sentiment and emotion recognition. For this year's challenge, we feature three datasets: (i) the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset that contains audio-visual recordings of German football coaches, labelled for the presence of humour; (ii) the Hume-Reaction dataset in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities; and (iii) the Ulm-Trier Social Stress Test (Ulm-TSST) dataset comprising audio-visual data labelled with continuous emotion values (arousal and valence) of people in stressful dispositions. Using the introduced datasets, MuSe 2022 addresses three contemporary affective computing problems: in the Humor Detection Sub-Challenge (MuSe-Humor), spontaneous humour has to be recognised; in the Emotional Reactions Sub-Challenge (MuSe-Reaction), seven fine-grained `in-the-wild' emotions have to be predicted; and in the Emotional Stress Sub-Challenge (MuSe-Stress), a continuous prediction of stressed emotion values is featured. The challenge is designed to attract different research communities, encouraging a fusion of their disciplines. Mainly, MuSe 2022 targets the communities of audio-visual emotion recognition, health informatics, and symbolic sentiment analysis. This baseline paper describes the datasets as well as the feature sets extracted from them. A recurrent neural network with LSTM cells is used to set competitive baseline results on the test partitions for each sub-challenge. We report an Area Under the Curve (AUC) of .8480 for MuSe-Humor; .2801 mean (over 7 classes) Pearson's Correlation Coefficient for MuSe-Reaction, as well as .4931 Concordance Correlation Coefficient (CCC) and .4761 for valence and arousal in MuSe-Stress, respectively.
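    For readers unfamiliar with the Concordance Correlation Coefficient reported above, the sketch below computes it from its standard definition, CCC = 2 cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2), using population (divide-by-n) moments. It is an illustration only and not the challenge's evaluation code; the function name is hypothetical.

```python
from typing import Sequence


def concordance_correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Concordance Correlation Coefficient with population moments."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes a constant offset between
    # predictions and targets, which Pearson correlation ignores.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


truth = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]  # perfectly correlated, but offset by 1
print(concordance_correlation(truth, truth))
print(concordance_correlation(truth, shifted))
```

    The shifted series has a Pearson correlation of 1 with the original but a lower CCC, which is why CCC is a common choice for continuous valence and arousal prediction.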
    Conservative SPDEs as fluctuating mean field limits of stochastic gradient descent. (arXiv:2207.05705v1 [math.PR])
    The convergence of stochastic interacting particle systems in the mean-field limit to solutions to conservative stochastic partial differential equations is shown, with optimal rate of convergence. As a second main result, a quantitative central limit theorem for such SPDEs is derived, again with optimal rate of convergence. The results apply in particular to the convergence in the mean-field scaling of stochastic gradient descent dynamics in overparametrized, shallow neural networks to solutions to SPDEs. It is shown that the inclusion of fluctuations in the limiting SPDE improves the rate of convergence, and retains information about the fluctuations of stochastic gradient descent in the continuum limit.
    Denoising single images by feature ensemble revisited. (arXiv:2207.05176v1 [cs.CV])
    Image denoising is still a challenging issue in many computer vision sub-domains. Recent studies show that significant improvements are made possible in a supervised setting. However, a few challenges, such as spatial fidelity and cartoon-like smoothing, remain unresolved or decisively overlooked. Our study proposes a simple yet efficient architecture for the denoising problem that addresses the aforementioned issues. The proposed architecture revisits the concept of modular concatenation instead of long and deeper cascaded connections, to recover a cleaner approximation of the given image. We find that different modules can capture versatile representations, and concatenated representation creates a richer subspace for low-level image restoration. The proposed architecture's number of parameters remains smaller than the number for most of the previous networks and still achieves significant improvements over the current state-of-the-art networks.
    A Data-Based Perspective on Transfer Learning. (arXiv:2207.05739v1 [cs.LG])
    It is commonly believed that in transfer learning, including more pre-training data translates into better performance. However, recent evidence suggests that removing data from the source dataset can actually help too. In this work, we take a closer look at the role of the source dataset's composition in transfer learning and present a framework for probing its impact on downstream performance. Our framework gives rise to new capabilities such as pinpointing transfer learning brittleness as well as detecting pathologies such as data-leakage and the presence of misleading examples in the source dataset. In particular, we demonstrate that removing detrimental datapoints identified by our framework improves transfer learning performance from ImageNet on a variety of target tasks. Code is available at https://github.com/MadryLab/data-transfer
    Histopathological Imaging Classification of Breast Tissue for Cancer Diagnosis Support Using Deep Learning Models. (arXiv:2207.05057v1 [eess.IV])
    Among medical imaging techniques, Hematoxylin and Eosin stained breast histopathology images are considered the gold standard for cancer diagnosis. Based on the idea of dividing the pathologic image (WSI) into multiple patches, we used a [512,512] window sliding from left to right and from top to bottom, with each sliding step overlapping by 50%, to augment the data on a dataset of 400 images gathered from the ICIAR 2018 Grand Challenge. We then use the EfficientNet model to classify the histopathological images of breast cancer into 4 types: Normal, Benign, Carcinoma, Invasive Carcinoma. The EfficientNet model is a recently developed model that uniformly scales the width, depth, and resolution of the network with a set of fixed scaling factors that are well suited for training images with high resolution. The model gives a rather competitive classification efficiency, achieving 98% accuracy on the training set and 93% on the evaluation set.
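    The patch extraction described above (a 512 x 512 window, 50% overlap, left to right and then top to bottom) can be sketched as below. This is a minimal illustration under stated assumptions: the function name is hypothetical, the 1536 x 2048 image size in the usage lines is only an example, and patches that would run past the image border are simply dropped, which the abstract does not specify.

```python
from typing import List, Tuple


def patch_origins(
    height: int, width: int, window: int = 512, overlap: float = 0.5
) -> List[Tuple[int, int]]:
    """Return (top, left) corners of sliding-window patches in row-major order."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = int(window * (1.0 - overlap))  # 256 pixels for 512 and 50%
    tops = range(0, height - window + 1, stride)
    lefts = range(0, width - window + 1, stride)
    # Left to right within a row, rows from top to bottom.
    return [(top, left) for top in tops for left in lefts]


origins = patch_origins(height=1536, width=2048)
print(len(origins))   # number of patches for the example image size
print(origins[:3])    # first patches move left to right along the top row
```

    Given an image array indexed as image[row, column], each patch would be the slice image[top:top + 512, left:left + 512].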
    An Information-Theoretic Analysis for Transfer Learning: Error Bounds and Applications. (arXiv:2207.05377v1 [cs.IT])
    Transfer learning, or domain adaptation, is concerned with machine learning problems in which training and testing data come from possibly different probability distributions. In this work, we give an information-theoretic analysis on the generalization error and excess risk of transfer learning algorithms, following a line of work initiated by Russo and Xu. Our results suggest, perhaps as expected, that the Kullback-Leibler (KL) divergence $D(\mu||\mu')$ plays an important role in the characterizations where $\mu$ and $\mu'$ denote the distribution of the training data and the testing data, respectively. Specifically, we provide generalization error upper bounds for the empirical risk minimization (ERM) algorithm where data from both distributions are available in the training phase. We further apply the analysis to approximated ERM methods such as the Gibbs algorithm and the stochastic gradient descent method. We then generalize the mutual information bound with $\phi$-divergence and Wasserstein distance. These generalizations lead to tighter bounds and can handle the case when $\mu$ is not absolutely continuous with respect to $\mu'$. Furthermore, we apply a new set of techniques to obtain an alternative upper bound which gives a fast (and optimal) learning rate for some learning problems. Finally, inspired by the derived bounds, we propose the InfoBoost algorithm in which the importance weights for source and target data are adjusted adaptively in accordance with information measures. The empirical results show the effectiveness of the proposed algorithm.
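    To see why absolute continuity matters for a KL-based bound, the sketch below evaluates $D(\mu||\mu')$ for discrete distributions. It is only an illustration of the standard definition, not code from the paper, and the function name is hypothetical. When $\mu$ puts mass on an outcome to which $\mu'$ assigns zero probability, the divergence is infinite, so any bound stated in terms of it becomes vacuous.

```python
import math
from typing import Sequence


def kl_divergence(mu: Sequence[float], mu_prime: Sequence[float]) -> float:
    """D(mu || mu') in nats for discrete distributions on the same support."""
    if len(mu) != len(mu_prime):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p, q in zip(mu, mu_prime):
        if p == 0.0:
            continue  # terms with p = 0 contribute nothing (0 * log 0 = 0)
        if q == 0.0:
            return math.inf  # mu is not absolutely continuous w.r.t. mu'
        total += p * math.log(p / q)
    return total


print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # identical distributions
print(kl_divergence([1.0, 0.0], [0.5, 0.5]))  # finite: support of mu is covered
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))  # infinite: mu' misses an outcome
```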
    Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning. (arXiv:2207.05742v1 [cs.LG])
    In lifelong learning, an agent learns throughout its entire life without resets, in a constantly changing environment, as we humans do. Consequently, lifelong learning comes with a plethora of research problems such as continual domain shifts, which result in non-stationary rewards and environment dynamics. These non-stationarities are difficult to detect and cope with due to their continuous nature. Therefore, exploration strategies and learning methods are required that are capable of tracking the steady domain shifts, and adapting to them. We propose Reactive Exploration to track and react to continual domain shifts in lifelong reinforcement learning, and to update the policy correspondingly. To this end, we conduct experiments in order to investigate different exploration strategies. We empirically show that representatives of the policy-gradient family are better suited for lifelong learning, as they adapt more quickly to distribution shifts than Q-learning. Thereby, policy-gradient methods profit the most from Reactive Exploration and show good results in lifelong learning with continual domain shifts. Our code is available at: https://github.com/ml-jku/reactive-exploration.
    SWIS: Self-Supervised Representation Learning For Writer Independent Offline Signature Verification. (arXiv:2202.13078v2 [cs.CV] UPDATED)
    Writer independent offline signature verification is one of the most challenging tasks in pattern recognition as there is often a scarcity of training data. To handle this data scarcity problem, we propose a novel self-supervised learning (SSL) framework for writer independent offline signature verification. To our knowledge, this is the first attempt to utilize a self-supervised setting for the signature verification task. The objective of self-supervised representation learning from the signature images is achieved by minimizing the cross-covariance between two random variables belonging to different feature directions and ensuring a positive cross-covariance between the random variables denoting the same feature direction. This ensures that the features are decorrelated linearly and the redundant information is discarded. Experiments on different data sets yield encouraging results.
    Remote sensing and AI for building climate adaptation applications. (arXiv:2107.02693v2 [cs.LG] UPDATED)
    Urban areas are not only one of the biggest contributors to climate change, but they are also among the most vulnerable areas, with large populations that would together experience the negative impacts. In this paper, we address some of the opportunities brought by satellite remote sensing imaging and artificial intelligence (AI) in order to measure climate adaptation of cities automatically. We propose a framework combining AI and simulation which may be useful for extracting indicators from remote-sensing images and may help with predictive estimation of future states of these climate-adaptation-related indicators. When such models become more robust and are used in real life applications, they may help decision makers and early responders to choose the best actions to sustain the well-being of society, natural resources and biodiversity. We underline that this is an open field and an on-going area of research for many scientists; therefore, we offer an in-depth discussion on the challenges and limitations of data-driven methods and the predictive estimation models in general.
    DeepTx: Deep Learning Beamforming with Channel Prediction. (arXiv:2202.07998v3 [eess.SP] UPDATED)
    Machine learning algorithms have recently been considered for many tasks in the field of wireless communications. Previously, we have proposed the use of a deep fully convolutional neural network (CNN) for receiver processing and shown it to provide considerable performance gains. In this study, we focus on machine learning algorithms for the transmitter. In particular, we consider beamforming and propose a CNN which, for a given uplink channel estimate as input, outputs downlink channel information to be used for beamforming. The CNN is trained in a supervised manner considering both uplink and downlink transmissions with a loss function that is based on UE receiver performance. The main task of the neural network is to predict the channel evolution between uplink and downlink slots, but it can also learn to handle inefficiencies and errors in the whole chain, including the actual beamforming phase. The provided numerical experiments demonstrate the improved beamforming performance.
    Optimal Clustering with Noisy Queries via Multi-Armed Bandit. (arXiv:2207.05376v1 [cs.LG])
    Motivated by many applications, we study clustering with a faulty oracle. In this problem, there are $n$ items belonging to $k$ unknown clusters, and the algorithm is allowed to ask the oracle whether two items belong to the same cluster or not. However, the answer from the oracle is correct only with probability $\frac{1}{2}+\frac{\delta}{2}$. The goal is to recover the hidden clusters with minimum number of noisy queries. Previous works have shown that the problem can be solved with $O(\frac{nk\log n}{\delta^2} + \text{poly}(k,\frac{1}{\delta}, \log n))$ queries, while $\Omega(\frac{nk}{\delta^2})$ queries is known to be necessary. So, for any values of $k$ and $\delta$, there is still a non-trivial gap between upper and lower bounds. In this work, we obtain the first matching upper and lower bounds for a wide range of parameters. In particular, a new polynomial time algorithm with $O(\frac{n(k+\log n)}{\delta^2} + \text{poly}(k,\frac{1}{\delta}, \log n))$ queries is proposed. Moreover, we prove a new lower bound of $\Omega(\frac{n\log n}{\delta^2})$, which, combined with the existing $\Omega(\frac{nk}{\delta^2})$ bound, matches our upper bound up to an additive $\text{poly}(k,\frac{1}{\delta},\log n)$ term. To obtain the new results, our main ingredient is an interesting connection between our problem and multi-armed bandit, which might provide useful insights for other similar problems.
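    The faulty oracle in this problem can be simulated with a few lines of code. The sketch below is an illustration under assumptions, not the paper's algorithm: the function name is hypothetical, and each call draws fresh independent noise, which the abstract does not state (some formulations of this problem instead fix the answer for each pair, in which case repeating a query gives no new information).

```python
import random
from typing import Callable, Sequence


def make_noisy_oracle(
    labels: Sequence[int], delta: float, rng: random.Random
) -> Callable[[int, int], bool]:
    """Build an oracle answering 'same cluster?' correctly with prob. 1/2 + delta/2."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must be in [0, 1]")
    p_correct = 0.5 + delta / 2.0

    def oracle(i: int, j: int) -> bool:
        truth = labels[i] == labels[j]
        # Return the true answer with probability p_correct, else flip it.
        return truth if rng.random() < p_correct else not truth

    return oracle


rng = random.Random(0)
hidden_clusters = [0, 0, 1, 1]  # items 0 and 1 share a cluster
oracle = make_noisy_oracle(hidden_clusters, delta=0.5, rng=rng)
answers = [oracle(0, 1) for _ in range(20000)]
print(sum(answers) / len(answers))  # empirical rate of correct 'same' answers
```

    With delta = 0.5 the empirical rate should sit close to 0.75, and as delta shrinks toward 0 each answer carries less information, which is where the $1/\delta^2$ factor in the query bounds comes from.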
    Solving a directed percolation inverse problem. (arXiv:2201.12222v3 [cond-mat.dis-nn] UPDATED)
    We present a directed percolation inverse problem for diode networks: Given information about which pairs of nodes allow current to percolate from one to the other, can one find a configuration of diodes consistent with the observed currents? We implement a divide-and-concur iterative projection method for solving the problem and demonstrate the supremacy of our method over an exhaustive approach for nontrivial instances of the problem. We find that the problem is most difficult when some but not all of the percolation data are hidden, and that the most difficult networks to reconstruct generally are those for which the currents are most sensitive to the addition or removal of a single diode.
    End-to-end speech recognition modeling from de-identified data. (arXiv:2207.05469v1 [eess.AS])
    De-identification of data used for automatic speech recognition modeling is a critical component in protecting privacy, especially in the medical domain. However, simply removing all personally identifiable information (PII) from end-to-end model training data leads to a significant performance degradation in particular for the recognition of names, dates, locations, and words from similar categories. We propose and evaluate a two-step method for partially recovering this loss. First, PII is identified, and each occurrence is replaced with a random word sequence of the same category. Then, corresponding audio is produced via text-to-speech or by splicing together matching audio fragments extracted from the corpus. These artificial audio/label pairs, together with speaker turns from the original data without PII, are used to train models. We evaluate the performance of this method on in-house data of medical conversations and observe a recovery of almost the entire performance degradation in the general word error rate while still maintaining a strong diarization performance. Our main focus is the improvement of recall and precision in the recognition of PII-related words. Depending on the PII category, between $50\% - 90\%$ of the performance degradation can be recovered using our proposed method.
    WheaCha: A Method for Explaining the Predictions of Models of Code. (arXiv:2102.04625v3 [cs.LG] UPDATED)
    Attribution methods have emerged as a popular approach to interpreting model predictions based on the relevance of input features. Although the feature importance ranking can provide insights into how models arrive at a prediction from a raw input, it does not give a clear-cut definition of the key features models use for the prediction. In this paper, we present a new method, called WheaCha, for explaining the predictions of code models. Although WheaCha employs the same mechanism of tracing model predictions back to the input features, it differs from all existing attribution methods in crucial ways. Specifically, WheaCha divides an input program into "wheat" (i.e., the defining features that are the reason for which models predict the label that they predict) and the rest "chaff" for any prediction of a learned code model. We realize WheaCha in a tool, HuoYan, and use it to explain four prominent code models: code2vec, seq-GNN, GGNN, and CodeBERT. Results show (1) HuoYan is efficient - taking on average under twenty seconds to compute the wheat for an input program in an end-to-end fashion (i.e., including model prediction time); (2) the wheat that all models use to predict input programs is made of simple syntactic or even lexical properties (i.e., identifier names); (3) based on wheat, we present a novel approach to explaining the predictions of code models through the lens of training data.
    Learning with Noisy Labels by Efficient Transition Matrix Estimation to Combat Label Miscorrection. (arXiv:2111.14932v2 [cs.LG] UPDATED)
    Recent studies on learning with noisy labels have shown remarkable performance by exploiting a small clean dataset. In particular, model agnostic meta-learning-based label correction methods further improve performance by correcting noisy labels on the fly. However, there is no safeguard on the label miscorrection, resulting in unavoidable performance degradation. Moreover, every training step requires at least three back-propagations, significantly slowing down the training speed. To mitigate these issues, we propose a robust and efficient method that learns a label transition matrix on the fly. Employing the transition matrix makes the classifier skeptical about all the corrected samples, which alleviates the miscorrection issue. We also introduce a two-head architecture to efficiently estimate the label transition matrix every iteration within a single back-propagation, so that the estimated matrix closely follows the shifting noise distribution induced by label correction. Extensive experiments demonstrate that our approach shows the best performance in training efficiency while having comparable or better accuracy than existing methods.
    MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes. (arXiv:2205.09248v2 [cs.SD] UPDATED)
    We propose a mesh-based neural network (MESH2IR) to generate acoustic impulse responses (IRs) for indoor 3D scenes represented using a mesh. The IRs are used to create a high-quality sound experience in interactive applications and audio processing. Our method can handle input triangular meshes with arbitrary topologies (2K - 3M triangles). We present a novel training technique to train MESH2IR using energy decay relief and highlight its benefits. We also show that training MESH2IR on IRs preprocessed using our proposed technique significantly improves the accuracy of IR generation. We reduce the non-linearity in the mesh space by transforming 3D scene meshes to latent space using a graph convolution network. Our MESH2IR is more than 200 times faster than a geometric acoustic algorithm on a CPU and can generate more than 10,000 IRs per second on an NVIDIA GeForce RTX 2080 Ti GPU for a given furnished indoor 3D scene. The acoustic metrics are used to characterize the acoustic environment. We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error. We also highlight the benefits of MESH2IR on audio and speech processing applications such as speech dereverberation and speech separation. To the best of our knowledge, ours is the first neural-network-based approach to predict IRs from a given 3D scene mesh in real-time.
    Brain-inspired Graph Spiking Neural Networks for Commonsense Knowledge Representation and Reasoning. (arXiv:2207.05561v1 [cs.NE])
    How neural networks in the human brain represent commonsense knowledge, and complete related reasoning tasks is an important research topic in neuroscience, cognitive science, psychology, and artificial intelligence. Although the traditional artificial neural network using fixed-length vectors to represent symbols has gained good performance in some specific tasks, it is still a black box that lacks interpretability, far from how humans perceive the world. Inspired by the grandmother-cell hypothesis in neuroscience, this work investigates how population encoding and spiking timing-dependent plasticity (STDP) mechanisms can be integrated into the learning of spiking neural networks, and how a population of neurons can represent a symbol via guiding the completion of sequential firing between different neuron populations. The neuron populations of different communities together constitute the entire commonsense knowledge graph, forming a giant graph spiking neural network. Moreover, we introduced the Reward-modulated spiking timing-dependent plasticity (R-STDP) mechanism to simulate the biological reinforcement learning process and completed the related reasoning tasks accordingly, achieving comparable accuracy and faster convergence speed than the graph convolutional artificial neural networks. For the fields of neuroscience and cognitive science, the work in this paper provided the foundation of computational modeling for further exploration of the way the human brain represents commonsense knowledge. For the field of artificial intelligence, this paper indicated the exploration direction for realizing a more robust and interpretable neural network by constructing a commonsense knowledge representation and reasoning spiking neural networks with solid biological plausibility.
    A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data. (arXiv:2201.12020v3 [stat.ML] UPDATED)
    This paper tackles the problem of missing data imputation for noisy and non-Gaussian data. A classical imputation method, the Expectation Maximization (EM) algorithm for Gaussian mixture models, has shown interesting properties when compared to other popular approaches such as those based on k-nearest neighbors or on multiple imputations by chained equations. However, Gaussian mixture models are known to be non-robust to heterogeneous data, which can lead to poor estimation performance when the data is contaminated by outliers or follows non-Gaussian distributions. To overcome this issue, a new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data. This paper shows that this problem reduces to the estimation of a mixture of Angular Gaussian distributions under generic assumptions (i.e., each sample is drawn from a mixture of elliptical distributions, which is possibly different from one sample to another). In that case, the complete-data likelihood associated with mixtures of elliptical distributions is well adapted to the EM framework with missing data thanks to its conditional distribution, which is shown to be a multivariate $t$-distribution. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data. Furthermore, experiments conducted on real-world datasets show that this algorithm is very competitive when compared to other classical imputation methods.
    MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent Reinforcement Learning. (arXiv:2205.12449v2 [cs.LG] UPDATED)
    Many recent breakthroughs in multi-agent reinforcement learning (MARL) require the use of deep neural networks, which are challenging for human experts to interpret and understand. On the other hand, existing work on interpretable reinforcement learning (RL) has shown promise in extracting more interpretable decision tree-based policies from neural networks, but only in the single-agent setting. To fill this gap, we propose the first set of algorithms that extract interpretable decision-tree policies from neural networks trained with MARL. The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting. We demonstrate that IVIPER learns high-quality decision-tree policies for each agent. To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER. MAVIPER jointly grows the trees of each agent by predicting the behavior of the other agents using their anticipated trees, and uses resampling to focus on states that are critical for its interactions with other agents. We show that both algorithms generally outperform the baselines and that MAVIPER-trained agents achieve better-coordinated performance than IVIPER-trained agents on three different multi-agent particle-world environments.
    Transferability-Guided Cross-Domain Cross-Task Transfer Learning. (arXiv:2207.05510v1 [cs.CV])
    We propose two novel transferability metrics, F-OTCE (Fast Optimal Transport based Conditional Entropy) and JC-OTCE (Joint Correspondence OTCE), to evaluate how much the source model (task) can benefit the learning of the target task and to learn more transferable representations for cross-domain cross-task transfer learning. Unlike the existing metric that requires evaluating the empirical transferability on auxiliary tasks, our metrics are auxiliary-free such that they can be computed much more efficiently. Specifically, F-OTCE estimates transferability by first solving an Optimal Transport (OT) problem between source and target distributions, and then uses the optimal coupling to compute the Negative Conditional Entropy between source and target labels. It can also serve as a loss function to maximize the transferability of the source model before finetuning on the target task. Meanwhile, JC-OTCE improves the transferability robustness of F-OTCE by including label distances in the OT problem, though it may incur additional computation cost. Extensive experiments demonstrate that F-OTCE and JC-OTCE outperform state-of-the-art auxiliary-free metrics by 18.85% and 28.88%, respectively, in correlation coefficient with the ground-truth transfer accuracy. By eliminating the training cost of auxiliary tasks, the two metrics reduce the total computation time of the previous method from 43 minutes to 9.32s and 10.78s, respectively, for a pair of tasks. When used as a loss function, F-OTCE shows consistent improvements on the transfer accuracy of the source model in few-shot classification experiments, with up to 4.41% accuracy gain.
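The two-step recipe described for F-OTCE (solve an OT problem, then compute the negative conditional entropy of target labels given source labels under the coupling) can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the Sinkhorn solver, the squared Euclidean cost, the regularisation value, the function names and the synthetic data are not taken from the paper, which may use a different solver and feature extractor.

```python
import numpy as np

def sinkhorn_coupling(cost, reg=0.1, n_iter=200):
    """Entropy-regularised OT plan between two uniform empirical distributions."""
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    kernel = np.exp(-cost / (reg * cost.max()))  # rescale cost to avoid underflow
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

def negative_conditional_entropy(coupling, y_src, y_tgt):
    """-H(Y_t | Y_s) under the label joint distribution induced by the coupling."""
    joint = np.zeros((y_src.max() + 1, y_tgt.max() + 1))
    # Accumulate coupling mass of every (source sample, target sample) pair
    # into the cell of its (source label, target label).
    np.add.at(joint, (y_src[:, None], y_tgt[None, :]), coupling)
    joint /= joint.sum()
    p_src = joint.sum(axis=1, keepdims=True)
    cond = joint / np.maximum(p_src, 1e-300)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(cond[mask])))

def otce_like_score(x_src, y_src, x_tgt, y_tgt):
    """Hypothetical helper: squared Euclidean cost, then the two steps above."""
    cost = ((x_src[:, None, :] - x_tgt[None, :, :]) ** 2).sum(axis=-1)
    return negative_conditional_entropy(sinkhorn_coupling(cost), y_src, y_tgt)

# Synthetic features: two well-separated classes in both domains.
rng = np.random.default_rng(0)
y_src = np.repeat([0, 1], 20)
y_tgt = np.repeat([0, 1], 20)
x_src = rng.normal(size=(40, 2)) + 5.0 * y_src[:, None]
x_tgt = rng.normal(size=(40, 2)) + 5.0 * y_tgt[:, None]
score = otce_like_score(x_src, y_src, x_tgt, y_tgt)
```

The score lies between minus the log of the number of target classes and zero; values closer to zero mean the source labels are more informative about the target labels under the coupling.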
    DDI Prediction via Heterogeneous Graph Attention Networks. (arXiv:2207.05672v1 [cs.LG])
    Polypharmacy, defined as the use of multiple drugs together, is a standard treatment method, especially for severe and chronic diseases. However, using multiple drugs together may cause interactions between drugs. Drug-drug interaction (DDI) is the activity that occurs when the impact of one drug changes when combined with another. DDIs may obstruct, increase, or decrease the intended effect of either drug or, in the worst-case scenario, create adverse side effects. While it is critical to detect DDIs on time, it is time-consuming and expensive to identify them in clinical trials due to their short duration and the many possible drug pairs to be considered for testing. As a result, computational methods are needed for predicting DDIs. In this paper, we present a novel heterogeneous graph attention model, HAN-DDI, to predict drug-drug interactions. We create a heterogeneous network of drugs with different biological entities. Then, we develop a heterogeneous graph attention network to learn DDIs using relations of drugs with other entities. It consists of an attention-based heterogeneous graph node encoder for obtaining drug node representations and a decoder for predicting drug-drug interactions. Further, we use comprehensive experiments to evaluate our model and to compare it with state-of-the-art models. Experimental results show that our proposed method, HAN-DDI, outperforms the baselines significantly and accurately predicts DDIs, even for new drugs.
    Large Language Models Can Be Strong Differentially Private Learners. (arXiv:2110.05679v3 [cs.LG] UPDATED)
    Differentially Private (DP) learning has seen limited success for building large deep learning models of text, and attempts at straightforwardly applying Differentially Private Stochastic Gradient Descent (DP-SGD) to NLP tasks have resulted in large performance drops and high computational overhead. We show that this performance drop can be mitigated with (1) the use of large pretrained models; (2) hyperparameters that suit DP optimization; and (3) fine-tuning objectives aligned with the pretraining procedure. With these factors set right, we obtain private NLP models that outperform state-of-the-art private training approaches and strong non-private baselines by directly fine-tuning pretrained models with DP optimization on moderately-sized corpora. To address the computational challenge of running DP-SGD with large Transformers, we propose a memory saving technique that allows clipping in DP-SGD to run without instantiating per-example gradients for any layer in the model. The technique enables privately training Transformers with almost the same memory cost as non-private training at a modest run-time overhead. Contrary to conventional wisdom that DP optimization fails at learning high-dimensional models (due to noise that scales with dimension), empirical results reveal that private learning with pretrained models tends to not suffer from dimension-dependent performance degradation.
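For context on the clipping step mentioned above, here is a minimal sketch of one textbook DP-SGD update: clip each per-example gradient, sum, add Gaussian noise, average. It deliberately materialises the per-example gradients, so it is the naive baseline and not the memory-saving technique the abstract proposes. The function name, argument names and toy values are assumptions for illustration.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One naive DP-SGD update on a flat parameter vector.

    per_example_grads has shape (batch_size, n_params); storing it explicitly
    is exactly the memory cost that the abstract's technique aims to avoid.
    """
    norms = np.linalg.norm(per_example_grads, axis=1)
    # Scale each gradient so its L2 norm is at most clip_norm.
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale[:, None]
    # Gaussian noise calibrated to the clipping norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean

# Toy usage with noise switched off so the clipping effect is visible.
rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # norms 5.0 and 0.5
new_params = dp_sgd_step(np.zeros(2), grads, lr=1.0, clip_norm=1.0,
                         noise_multiplier=0.0, rng=rng)
```

In real use the noise multiplier is positive and chosen together with the sampling rate and number of steps by a privacy accountant, which this sketch omits.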
    Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment. (arXiv:2203.15937v3 [eess.AS] UPDATED)
    Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge of such end-to-end solutions is the scarcity of human-annotated phonemes on natural L2 speech. In this work, we leverage unlabeled L2 speech via a pseudo-labeling (PL) procedure and extend the fine-tuning approach based on pre-trained self-supervised learning (SSL) models. Specifically, we use Wav2vec 2.0 as our SSL model, and fine-tune it using original labeled L2 speech samples plus the created pseudo-labeled L2 speech samples. Our pseudo labels are dynamic and are produced by an ensemble of the online model on-the-fly, which ensures that our model is robust to pseudo label noise. We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and 2.48% MDD F1 score improvement over a labeled-samples-only fine-tuning baseline. The proposed PL method is also shown to outperform conventional offline PL methods. Compared to the state-of-the-art MDD systems, our MDD solution produces a more accurate and consistent phonetic error diagnosis. In addition, we conduct an open test on a separate UTD-4Accents dataset, where our system recognition outputs show a strong correlation with human perception, based on accentedness and intelligibility.
    Accelerated Reinforcement Learning for Temporal Logic Control Objectives. (arXiv:2205.04424v3 [cs.RO] UPDATED)
    This paper addresses the problem of learning control policies for mobile robots, modeled as unknown Markov Decision Processes (MDPs), that are tasked with temporal logic missions, such as sequencing, coverage, or surveillance. The MDP captures uncertainty in the workspace structure and the outcomes of control decisions. The control objective is to synthesize a control policy that maximizes the probability of accomplishing a high-level task, specified as a Linear Temporal Logic (LTL) formula. To address this problem, we propose a novel accelerated model-based reinforcement learning (RL) algorithm for LTL control objectives that is capable of learning control policies significantly faster than related approaches. Its sample-efficiency relies on biasing exploration towards directions that may contribute to task satisfaction. This is accomplished by leveraging an automaton representation of the LTL task as well as a continuously learned MDP model. Finally, we provide comparative experiments that demonstrate the sample efficiency of the proposed method against recent RL methods for LTL objectives.
    Improving the Robustness and Generalization of Deep Neural Network with Confidence Threshold Reduction. (arXiv:2206.00913v2 [cs.LG] UPDATED)
    Deep neural networks are easily attacked by imperceptible perturbations. Presently, adversarial training (AT) is the most effective method to enhance the robustness of the model against adversarial examples. However, because adversarial training solves a min-max problem, robustness and generalization conflict in comparison with natural training, i.e., improving the robustness of the model decreases its generalization. To address this issue, in this paper, a new concept, namely confidence threshold (CT), is introduced, and reducing the confidence threshold, known as confidence threshold reduction (CTR), is proven to improve both the generalization and robustness of the model. Specifically, to reduce the CT for natural training (i.e., for natural training with CTR), we propose a mask-guided divergence loss function (MDL) consisting of a cross-entropy loss term and an orthogonal term. The empirical and theoretical analysis demonstrates that the MDL loss improves the robustness and generalization of the model simultaneously for natural training. However, the model robustness improvement of natural training with CTR is not comparable to that of adversarial training. Therefore, for adversarial training, we propose a standard deviation loss function (STD), which minimizes the difference in the probabilities of the wrong categories, to reduce the CT by being integrated into the loss function of adversarial training. The empirical and theoretical analysis demonstrates that the STD-based loss function can further improve the robustness of the adversarially trained model while keeping the natural accuracy unchanged or slightly improved.
    Differentiable Physics Simulations with Contacts: Do They Have Correct Gradients w.r.t. Position, Velocity and Control?. (arXiv:2207.05060v1 [cs.LG])
    In recent years, an increasing amount of work has focused on differentiable physics simulation and has produced a set of open source projects such as Tiny Differentiable Simulator, Nimble Physics, diffTaichi, Brax, Warp, Dojo and DiffCoSim. By making physics simulations end-to-end differentiable, we can perform gradient-based optimization and learning tasks. A majority of differentiable simulators consider collisions and contacts between objects, but they use different contact models for differentiability. In this paper, we overview four kinds of differentiable contact formulations - linear complementarity problems (LCP), convex optimization models, compliant models and position-based dynamics (PBD). We analyze and compare the gradients calculated by these models and show that the gradients are not always correct. We also demonstrate their ability to learn an optimal control strategy by comparing the learned strategies with the optimal strategy in an analytical form. The codebase to reproduce the experiment results is available at https://github.com/DesmondZhong/diff_sim_grads.
    DGPO: Discovering Multiple Strategies with Diversity-Guided Policy Optimization. (arXiv:2207.05631v1 [cs.LG])
    Recent algorithms designed for reinforcement learning tasks focus on finding a single optimal solution. However, in many practical applications, it is important to develop reasonable agents with diverse strategies. In this paper, we propose Diversity-Guided Policy Optimization (DGPO), an on-policy framework for discovering multiple strategies for the same task. Our algorithm uses diversity objectives to guide a latent code conditioned policy to learn a set of diverse strategies in a single training procedure. Specifically, we formalize our algorithm as the combination of a diversity-constrained optimization problem and an extrinsic-reward constrained optimization problem. And we solve the constrained optimization as a probabilistic inference task and use policy iteration to maximize the derived lower bound. Experimental results show that our method efficiently finds diverse strategies in a wide variety of reinforcement learning tasks. We further show that DGPO achieves a higher diversity score and has similar sample complexity and performance compared to other baselines.
    CANF-VC: Conditional Augmented Normalizing Flows for Video Compression. (arXiv:2207.05315v1 [cs.CV])
    This paper presents an end-to-end learning-based video compression system, termed CANF-VC, based on conditional augmented normalizing flows (ANF). Most learned video compression systems adopt the same hybrid-based coding architecture as the traditional codecs. Recent research on conditional coding has shown the sub-optimality of the hybrid-based coding and opens up opportunities for deep generative models to take a key role in creating new coding frameworks. CANF-VC represents a new attempt that leverages the conditional ANF to learn a video generative model for conditional inter-frame coding. We choose ANF because it is a special type of generative model, which includes variational autoencoder as a special case and is able to achieve better expressiveness. CANF-VC also extends the idea of conditional coding to motion coding, forming a purely conditional coding framework. Extensive experimental results on commonly used datasets confirm the superiority of CANF-VC to the state-of-the-art methods.
    Synergistic Self-supervised and Quantization Learning. (arXiv:2207.05432v1 [cs.CV])
    With the success of self-supervised learning (SSL), it has become a mainstream paradigm to fine-tune from self-supervised pretrained models to boost the performance on downstream tasks. However, we find that current SSL models suffer severe accuracy drops when performing low-bit quantization, prohibiting their deployment in resource-constrained applications. In this paper, we propose a method called synergistic self-supervised and quantization learning (SSQL) to pretrain quantization-friendly self-supervised models facilitating downstream deployment. SSQL contrasts the features of the quantized and full precision models in a self-supervised fashion, where the bit-width for the quantized model is randomly selected in each step. SSQL not only significantly improves the accuracy when quantized to lower bit-widths, but also boosts the accuracy of full precision models in most cases. By only training once, SSQL can then benefit various downstream tasks at different bit-widths simultaneously. Moreover, the bit-width flexibility is achieved without additional storage overhead, requiring only one copy of weights during training and inference. We theoretically analyze the optimization process of SSQL, and conduct exhaustive experiments on various benchmarks to further demonstrate the effectiveness of our method. Our code is available at https://github.com/megvii-research/SSQL-ECCV2022.
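The idea of drawing a random bit-width at each step can be pictured with a small sketch of symmetric uniform "fake" quantization. This is an assumption-laden illustration: the abstract does not specify SSQL's quantizer, and the function name, the per-tensor scale, the candidate bit-widths and the toy weights below are all hypothetical.

```python
import numpy as np

def fake_quantize(w, bits):
    """Symmetric uniform quantization of w to the given bit-width (per tensor)."""
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 1 for 2 bits
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / levels
    return np.round(w / scale) * scale    # snap to the grid, keep float dtype

rng = np.random.default_rng(0)
weights = rng.normal(size=1000)

# One hypothetical training step: sample a bit-width, quantize one copy of the
# weights, and keep the full-precision copy for the other branch.
bits = int(rng.choice([2, 4, 8]))
w_quant = fake_quantize(weights, bits)

# Quantization error at two fixed bit-widths, for comparison.
err_2 = np.abs(weights - fake_quantize(weights, 2)).mean()
err_8 = np.abs(weights - fake_quantize(weights, 8)).mean()
```

Only one full-precision copy of the weights is stored here; the quantized version is recomputed on demand, which matches the abstract's remark that bit-width flexibility needs no extra storage.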
    Representation learning with function call graph transformations for malware open set recognition. (arXiv:2205.06918v3 [cs.CR] UPDATED)
    The open set recognition (OSR) problem has been a challenge in many machine learning (ML) applications, such as security. As new/unknown malware families occur regularly, it is difficult to exhaust samples that cover all the classes for the training process in ML systems. An advanced malware classification system should classify the known classes correctly while remaining sensitive to the unknown class. In this paper, we introduce a self-supervised pre-training approach for the OSR problem in malware classification. We propose two transformations for the function call graph (FCG) based malware representations to facilitate the pretext task. Also, we present a statistical thresholding approach to find the optimal threshold for the unknown class. Moreover, the experiment results indicate that our proposed pre-training process can improve the performance of different downstream loss functions for the OSR problem.
    Susceptibility of Continual Learning Against Adversarial Attacks. (arXiv:2207.05225v1 [cs.LG])
    The recent advances in continual (incremental or lifelong) learning have concentrated on the prevention of forgetting that can lead to catastrophic consequences, but there are two outstanding challenges that must be addressed. The first is the evaluation of the robustness of the proposed methods. The second is ensuring the security of learned tasks, which remains largely unexplored. This paper presents a comprehensive study of the susceptibility of the continually learned tasks (including both current and previously learned tasks) that are vulnerable to forgetting. Such vulnerability of tasks against adversarial attacks raises profound issues in data integrity and privacy. We consider the task incremental learning (Task-IL) scenario and explore three regularization-based experiments, three replay-based experiments, and one hybrid technique based on the replay and exemplar approach. We examine the robustness of these methods. In particular, we consider cases where we demonstrate that any class belonging to the current or previously learned tasks is prone to misclassification. Our observations highlight the potential limitations of existing Task-IL approaches. Our empirical study recommends that the research community consider the robustness of the proposed continual learning approaches and invest extensive efforts in mitigating catastrophic forgetting.
    Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model. (arXiv:2207.05437v1 [cs.LG])
    Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works for this problem focus on the stochastic environment where the item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint especially designed for PBM. We prove the proposed algorithm simultaneously achieves $O(\log{T})$ regret in the stochastic environment and $O(m\sqrt{nT})$ regret in the adversarial environment, where $T$ is the number of rounds, $n$ is the number of items and $m$ is the number of positions. We also provide a lower bound of order $\Omega(m\sqrt{nT})$ for adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. The experiments show that our algorithm could simultaneously learn in both stochastic and adversarial environments and is competitive compared to existing methods that are designed for a single environment.
    A Benchmark dataset for predictive maintenance. (arXiv:2207.05466v1 [cs.LG])
    The paper describes the Railway dataset, an outcome of a Predictive Maintenance project with an urban metro public transportation service in Porto, Portugal. The data was collected between 2020 and 2022 for a project that aimed to develop machine learning methods for online anomaly detection and failure prediction. By capturing several analog sensor signals (pressure, temperature, current consumption), digital signals (control signals, discrete signals), and GPS information (latitude, longitude, and speed), we provide a framework that can be easily used and extended for new machine learning methods. We believe this dataset contains some interesting characteristics and can be a good benchmark for predictive maintenance models.
    Transformer Compressed Sensing via Global Image Tokens. (arXiv:2203.12861v3 [cs.CV] UPDATED)
    Convolutional neural networks (CNN) have demonstrated outstanding Compressed Sensing (CS) performance compared to traditional, hand-crafted methods. However, they are broadly limited in terms of generalisability, inductive bias and difficulty in modelling long-distance relationships. Transformer neural networks (TNN) overcome such issues by implementing an attention mechanism designed to capture dependencies between inputs. However, high-resolution tasks typically require vision Transformers (ViT) to decompose an image into patch-based tokens, limiting inputs to inherently local contexts. We propose a novel image decomposition that naturally embeds images into low-resolution inputs. These Kaleidoscope tokens (KD) provide a mechanism for global attention, at the same computational cost as a patch-based approach. To showcase this development, we replace CNN components in a well-known CS-MRI neural network with TNN blocks and demonstrate the improvements afforded by KD. We also propose an ensemble of image tokens, which enhances overall image quality and reduces model size. Supplementary material is available: https://github.com/uqmarlonbran/TCS.git
    Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data. (arXiv:2106.01336v5 [cs.LG] UPDATED)
    We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). Most prior work on this problem is restricted to the case where the loss function is Lipschitz. Instead, as introduced by Wang, Xiao, Devadas, and Xu \cite{WangXDX20}, we study general convex loss functions with the assumption that the distribution of gradients has bounded $k$-th moments. We provide improved upper bounds on the excess population risk under concentrated DP for convex and strongly convex loss functions. Along the way, we derive new algorithms for private mean estimation of heavy-tailed distributions, under both pure and concentrated DP. Finally, we prove nearly-matching lower bounds for private stochastic convex optimization with strongly convex losses and mean estimation, showing new separations between pure and concentrated DP.
    EAGAN: Efficient Two-stage Evolutionary Architecture Search for GANs. (arXiv:2111.15097v2 [cs.CV] UPDATED)
    Generative adversarial networks (GANs) have proven successful in image generation tasks. However, GAN training is inherently unstable. Although many works try to stabilize it by manually modifying GAN architecture, it requires much expertise. Neural architecture search (NAS) has become an attractive solution to search GANs automatically. The early NAS-GANs search only generators to reduce search complexity but lead to a sub-optimal GAN. Some recent works try to search both generator (G) and discriminator (D), but they suffer from the instability of GAN training. To alleviate the instability, we propose an efficient two-stage evolutionary algorithm-based NAS framework to search GANs, namely EAGAN. We decouple the search of G and D into two stages, where stage-1 searches G with a fixed D and adopts the many-to-one training strategy, and stage-2 searches D with the optimal G found in stage-1 and adopts the one-to-one training and weight-resetting strategies to enhance the stability of GAN training. Both stages use the non-dominated sorting method to produce Pareto-front architectures under multiple objectives (e.g., model size, Inception Score (IS), and Fr\'echet Inception Distance (FID)). EAGAN is applied to the unconditional image generation task and can efficiently finish the search on the CIFAR-10 dataset in 1.2 GPU days. Our searched GANs achieve competitive results (IS=8.81$\pm$0.10, FID=9.91) on the CIFAR-10 dataset and surpass prior NAS-GANs on the STL-10 dataset (IS=10.44$\pm$0.087, FID=22.18). Source code: https://github.com/marsggbo/EAGAN.
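The Pareto-front selection mentioned above can be sketched in a few lines. This is a simplified illustration that extracts only the first non-dominated front, not the full non-dominated sorting used in evolutionary algorithms such as NSGA-II, and the candidate values are made up for the example rather than taken from the paper. All objectives are written as quantities to minimise, so Inception Score is negated.

```python
def dominates(q, p):
    """True if q is at least as good as p everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(q, p)) and any(a < b for a, b in zip(q, p))

def pareto_front(points):
    """Indices of points not dominated by any other point (all objectives minimised)."""
    return [
        i for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]

# Hypothetical candidates: (model size in M params, -Inception Score, FID).
candidates = [
    (10.0, -8.5, 12.0),
    (12.0, -8.8, 10.0),
    (14.0, -8.4, 13.0),  # worse than the first candidate on every objective
    (8.0, -8.0, 15.0),
]
front = pareto_front(candidates)
```

The pairwise check is quadratic in the number of candidates, which is acceptable for the small populations typical of an architecture search generation.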
    Learning Continuous Grasping Function with a Dexterous Hand from Human Demonstrations. (arXiv:2207.05053v2 [cs.RO] UPDATED)
    We propose to learn to generate grasping motion for manipulation with a dexterous hand using implicit functions. With continuous time inputs, the model can generate a continuous and smooth grasping plan. We name the proposed model Continuous Grasping Function (CGF). CGF is learned via generative modeling with a Conditional Variational Autoencoder using 3D human demonstrations. We first convert the large-scale human-object interaction trajectories to robot demonstrations via motion retargeting, and then use these demonstrations to train CGF. During inference, we perform sampling with CGF to generate different grasping plans in the simulator and select the successful ones to transfer to the real robot. By training on diverse human data, our CGF allows generalization to manipulate multiple objects. Compared to previous planning algorithms, CGF is more efficient and achieves significant improvement on success rate when transferred to grasping with the real Allegro Hand. Our project page is at https://jianglongye.com/cgf .
    A semi-supervised geometric-driven methodology for supervised fishing activity detection on multi-source AIS tracking messages. (arXiv:2207.05514v1 [cs.LG])
    Automatic Identification System (AIS) messages are useful for tracking vessel activity across oceans worldwide using radio links and satellite transceivers. Such data plays a significant role in tracking vessel activity and mapping mobility patterns such as those found in fishing. Accordingly, this paper proposes a geometric-driven semi-supervised approach for fishing activity detection from AIS data. Through the proposed methodology we show how to explore the information included in the messages to extract features describing the geometry of the vessel route. To this end, we leverage the unsupervised nature of cluster analysis to label the trajectory geometry highlighting the changes in the vessel's moving pattern which tends to indicate fishing activity. The labels obtained by the proposed unsupervised approach are used to detect fishing activities, which we approach as a time-series classification task. In this context, we propose a solution using recurrent neural networks on AIS data streams with roughly 87% of the overall $F$-score on the whole trajectories of 50 different unseen fishing vessels. Such results are accompanied by a broad benchmark study assessing the performance of different Recurrent Neural Network (RNN) architectures. In conclusion, this work contributes by proposing a thorough process that includes data preparation, labeling, data modeling, and model validation. Therefore, we present a novel solution for mobility pattern detection that relies upon unfolding the trajectory in time and observing their inherent geometry.
    Contrastive Learning for Online Semi-Supervised General Continual Learning. (arXiv:2207.05615v1 [cs.LG])
    We study Online Continual Learning with missing labels and propose SemiCon, a new contrastive loss designed for partly labeled data. We demonstrate its efficiency by devising a memory-based method trained on an unlabeled data stream, where every sample added to memory is labeled using an oracle. Our approach outperforms existing semi-supervised methods when few labels are available, and obtains similar results to state-of-the-art supervised methods while using only 2.6% of labels on Split-CIFAR10 and 10% of labels on Split-CIFAR100.
    Insights into Deep Non-linear Filters for Improved Multi-channel Speech Enhancement. (arXiv:2206.13310v2 [eess.AS] UPDATED)
    The key advantage of using multiple microphones for speech enhancement is that spatial filtering can be used to complement the tempo-spectral processing. In a traditional setting, linear spatial filtering (beamforming) and single-channel post-filtering are commonly performed separately. In contrast, there is a trend towards employing deep neural networks (DNNs) to learn a joint spatial and tempo-spectral non-linear filter, which means that the restriction of a linear processing model and that of a separate processing of spatial and tempo-spectral information can potentially be overcome. However, the internal mechanisms that lead to good performance of such data-driven filters for multi-channel speech enhancement are not well understood. Therefore, in this work, we analyse the properties of a non-linear spatial filter realized by a DNN as well as its interdependency with temporal and spectral processing by carefully controlling the information sources (spatial, spectral, and temporal) available to the network. We confirm the superiority of a non-linear spatial processing model, which outperforms an oracle linear spatial filter in a challenging speaker extraction scenario for a low number of microphones by 0.24 POLQA score. Our analyses reveal that in particular spectral information should be processed jointly with spatial information as this increases the spatial selectivity of the filter. Our systematic evaluation then leads to a simple network architecture, that outperforms state-of-the-art network architectures on a speaker extraction task by 0.22 POLQA score and by 0.32 POLQA score on the CHiME3 data.
    Multi-Model Federated Learning with Provable Guarantees. (arXiv:2207.04330v2 [cs.LG] UPDATED)
    Federated Learning (FL) is a variant of distributed learning where edge devices collaborate to learn a model without sharing their data with the central server or each other. We refer to the process of training multiple independent models simultaneously in a federated setting using a common pool of clients as multi-model FL. In this work, we propose two variants of the popular FedAvg algorithm for multi-model FL, with provable convergence guarantees. We further show that for the same amount of computation, multi-model FL can have better performance than training each model separately. We supplement our theoretical results with experiments in strongly convex, convex, and non-convex settings.
    Offline Equilibrium Finding. (arXiv:2207.05285v1 [cs.AI])
    Offline reinforcement learning (Offline RL) is an emerging field that has recently begun gaining attention across various application domains due to its ability to learn behavior from earlier collected datasets. Using logged data is imperative when further interaction with the environment is expensive (computationally or otherwise), unsafe, or entirely unfeasible. Offline RL proved very successful, paving a path to solving previously intractable real-world problems, and we aim to generalize this paradigm to a multi-agent or multiplayer-game setting. Very little research has been done in this area, as the progress is hindered by the lack of standardized datasets and meaningful benchmarks. In this work, we coin the term offline equilibrium finding (OEF) to describe this area and construct multiple datasets consisting of strategies collected across a wide range of games using several established methods. We also propose a benchmark method -- an amalgamation of a behavior-cloning and a model-based algorithm. Our two model-based algorithms -- OEF-PSRO and OEF-CFR -- are adaptations of the widely-used equilibrium finding algorithms Deep CFR and PSRO in the context of offline learning. In the empirical part, we evaluate the performance of the benchmark algorithms on the constructed datasets. We hope that our efforts may help to accelerate research in large-scale equilibrium finding. Datasets and code are available at https://github.com/SecurityGames/oef.
    Inner Monologue: Embodied Reasoning through Planning with Language Models. (arXiv:2207.05608v1 [cs.RO])
    Recent works have shown how the reasoning capabilities of Large Language Models (LLMs) can be applied to domains beyond natural language processing, such as planning and interaction for robots. These embodied problems require an agent to understand many semantic aspects of the world: the repertoire of skills available, how these skills influence the world, and how changes to the world map back to the language. LLMs planning in embodied environments need to consider not just what skills to do, but also how and when to do them - answers that change over time in response to the agent's own choices. In this work, we investigate to what extent LLMs used in such embodied contexts can reason over sources of feedback provided through natural language, without any additional training. We propose that by leveraging environment feedback, LLMs are able to form an inner monologue that allows them to more richly process and plan in robotic control scenarios. We investigate a variety of sources of feedback, such as success detection, scene description, and human interaction. We find that closed-loop language feedback significantly improves high-level instruction completion on three domains, including simulated and real table top rearrangement tasks and long-horizon mobile manipulation tasks in a kitchen environment in the real world.
    Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning. (arXiv:2207.05506v1 [eess.AS])
    State-of-the-art speaker verification systems are inherently dependent on some kind of human supervision as they are trained on massive amounts of labeled data. However, manually annotating utterances is slow, expensive and not scalable to the amount of data available today. In this study, we explore self-supervised learning for speaker verification by learning representations directly from raw audio. The objective is to produce robust speaker embeddings that have small intra-speaker and large inter-speaker variance. Our approach is based on recent information maximization learning frameworks and an intensive data augmentation pre-processing step. We evaluate the ability of these methods to work without contrastive samples before showing that they achieve better performance when combined with a contrastive loss. Furthermore, we conduct experiments to show that our method reaches competitive results compared to existing techniques and can get better performances compared to a supervised baseline when fine-tuned with a small portion of labeled data.
    Truly Sparse Neural Networks at Scale. (arXiv:2102.01732v2 [cs.LG] UPDATED)
    Recently, sparse training methods have started to be established as a de facto approach for training and inference efficiency in artificial neural networks. Yet, this efficiency exists only in theory. In practice, everyone uses a binary mask to simulate sparsity since the typical deep learning software and hardware are optimized for dense matrix operations. In this paper, we take an orthogonal approach, and we show that we can train truly sparse neural networks to harvest their full potential. To achieve this goal, we introduce three novel contributions, specially designed for sparse neural networks: (1) a parallel training algorithm and its corresponding sparse implementation from scratch, (2) an activation function with non-trainable parameters to favour the gradient flow, and (3) a hidden neurons importance metric to eliminate redundancies. All in all, we are able to break the record and to train the largest neural network ever trained in terms of representational power -- reaching the bat brain size. The results show that our approach has state-of-the-art performance while opening the path for an environmentally friendly artificial intelligence era.
    Propagating State Uncertainty Through Trajectory Forecasting. (arXiv:2110.03267v4 [cs.RO] UPDATED)
    Uncertainty pervades through the modern robotic autonomy stack, with nearly every component (e.g., sensors, detection, classification, tracking, behavior prediction) producing continuous or discrete probabilistic distributions. Trajectory forecasting, in particular, is surrounded by uncertainty as its inputs are produced by (noisy) upstream perception and its outputs are predictions that are often probabilistic for use in downstream planning. However, most trajectory forecasting methods do not account for upstream uncertainty, instead taking only the most-likely values. As a result, perceptual uncertainties are not propagated through forecasting and predictions are frequently overconfident. To address this, we present a novel method for incorporating perceptual state uncertainty in trajectory forecasting, a key component of which is a new statistical distance-based loss function which encourages predicting uncertainties that better match upstream perception. We evaluate our approach both in illustrative simulations and on large-scale, real-world data, demonstrating its efficacy in propagating perceptual state uncertainty through prediction and producing more calibrated predictions.
    High-dimensional Inference for Dynamic Treatment Effects. (arXiv:2110.04924v3 [stat.ME] UPDATED)
    This paper proposes a confidence interval construction for heterogeneous treatment effects in the context of multi-stage experiments with $N$ samples and high-dimensional, $d$, confounders. Our focus is on the case of $d\gg N$, but the results obtained also apply to low-dimensional cases. We showcase that the bias of regularized estimation, unavoidable in high-dimensional covariate spaces, is mitigated with a simple double-robust score. In this way, no additional bias removal is necessary, and we obtain root-$N$ inference results while allowing multi-stage interdependency of the treatments and covariates. Memoryless property is also not assumed; treatment can possibly depend on all previous treatment assignments and all previous multi-stage confounders. Our results rely on certain sparsity assumptions of the underlying dependencies. We discover new product rate conditions necessary for robust inference with dynamic treatments.
    Grounding Aleatoric Uncertainty in Unsupervised Environment Design. (arXiv:2207.05219v1 [cs.LG])
    Adaptive curricula in reinforcement learning (RL) have proven effective for producing policies robust to discrepancies between the train and test environment. Recently, the Unsupervised Environment Design (UED) framework generalized RL curricula to generating sequences of entire environments, leading to new methods with robust minimax regret properties. Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution. We formalize this phenomenon as curriculum-induced covariate shift (CICS), and describe how its occurrence in aleatoric parameters can lead to suboptimal policies. Directly sampling these parameters from the ground-truth distribution avoids the issue, but thwarts curriculum learning. We propose SAMPLR, a minimax regret UED method that optimizes the ground-truth utility function, even when the underlying training data is biased due to CICS. We prove, and validate on challenging domains, that our approach preserves optimality under the ground-truth distribution, while promoting robustness across the full range of environment settings.
    Structure-Enhanced Pop Music Generation via Harmony-Aware Learning. (arXiv:2109.06441v2 [cs.SD] UPDATED)
    Pop music generation has long been an attractive topic for both musicians and scientists. However, automatically composing pop music with a satisfactory structure is still a challenging issue. In this paper, we propose to leverage harmony-aware learning for structure-enhanced pop music generation. On the one hand, one of the participants of harmony, chord, represents the harmonic set of multiple notes, which is integrated closely with the spatial structure of music, the texture. On the other hand, the other participant of harmony, chord progression, usually accompanies the development of the music, which promotes the temporal structure of music, the form. Moreover, when chords evolve into chord progression, the texture and form can be bridged by the harmony naturally, which contributes to the joint learning of the two structures. Furthermore, we propose the Harmony-Aware Hierarchical Music Transformer (HAT), which can exploit the structure adaptively from the music, and make the musical tokens interact hierarchically to enhance the structure in multi-level musical elements. Experimental results reveal that, compared to existing methods, HAT has a much better understanding of the structure and can also improve the quality of generated music, especially in form and texture.
    Sliced-Wasserstein normalizing flows: beyond maximum likelihood training. (arXiv:2207.05468v1 [stat.ML])
    Despite their advantages, normalizing flows generally suffer from several shortcomings including their tendency to generate unrealistic data (e.g., images) and their failing to detect out-of-distribution data. One reason for these deficiencies lies in the training strategy which traditionally exploits a maximum likelihood principle only. This paper proposes a new training paradigm based on a hybrid objective function combining the maximum likelihood principle (MLE) and a sliced-Wasserstein distance. Results obtained on synthetic toy examples and real image data sets show better generative abilities in terms of both likelihood and visual aspects of the generated samples. Reciprocally, the proposed approach leads to a lower likelihood of out-of-distribution data, demonstrating a greater data fidelity of the resulting flows.
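    The sliced-Wasserstein term in the hybrid objective above can be illustrated with a minimal Monte Carlo sketch. This is not the paper's implementation: the function name `sliced_wasserstein`, the number of projections, and the equal-sample-size restriction are assumptions made for illustration. The idea is to project both sample sets onto random unit directions, where the one-dimensional Wasserstein distance reduces to comparing sorted values.

```python
import numpy as np

def sliced_wasserstein(x, y, n_proj=128, p=2, seed=0):
    """Monte Carlo estimate of the sliced Wasserstein-p distance.

    x, y: arrays of shape (n, d) with the SAME number of samples n
    (an assumption of this sketch; unequal sizes need quantile interpolation).
    """
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    # Random directions drawn uniformly on the unit sphere.
    theta = rng.normal(size=(n_proj, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Project, then sort each 1-D projection: sorted samples give the
    # optimal 1-D transport plan between the two empirical distributions.
    x_proj = np.sort(x @ theta.T, axis=0)
    y_proj = np.sort(y @ theta.T, axis=0)
    return np.mean(np.abs(x_proj - y_proj) ** p) ** (1.0 / p)
```

    In a hybrid training loss one might add this quantity, scaled by a hypothetical weight, to the negative log-likelihood; the weighting actually used by the authors is not stated in the abstract.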
    Markovian Gaussian Process Variational Autoencoders. (arXiv:2207.05543v1 [cs.LG])
    Deep generative models are widely used for modelling high-dimensional time series, such as video animations, audio and climate data. Sequential variational autoencoders have been successfully considered for many applications, with many variant models relying on discrete-time methods and recurrent neural networks (RNNs). On the other hand, continuous-time methods have recently gained traction, especially in the context of irregularly-sampled time series, where they can better handle the data than discrete-time methods. One such class is Gaussian process variational autoencoders (GPVAEs), where the VAE prior is set as a Gaussian process (GP), allowing inductive biases to be explicitly encoded via the kernel function and interpretability of the latent space. However, a major limitation of GPVAEs is that they inherit the same cubic computational cost as GPs. In this work, we leverage the equivalent discrete state space representation of Markovian GPs to enable a linear-time GP solver via Kalman filtering and smoothing. We show via corrupt and missing frames tasks that our method performs favourably, especially on the latter where it outperforms RNN-based models.
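    The linear-time idea can be sketched for the simplest Markovian GP, the Ornstein-Uhlenbeck (exponential kernel) process, whose prior admits an exact one-dimensional state-space form. This is a minimal sketch under stated assumptions, not the paper's model: the function name `ou_kalman_filter`, the hyperparameters `ell`, `sigma2`, `noise`, and the scalar latent state are all illustrative choices, and only the filtering pass (no smoothing) is shown.

```python
import numpy as np

def ou_kalman_filter(t, y, ell=1.0, sigma2=1.0, noise=0.1):
    """Kalman filter for a GP with kernel sigma2 * exp(-|t - t'| / ell).

    t: sorted 1-D array of (possibly irregular) time stamps
    y: noisy observations y_k = f(t_k) + e_k, e_k ~ N(0, noise)
    Returns filtering means and variances in O(n) time.
    """
    m, P = 0.0, sigma2            # stationary prior on the latent state
    means, variances = [], []
    prev = t[0]
    for tk, yk in zip(t, y):
        # Predict: exact discretisation of the OU dynamics over the gap.
        a = np.exp(-(tk - prev) / ell)
        m, P = a * m, a * a * P + sigma2 * (1.0 - a * a)
        # Update: condition on the new observation.
        gain = P / (P + noise)
        m, P = m + gain * (yk - m), (1.0 - gain) * P
        means.append(m)
        variances.append(P)
        prev = tk
    return np.array(means), np.array(variances)
```

    At the final time stamp the filtering mean and variance coincide with the usual cubic-cost GP regression posterior at that point, which gives a simple way to check the sketch on small inputs.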
    A Newton-CG based barrier method for finding a second-order stationary point of nonconvex conic optimization with complexity guarantees. (arXiv:2207.05697v1 [math.OC])
    In this paper we consider finding an approximate second-order stationary point (SOSP) of nonconvex conic optimization that minimizes a twice differentiable function over the intersection of an affine subspace and a convex cone. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier method for finding an $(\epsilon,\sqrt{\epsilon})$-SOSP of this problem. Our method is not only implementable, but also achieves an iteration complexity of ${\cal O}(\epsilon^{-3/2})$, which matches the best known iteration complexity of second-order methods for finding an $(\epsilon,\sqrt{\epsilon})$-SOSP of unconstrained nonconvex optimization. The operation complexity of $\widetilde{\cal O}(\epsilon^{-3/2}\min\{n,\epsilon^{-1/4}\})$, measured by the amount of fundamental operations, is also established for our method.
    The Neural-Prediction based Acceleration Algorithm of Column Generation for Graph-Based Set Covering Problems. (arXiv:2207.01411v2 [cs.LG] UPDATED)
    The set covering problem is an important class of combinatorial optimization problems, which has been widely applied and studied in many fields. In this paper, we propose an improved column generation algorithm with neural prediction (CG-P) for solving graph-based set covering problems. We leverage a graph neural network based neural prediction model to predict, for each edge, the probability of being included in the final solution. Our CG-P algorithm constructs a reduced graph that only contains the edges with higher predicted probability, and this graph reduction process significantly speeds up the solution process. We evaluate the CG-P algorithm on railway crew scheduling problems and it outperforms the baseline column generation algorithm. We provide two solution modes for our CG-P algorithm. In the optimal mode, we can obtain a solution with an optimality guarantee while reducing the time cost to 63.12%. In the fast mode, we can obtain a sub-optimal solution with a 7.62% optimality gap in only 2.91% computation time.
    Joint NMF for Identification of Shared Features in Datasets and a Dataset Distance Measure. (arXiv:2207.05112v1 [cs.LG])
    In this paper, we derive a new method for determining shared features of datasets by employing joint non-negative matrix factorization and analyzing the resulting factorizations. Our approach uses the joint factorization of two dataset matrices $X_1,X_2$ into non-negative matrices $X_1 = AS_1, X_2 = AS_2$ to derive a similarity measure that determines how well a shared basis for $X_1, X_2$ approximates each dataset. We also propose a dataset distance measure built upon this method and the learned factorization. Our method is able to successfully identify differences in structure in both image and text datasets. Potential applications include classification, detecting plagiarism or other manipulation, and learning relationships between data sets.
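    The joint factorization $X_1 = AS_1, X_2 = AS_2$ can be sketched by stacking the two datasets column-wise and running standard multiplicative NMF updates, so that both share one basis $A$. This is a minimal sketch, not the authors' method: the names `joint_nmf` and `relative_error`, the Frobenius objective, and the use of per-dataset reconstruction error as a proxy for "how well the shared basis approximates each dataset" are assumptions, and the paper's actual similarity and distance measures are not reproduced here.

```python
import numpy as np

def joint_nmf(X1, X2, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor X1 ~ A @ S1 and X2 ~ A @ S2 with one shared non-negative basis A.

    X1: (d, n1) and X2: (d, n2) non-negative matrices with the same row space.
    """
    rng = np.random.default_rng(seed)
    X = np.hstack([X1, X2])       # column stacking forces a common basis A
    n1 = X1.shape[1]
    A = rng.random((X.shape[0], rank)) + eps
    S = rng.random((rank, X.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius objective ||X - A S||^2;
        # they keep every entry non-negative.
        S *= (A.T @ X) / (A.T @ A @ S + eps)
        A *= (X @ S.T) / (A @ S @ S.T + eps)
    return A, S[:, :n1], S[:, n1:]

def relative_error(X, A, S):
    """Relative reconstruction error of one dataset under the shared basis."""
    return np.linalg.norm(X - A @ S) / np.linalg.norm(X)
```

    Comparing `relative_error(X1, A, S1)` with `relative_error(X2, A, S2)` gives a rough indication of which dataset the shared basis fits better.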
    Revisiting Inlier and Outlier Specification for Improved Out-of-Distribution Detection. (arXiv:2207.05286v1 [cs.CV])
    Accurately detecting out-of-distribution (OOD) data with varying levels of semantic and covariate shifts with respect to the in-distribution (ID) data is critical for deployment of safe and reliable models. This is particularly the case when dealing with highly consequential applications (e.g. medical imaging, self-driving cars, etc). The goal is to design a detector that can accept meaningful variations of the ID data, while also rejecting examples from OOD regimes. In practice, this dual objective can be realized by enforcing consistency using an appropriate scoring function (e.g., energy) and calibrating the detector to reject a curated set of OOD data (referred to as outlier exposure or shortly OE). While OE methods are widely adopted, assembling representative OOD datasets is both costly and challenging due to the unpredictability of real-world scenarios, hence the recent trend of designing OE-free detectors. In this paper, we make a surprising finding that controlled generalization to ID variations and exposure to diverse (synthetic) outlier examples are essential to simultaneously improving semantic and modality shift detection. In contrast to existing methods, our approach samples inliers in the latent space, and constructs outlier examples via negative data augmentation. Through a rigorous empirical study on medical imaging benchmarks (MedMNIST, ISIC2019 and NCT), we demonstrate significant performance gains ($15\% - 35\%$ in AUROC) over existing OE-free, OOD detection approaches under both semantic and modality shifts.
    PAC-Bayesian Domain Adaptation Bounds for Multiclass Learners. (arXiv:2207.05685v1 [cs.LG])
    Multiclass neural networks are a common tool in modern unsupervised domain adaptation, yet an appropriate theoretical description for their non-uniform sample complexity is lacking in the adaptation literature. To fill this gap, we propose the first PAC-Bayesian adaptation bounds for multiclass learners. We facilitate practical use of our bounds by also proposing the first approximation techniques for the multiclass distribution divergences we consider. For divergences dependent on a Gibbs predictor, we propose additional PAC-Bayesian adaptation bounds which remove the need for inefficient Monte-Carlo estimation. Empirically, we test the efficacy of our proposed approximation techniques as well as some novel design-concepts which we include in our bounds. Finally, we apply our bounds to analyze a common adaptation algorithm that uses neural networks.
    Adversarial Robustness Assessment of NeuroEvolution Approaches. (arXiv:2207.05451v1 [cs.NE])
    NeuroEvolution automates the generation of Artificial Neural Networks through the application of techniques from Evolutionary Computation. The main goal of these approaches is to build models that maximize predictive performance, sometimes with an additional objective of minimizing computational complexity. Although the evolved models achieve competitive results performance-wise, their robustness to adversarial examples, which becomes a concern in security-critical scenarios, has received limited attention. In this paper, we evaluate the adversarial robustness of models found by two prominent NeuroEvolution approaches on the CIFAR-10 image classification task: DENSER and NSGA-Net. Since the models are publicly available, we consider white-box untargeted attacks, where the perturbations are bounded by either the L2 or the Linfinity-norm. Similarly to manually-designed networks, our results show that when the evolved models are attacked with iterative methods, their accuracy usually drops to, or close to, zero under both distance metrics. The DENSER model is an exception to this trend, showing some resistance under the L2 threat model, where its accuracy only drops from 93.70% to 18.10% even with iterative attacks. Additionally, we analyzed the impact of pre-processing applied to the data before the first layer of the network. Our observations suggest that some of these techniques can exacerbate the perturbations added to the original inputs, potentially harming robustness. Thus, this choice should not be neglected when automatically designing networks for applications where adversarial attacks are prone to occur.
    A Computational Model for Logical Analysis of Data. (arXiv:2207.05664v1 [cs.LG])
    Initially introduced by Peter Hammer, Logical Analysis of Data (LAD) is a methodology that aims at computing a logical justification for dividing a group of data into two groups of observations, usually called the positive and negative groups. Consider this partition into positive and negative groups as the description of a partially defined Boolean function; the data is then processed to identify a subset of attributes, whose values may be used to characterize the observations of the positive group against those of the negative group. LAD constitutes an interesting rule-based learning alternative to classic statistical learning techniques and has many practical applications. Nevertheless, the computation of group characterization may be costly, depending on the properties of the data instances. A major aim of our work is to provide effective tools for speeding up the computations, by computing some a priori probability that a given set of attributes does characterize the positive and negative groups. To this effect, we propose several models for representing the data set of observations, according to the information we have on it. These models, and the probabilities they allow us to compute, are also helpful for quickly assessing some properties of the real data at hand; furthermore they may help us to better analyze and understand the computational difficulties encountered by solving methods. Once our models have been established, the mathematical tools for computing probabilities come from Analytic Combinatorics. They allow us to express the desired probabilities as ratios of generating functions coefficients, which then provide a quick computation of their numerical values. A further, long-range goal of this paper is to show that the methods of Analytic Combinatorics can help in analyzing the performance of various algorithms in LAD and related fields.  ( 3 min )
    A developmental approach for training deep belief networks. (arXiv:2207.05473v1 [cs.LG])
    Deep belief networks (DBNs) are stochastic neural networks that can extract rich internal representations of the environment from the sensory data. DBNs had a catalytic effect in triggering the deep learning revolution, demonstrating for the very first time the feasibility of unsupervised learning in networks with many layers of hidden neurons. Thanks to their biological and cognitive plausibility, these hierarchical architectures have also been successfully exploited to build computational models of human perception and cognition in a variety of domains. However, learning in DBNs is usually carried out in a greedy, layer-wise fashion, which does not allow simulating the holistic development of cortical circuits. Here we present iDBN, an iterative learning algorithm for DBNs that jointly updates the connection weights across all layers of the hierarchy. We test our algorithm on two different sets of visual stimuli, and we show that network development can also be tracked in terms of graph theoretical properties. DBNs trained using our iterative approach achieve a final performance comparable to that of the greedy counterparts, while also allowing accurate analysis of the gradual development of internal representations in the generative model. Our work paves the way to the use of iDBN for modeling neurocognitive development.  ( 2 min )
    Bootstrapping a User-Centered Task-Oriented Dialogue System. (arXiv:2207.05223v1 [cs.CL])
    We present TacoBot, a task-oriented dialogue system built for the inaugural Alexa Prize TaskBot Challenge, which assists users in completing multi-step cooking and home improvement tasks. TacoBot is designed with a user-centered principle and aspires to deliver a collaborative and accessible dialogue experience. Towards that end, it is equipped with accurate language understanding, flexible dialogue management, and engaging response generation. Furthermore, TacoBot is backed by a strong search engine and an automated end-to-end test suite. In bootstrapping the development of TacoBot, we explore a series of data augmentation strategies to train advanced neural language processing models and continuously improve the dialogue experience with collected real conversations. At the end of the semifinals, TacoBot achieved an average rating of 3.55/5.0.  ( 2 min )
    Language Models (Mostly) Know What They Know. (arXiv:2207.05221v1 [cs.CL])
    We study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly. We first show that larger models are well-calibrated on diverse multiple choice and true/false questions when they are provided in the right format. Thus we can approach self-evaluation on open-ended sampling tasks by asking models to first propose answers, and then to evaluate the probability "P(True)" that their answers are correct. We find encouraging performance, calibration, and scaling for P(True) on a diverse array of tasks. Performance at self-evaluation further improves when we allow models to consider many of their own samples before predicting the validity of one specific possibility. Next, we investigate whether models can be trained to predict "P(IK)", the probability that "I know" the answer to a question, without reference to any particular proposed answer. Models perform well at predicting P(IK) and partially generalize across tasks, though they struggle with calibration of P(IK) on new tasks. The predicted P(IK) probabilities also increase appropriately in the presence of relevant source materials in the context, and to the presence of hints towards the solution of mathematical word problems. We hope these observations lay the groundwork for training more honest models, and for investigating how honesty generalizes to cases where models are trained on objectives other than the imitation of human writing.  ( 3 min )
    Learning an evolved mixture model for task-free continual learning. (arXiv:2207.05080v1 [cs.LG])
    Recently, continual learning (CL) has gained significant interest because it enables deep learning models to acquire new knowledge without forgetting previously learnt information. However, most existing works require knowing the task identities and boundaries, which is not realistic in a real context. In this paper, we address a more challenging and realistic setting in CL, namely the Task-Free Continual Learning (TFCL) in which a model is trained on non-stationary data streams with no explicit task information. To address TFCL, we introduce an evolved mixture model whose network architecture is dynamically expanded to adapt to the data distribution shift. We implement this expansion mechanism by evaluating the probability distance between the knowledge stored in each mixture model component and the current memory buffer using the Hilbert Schmidt Independence Criterion (HSIC). We further introduce two simple dropout mechanisms to selectively remove stored examples in order to avoid memory overload while preserving memory diversity. Empirical results demonstrate that the proposed approach achieves excellent performance.  ( 2 min )
    Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning. (arXiv:2207.05480v1 [cs.LG])
    In real-world robotics applications, Reinforcement Learning (RL) agents are often unable to generalise to environment variations that were not observed during training. This issue is intensified for image-based RL where a change in one variable, such as the background colour, can change many pixels in the image, and in turn can change all values in the agent's internal representation of the image. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled representations using the sequential nature of RL observations. We find empirically that RL algorithms with TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Due to the disentangled structure of the representation, we also find that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).  ( 2 min )
    Causal Conceptions of Fairness and their Consequences. (arXiv:2207.05302v1 [cs.LG])
    Recent work highlights the role of causality in designing equitable decision-making algorithms. It is not immediately clear, however, how existing causal conceptions of fairness relate to one another, or what the consequences are of using these definitions as design principles. Here, we first assemble and categorize popular causal definitions of algorithmic fairness into two broad families: (1) those that constrain the effects of decisions on counterfactual disparities; and (2) those that constrain the effects of legally protected characteristics, like race and gender, on decisions. We then show, analytically and empirically, that both families of definitions \emph{almost always} -- in a measure theoretic sense -- result in strongly Pareto dominated decision policies, meaning there is an alternative, unconstrained policy favored by every stakeholder with preferences drawn from a large, natural class. For example, in the case of college admissions decisions, policies constrained to satisfy causal fairness definitions would be disfavored by every stakeholder with neutral or positive preferences for both academic preparedness and diversity. Indeed, under a prominent definition of causal fairness, we prove the resulting policies require admitting all students with the same probability, regardless of academic qualifications or group membership. Our results highlight formal limitations and potential adverse consequences of common mathematical notions of causal fairness.  ( 3 min )
    Dev2vec: Representing Domain Expertise of Developers in an Embedding Space. (arXiv:2207.05132v1 [cs.SE])
    Accurate assessment of the domain expertise of developers is important for assigning the proper candidate to contribute to a project or to fill a job role. Since the potential candidate can come from a large pool, the automated assessment of this domain expertise is a desirable goal. While previous methods have had some success within a single software project, the assessment of a developer's domain expertise from contributions across multiple projects is more challenging. In this paper, we employ doc2vec to represent the domain expertise of developers as embedding vectors. These vectors are derived from different sources that contain evidence of developers' expertise, such as the descriptions of repositories that they contributed to, their issue resolving history, and API calls in their commits. We name it dev2vec and demonstrate its effectiveness in representing the technical specialization of developers. Our results indicate that encoding the expertise of developers in an embedding vector outperforms state-of-the-art methods and improves the F1-score by up to 21%. Moreover, our findings suggest that the "issue resolving history" of developers is the most informative source of information to represent the domain expertise of developers in embedding spaces.  ( 2 min )
    A Single-Loop Gradient Descent and Perturbed Ascent Algorithm for Nonconvex Functional Constrained Optimization. (arXiv:2207.05650v1 [math.OC])
    Nonconvex constrained optimization problems can be used to model a number of machine learning problems, such as multi-class Neyman-Pearson classification and constrained Markov decision processes. However, such kinds of problems are challenging because both the objective and constraints are possibly nonconvex, so it is difficult to balance the reduction of the loss value and reduction of constraint violation. Although there are a few methods that solve this class of problems, all of them are double-loop or triple-loop algorithms, and they require oracles to solve some subproblems up to certain accuracy by tuning multiple hyperparameters at each iteration. In this paper, we propose a novel gradient descent and perturbed ascent (GDPA) algorithm to solve a class of smooth nonconvex inequality constrained problems. The GDPA is a primal-dual algorithm, which only exploits the first-order information of both the objective and constraint functions to update the primal and dual variables in an alternating way. The key feature of the proposed algorithm is that it is a single-loop algorithm, where only two step-sizes need to be tuned. We show that under a mild regularity condition GDPA is able to find Karush-Kuhn-Tucker (KKT) points of nonconvex functional constrained problems with convergence rate guarantees. To the best of our knowledge, it is the first single-loop algorithm that can solve the general nonconvex smooth problems with nonconvex inequality constraints. Numerical results also showcase the superiority of GDPA compared with the best-known algorithms (in terms of both stationarity measure and feasibility of the obtained solutions).  ( 3 min )
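    The single-loop, alternating primal-dual update described above can be illustrated on a toy problem. The sketch below is plain gradient descent on the primal variable with projected gradient ascent on the dual variable; it is not the paper's GDPA (it omits the perturbation term, and the problem is convex). The toy problem (minimise x^2 subject to 1 - x <= 0), the function name, and the step sizes are all assumptions made for illustration.

```python
def primal_dual_gda(num_iters=2000, alpha=0.05, beta=0.05):
    """Single-loop primal-dual iteration on a toy problem (illustrative only).

    Problem:     minimise x**2  subject to  1 - x <= 0
    Lagrangian:  L(x, lam) = x**2 + lam * (1 - x),  with lam >= 0
    KKT point:   x = 1, lam = 2
    """
    x, lam = 0.0, 0.0
    for _ in range(num_iters):
        # Primal step: gradient descent on L with respect to x.
        x -= alpha * (2.0 * x - lam)
        # Dual step: gradient ascent on L with respect to lam,
        # projected back onto lam >= 0.
        lam = max(0.0, lam + beta * (1.0 - x))
    return x, lam


x_star, lam_star = primal_dual_gda()
```

    Only the two step sizes, alpha and beta, need to be chosen, which mirrors the "two step-sizes" property claimed for the single-loop scheme.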
    FreeREA: Training-Free Evolution-based Architecture Search. (arXiv:2207.05135v1 [cs.NE])
    In the last decade, most research in Machine Learning contributed to the improvement of existing models, with the aim of increasing the performance of neural networks for the solution of a variety of different tasks. However, such advancements often come at the cost of an increase of model memory and computational requirements. This represents a significant limitation for the deployability of research output in realistic settings, where the cost, the energy consumption, and the complexity of the framework play a crucial role. To solve this issue, the designer should search for models that maximise the performance while limiting their footprint. Typical approaches to reach this goal rely either on manual procedures, which cannot guarantee the optimality of the final design, or upon Neural Architecture Search algorithms to automatise the process, at the expense of extremely high computational time. This paper provides a solution for the fast identification of a neural network that maximises the model accuracy while preserving size and computational constraints typical of tiny devices. Our approach, named FreeREA, is a custom cell-based evolution NAS algorithm that exploits an optimised combination of training-free metrics to rank architectures during the search, without any need for model training. Our experiments, carried out on the common benchmarks NAS-Bench-101 and NATS-Bench, demonstrate that i) FreeREA is the first method able to provide very accurate models in minutes of search time; ii) it outperforms state-of-the-art training-based and training-free techniques in all the datasets and benchmarks considered, and iii) it can easily generalise to constrained scenarios, representing a competitive solution for fast Neural Architecture Search in generic constrained applications.  ( 3 min )
    Split Time Series into Patches: Rethinking Long-term Series Forecasting with Dateformer. (arXiv:2207.05397v1 [cs.LG])
    Time is one of the most significant characteristics of time-series, yet has received insufficient attention. Prior time-series forecasting research has mainly focused on mapping a past subseries (lookback window) to a future series (forecast window), and the time of the series often plays only an auxiliary role or is even completely ignored in most cases. Due to the point-wise processing within these windows, extrapolating series to the longer-term future is difficult under this pattern. To overcome this barrier, we propose a brand-new time-series forecasting framework named Dateformer, which turns attention to modeling time instead of following the above practice. Specifically, time-series are first split into patches by day to supervise the learning of dynamic date-representations with Date Encoder Representations from Transformers (DERT). These representations are then fed into a simple decoder to produce a coarser (or global) prediction, and used to help the model seek valuable information from the lookback window to learn a refined (or local) prediction. Dateformer obtains the final result by summing the above two parts. Our empirical studies on seven benchmarks show that the time-modeling method is more efficient for long-term series forecasting compared with sequence modeling methods. Dateformer yields state-of-the-art accuracy with a remarkable 40% relative improvement, and broadens the maximum credible forecasting range to a half-yearly level.  ( 2 min )
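    The "split into patches by day" step can be sketched as a simple grouping of observations by calendar date. This is a minimal illustration, not Dateformer's implementation: the helper name split_into_day_patches and the hourly toy series are assumptions, and the DERT encoder and decoder are not modelled.

```python
from collections import OrderedDict
from datetime import datetime, timedelta


def split_into_day_patches(timestamps, values):
    """Group a time series into one patch (list of values) per calendar day."""
    patches = OrderedDict()
    for ts, value in zip(timestamps, values):
        patches.setdefault(ts.date(), []).append(value)
    return patches


# Hypothetical hourly series covering two full days.
start = datetime(2022, 1, 1)
timestamps = [start + timedelta(hours=h) for h in range(48)]
values = [float(h) for h in range(48)]

patches = split_into_day_patches(timestamps, values)
for day, patch in patches.items():
    print(day, len(patch))
```

    Each patch then corresponds to one date, which is the unit the abstract describes as supervising the date-representations.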
    LightViT: Towards Light-Weight Convolution-Free Vision Transformers. (arXiv:2207.05557v1 [cs.CV])
    Vision transformers (ViTs) are usually considered to be less light-weight than convolutional neural networks (CNNs) due to the lack of inductive bias. Recent works thus resort to convolutions as a plug-and-play module and embed them in various ViT counterparts. In this paper, we argue that the convolutional kernels perform information aggregation to connect all tokens; however, they would be actually unnecessary for light-weight ViTs if this explicit aggregation could function in a more homogeneous way. Inspired by this, we present LightViT as a new family of light-weight ViTs to achieve better accuracy-efficiency balance upon the pure transformer blocks without convolution. Concretely, we introduce a global yet efficient aggregation scheme into both self-attention and feed-forward network (FFN) of ViTs, where additional learnable tokens are introduced to capture global dependencies; and bi-dimensional channel and spatial attentions are imposed over token embeddings. Experiments show that our model achieves significant improvements on image classification, object detection, and semantic segmentation tasks. For example, our LightViT-T achieves 78.7% accuracy on ImageNet with only 0.7G FLOPs, outperforming PVTv2-B0 by 8.2% while 11% faster on GPU. Code is available at https://github.com/hunto/LightViT.  ( 2 min )
    Hybrid Physical-Neural ODEs for Fast N-body Simulations. (arXiv:2207.05509v1 [astro-ph.CO])
    We present a new scheme to compensate for the small-scale approximations resulting from Particle-Mesh (PM) schemes for cosmological N-body simulations. These simulations are fast, low-computational-cost realizations of the large-scale structures, but lack resolution on small scales. To improve their accuracy, we introduce an additional effective force within the differential equations of the simulation, parameterized by a Fourier-space Neural Network acting on the PM-estimated gravitational potential. We compare the resulting matter power spectrum to the one obtained with the Potential Gradient Descent (PGD) scheme. We notice a similar improvement in terms of power spectrum, but we find that our approach outperforms PGD for the cross-correlation coefficients, and is more robust to changes in simulation settings (different resolutions, different cosmologies).  ( 2 min )
    Size and depth of monotone neural networks: interpolation and approximation. (arXiv:2207.05275v1 [cs.LG])
    Monotone functions and data sets arise in a variety of applications. We study the interpolation problem for monotone data sets: The input is a monotone data set with $n$ points, and the goal is to find a size and depth efficient monotone neural network, with non negative parameters and threshold units, that interpolates the data set. We show that there are monotone data sets that cannot be interpolated by a monotone network of depth $2$. On the other hand, we prove that for every monotone data set with $n$ points in $\mathbb{R}^d$, there exists an interpolating monotone network of depth $4$ and size $O(nd)$. Our interpolation result implies that every monotone function over $[0,1]^d$ can be approximated arbitrarily well by a depth-4 monotone network, improving the previous best-known construction of depth $d+1$. Finally, building on results from Boolean circuit complexity, we show that the inductive bias of having positive parameters can lead to a super-polynomial blow-up in the number of neurons when approximating monotone functions.  ( 2 min )
    On the Representation of Causal Background Knowledge and its Applications in Causal Inference. (arXiv:2207.05067v1 [cs.AI])
    Causal background knowledge about the existence or the absence of causal edges and paths is frequently encountered in observational studies. The shared directed edges and links of a subclass of Markov equivalent DAGs refined due to background knowledge can be represented by a causal maximally partially directed acyclic graph (MPDAG). In this paper, we first provide a sound and complete graphical characterization of causal MPDAGs and give a minimal representation of a causal MPDAG. Then, we introduce a novel representation called direct causal clause (DCC) to represent all types of causal background knowledge in a unified form. Using DCCs, we study the consistency and equivalency of causal background knowledge and show that any causal background knowledge set can be equivalently decomposed into a causal MPDAG plus a minimal residual set of DCCs. Polynomial-time algorithms are also provided for checking the consistency, equivalency, and finding the decomposed MPDAG and residual DCCs. Finally, with causal background knowledge, we prove a sufficient and necessary condition to identify causal effects and surprisingly find that the identifiability of causal effects only depends on the decomposed MPDAG. We also develop a local IDA-type algorithm to estimate the possible values of an unidentifiable effect. Simulations suggest that causal background knowledge can significantly improve the identifiability of causal effects.  ( 3 min )
    "Why do so?" -- A Practical Perspective on Machine Learning Security. (arXiv:2207.05164v1 [cs.LG])
    Despite the large body of academic work on machine learning security, little is known about the occurrence of attacks on machine learning systems in the wild. In this paper, we report on a quantitative study with 139 industrial practitioners. We analyze attack occurrence and concern and evaluate statistical hypotheses on factors influencing threat perception and exposure. Our results shed light on real-world attacks on deployed machine learning. On the organizational level, while we find no predictors for threat exposure in our sample, the amount of implemented defenses depends on exposure to threats or expected likelihood to become a target. We also provide a detailed analysis of practitioners' replies on the relevance of individual machine learning attacks, unveiling complex concerns like unreliable decision making, business information leakage, and bias introduction into models. Finally, we find that on the individual level, prior knowledge about machine learning security influences threat perception. Our work paves the way for more research about adversarial machine learning in practice, but also yields insights for regulation and auditing.  ( 2 min )
    Learning to segment prostate cancer by aggressiveness from scribbles in bi-parametric MRI. (arXiv:2207.05056v1 [eess.IV])
    In this work, we propose a deep U-Net based model to tackle the challenging task of prostate cancer segmentation by aggressiveness in MRI based on weak scribble annotations. This model extends the size constraint loss proposed by Kervadec et al. to the context of a multiclass detection and segmentation task. This model is of high clinical interest as it allows training on prostate biopsy samples and avoids the time-consuming full annotation process. Performance is assessed on a private dataset (219 patients) where the full ground truth is available as well as on the ProstateX-2 challenge database, where only biopsy results at different localisations serve as reference. We show that we can approach the fully-supervised baseline in grading the lesions by using only 6.35% of voxels for training. We report a lesion-wise Cohen's kappa score of 0.29 $\pm$ 0.07 for the weak model versus 0.32 $\pm$ 0.05 for the baseline. We also report a kappa score (0.276 $\pm$ 0.037) on the ProstateX-2 challenge dataset with our weak U-Net trained on a combination of ProstateX-2 and our dataset, which is the highest reported value on this challenge dataset for a segmentation task to our knowledge.  ( 3 min )
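    For readers unfamiliar with the metric reported above, Cohen's kappa compares observed agreement p_o with chance agreement p_e as kappa = (p_o - p_e) / (1 - p_e). The sketch below computes the unweighted form on hypothetical labels; the abstract does not say whether a weighted variant is used, so this is an assumption, and the function name cohens_kappa is illustrative.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the raters' marginal class frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical lesion grades from a model and a reference.
predicted = [0, 0, 1, 1]
reference = [0, 0, 1, 0]
kappa = cohens_kappa(predicted, reference)
```

    Here p_o = 0.75 and p_e = 0.5, so kappa = 0.5; a value of 0 means agreement no better than chance and 1 means perfect agreement.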
    Accelerating Large-Scale Graph-based Nearest Neighbor Search on a Computational Storage Platform. (arXiv:2207.05241v1 [cs.AR])
    K-nearest neighbor search is one of the fundamental tasks in various applications and the hierarchical navigable small world (HNSW) has recently drawn attention in large-scale cloud services, as it easily scales up the database while offering fast search. On the other hand, a computational storage device (CSD) that combines programmable logic and storage modules on a single board has become popular to address the data bandwidth bottleneck of modern computing systems. In this paper, we propose a computational storage platform that can accelerate a large-scale graph-based nearest neighbor search algorithm based on SmartSSD CSD. To this end, we modify the algorithm to make it more amenable to hardware and implement two types of accelerators using HLS- and RTL-based methodology with various optimization methods. In addition, we scale up the proposed platform to have 4 SmartSSDs and apply graph parallelism to boost the system performance further. As a result, the proposed computational storage platform achieves a throughput of 75.59 queries per second for the SIFT1B dataset at 258.66W power dissipation, which is 12.83x and 17.91x faster and 10.43x and 24.33x more energy efficient than the conventional CPU-based and GPU-based server platform, respectively. With multi-terabyte storage and custom acceleration capability, we believe that the proposed computational storage platform is a promising solution for cost-sensitive cloud datacenters.  ( 3 min )
    Fourier Neural Operator with Learned Deformations for PDEs on General Geometries. (arXiv:2207.05209v1 [cs.LG])
    Deep learning surrogate models have shown promise in solving partial differential equations (PDEs). Among them, the Fourier neural operator (FNO) achieves good accuracy, and is significantly faster compared to numerical solvers, on a variety of PDEs, such as fluid flows. However, the FNO uses the Fast Fourier transform (FFT), which is limited to rectangular domains with uniform grids. In this work, we propose a new framework, viz., geo-FNO, to solve PDEs on arbitrary geometries. Geo-FNO learns to deform the input (physical) domain, which may be irregular, into a latent space with a uniform grid. The FNO model with the FFT is applied in the latent space. The resulting geo-FNO model has both the computation efficiency of FFT and the flexibility of handling arbitrary geometries. Our geo-FNO is also flexible in terms of its input formats, viz., point clouds, meshes, and design parameters are all valid inputs. We consider a variety of PDEs such as the Elasticity, Plasticity, Euler's, and Navier-Stokes equations, and both forward modeling and inverse design problems. Geo-FNO is $10^5$ times faster than the standard numerical solvers and twice as accurate compared to direct interpolation on existing ML-based PDE solvers such as the standard FNO.  ( 2 min )
    DAUX: a Density-based Approach for Uncertainty eXplanations. (arXiv:2207.05161v1 [cs.LG])
    Uncertainty quantification (UQ) is essential for creating trustworthy machine learning models. Recent years have seen a steep rise in UQ methods that can flag suspicious examples; however, it is often unclear what exactly these methods identify. In this work, we propose an assumption-light method for interpreting UQ models themselves. We introduce the confusion density matrix -- a kernel-based approximation of the misclassification density -- and use this to categorize suspicious examples identified by a given UQ method into three classes: out-of-distribution (OOD) examples, boundary (Bnd) examples, and examples in regions of high in-distribution misclassification (IDM). Through extensive experiments, we shed light on existing UQ methods and show that the cause of the uncertainty differs across models. Additionally, we show how the proposed framework can make use of the categorized examples to improve predictive performance.  ( 2 min )
    Adaptive Graph Spatial-Temporal Transformer Network for Traffic Flow Forecasting. (arXiv:2207.05064v1 [cs.LG])
    Traffic flow forecasting on graphs has real-world applications in many fields, such as transportation systems and computer networks. Traffic forecasting can be highly challenging due to complex spatial-temporal correlations and non-linear traffic patterns. Existing works mostly model such spatial-temporal dependencies by considering spatial correlations and temporal correlations separately and fail to model the direct spatial-temporal correlations. Inspired by the recent success of transformers in the graph domain, in this paper, we propose to directly model the cross-spatial-temporal correlations on the spatial-temporal graph using local multi-head self-attentions. To reduce the time complexity, we set the attention receptive field to the spatially neighboring nodes, and we also introduce an adaptive graph to capture the hidden spatial-temporal dependencies. Based on these attention mechanisms, we propose a novel Adaptive Graph Spatial-Temporal Transformer Network (ASTTN), which stacks multiple spatial-temporal attention layers to apply self-attention on the input graph, followed by linear layers for predictions. Experimental results on public traffic network datasets, METR-LA, PEMS-BAY, PeMSD4, and PeMSD7, demonstrate the superior performance of our model.  ( 2 min )
    Discovering Domain Disentanglement for Generalized Multi-source Domain Adaptation. (arXiv:2207.05070v1 [cs.LG])
    A typical multi-source domain adaptation (MSDA) approach aims to transfer knowledge learned from a set of labeled source domains to an unlabeled target domain. Nevertheless, prior works strictly assume that each source domain shares an identical set of classes with the target domain, which can hardly be guaranteed as the target label space is not observable. In this paper, we consider a more versatile setting of MSDA, namely Generalized Multi-source Domain Adaptation, wherein the source domains partially overlap, and the target domain is allowed to contain novel categories that are not present in any source domain. This new setting is more elusive than any existing domain adaptation protocol due to the coexistence of the domain and category shifts across the source and target domains. To address this issue, we propose a variational domain disentanglement (VDD) framework, which decomposes the domain representations and semantic features for each instance by encouraging dimension-wise independence. To identify the target samples of unknown classes, we leverage online pseudo labeling, which assigns the pseudo-labels to unlabeled target data based on the confidence scores. Quantitative and qualitative experiments conducted on two benchmark datasets demonstrate the validity of the proposed framework.  ( 2 min )
    A Bipartite Graph Neural Network Approach for Scalable Beamforming Optimization. (arXiv:2207.05364v1 [eess.SP])
    Deep learning (DL) techniques have been intensively studied for the optimization of multi-user multiple-input single-output (MU-MISO) downlink systems owing to the capability of handling nonconvex formulations. However, the fixed computation structure of existing deep neural networks (DNNs) lacks flexibility with respect to the system size, i.e., the number of antennas or users. This paper develops a bipartite graph neural network (BGNN) framework, a scalable DL solution designed for multi-antenna beamforming optimization. The MU-MISO system is first characterized by a bipartite graph where two disjoint vertex sets, consisting of transmit antennas and users, respectively, are connected via pairwise edges. These vertex interconnection states are modeled by channel fading coefficients. Thus, a generic beamforming optimization process is interpreted as a computation task over a weighted bipartite graph. This approach partitions the beamforming optimization procedure into multiple suboperations dedicated to individual antenna vertices and user vertices. Separated vertex operations lead to scalable beamforming calculations that are invariant to the system size. The vertex operations are realized by a group of DNN modules that collectively form the BGNN architecture. Identical DNNs are reused at all antennas and users so that the resultant learning structure becomes flexible to the network size. Component DNNs of the BGNN are trained jointly over numerous MU-MISO configurations with randomly varying network sizes. As a result, the trained BGNN can be universally applied to arbitrary MU-MISO systems. Numerical results validate the advantages of the BGNN framework over conventional methods.  ( 3 min )
    Photonic Reconfigurable Accelerators for Efficient Inference of CNNs with Mixed-Sized Tensors. (arXiv:2207.05278v1 [cs.AR])
    Photonic Microring Resonator (MRR) based hardware accelerators have been shown to provide disruptive speedup and energy-efficiency improvements for processing deep Convolutional Neural Networks (CNNs). However, previous MRR-based CNN accelerators fail to provide efficient adaptability for CNNs with mixed-sized tensors. One example of such CNNs is depthwise separable CNNs. Performing inferences of CNNs with mixed-sized tensors on such inflexible accelerators often leads to low hardware utilization, which diminishes the achievable performance and energy efficiency from the accelerators. In this paper, we present a novel way of introducing reconfigurability in the MRR-based CNN accelerators, to enable dynamic maximization of the size compatibility between the accelerator hardware components and the CNN tensors that are processed using the hardware components. We classify the state-of-the-art MRR-based CNN accelerators from prior works into two categories, based on the layout and relative placements of the utilized hardware components in the accelerators. We then use our method to introduce reconfigurability in accelerators from these two classes, to consequently improve their parallelism, the flexibility of efficiently mapping tensors of different sizes, speed, and overall energy efficiency. We evaluate our reconfigurable accelerators against three prior works for the area proportionate outlook (equal hardware area for all accelerators). Our evaluation for the inference of four modern CNNs indicates that our designed reconfigurable CNN accelerators provide improvements of up to 1.8x in Frames-Per-Second (FPS) and up to 1.5x in FPS/W, compared to an MRR-based accelerator from prior work.  ( 3 min )
    FedPseudo: Pseudo value-based Deep Learning Models for Federated Survival Analysis. (arXiv:2207.05247v1 [cs.LG])
    Survival analysis, or time-to-event analysis, is an important problem in healthcare since it has a wide-ranging impact on patients and palliative care. Many survival analysis methods have assumed that the survival data is centrally available either from one medical center or by data sharing from multi-centers. However, the sensitivity of the patient attributes and the strict privacy laws have increasingly forbidden sharing of healthcare data. To address this challenge, the research community has looked at the solution of decentralized training and sharing of model parameters using the Federated Learning (FL) paradigm. In this paper, we study the utilization of FL for performing survival analysis on distributed healthcare datasets. Recently, the popular Cox proportional hazard (CPH) models have been adapted for FL settings; however, due to their linearity and proportional hazards assumptions, CPH models result in suboptimal performance, especially for non-linear, non-iid, and heavily censored survival datasets. To overcome the challenges of existing federated survival analysis methods, we leverage the predictive accuracy of the deep learning models and the power of pseudo values to propose a first-of-its-kind, pseudo value-based deep learning model for federated survival analysis (FSA) called FedPseudo. Furthermore, we introduce a novel approach of deriving pseudo values for survival probability in the FL settings that speeds up the computation of pseudo values. Extensive experiments on synthetic and real-world datasets show that our pseudo value-based FL framework achieves similar performance as the best centrally trained deep survival analysis model. Moreover, our proposed FL approach obtains the best results for various censoring settings.  ( 3 min )
    Few-Shot Semantic Relation Prediction across Heterogeneous Graphs. (arXiv:2207.05068v1 [cs.LG])
    Semantic relation prediction aims to mine the implicit relationships between objects in heterogeneous graphs, which consist of different types of objects and different types of links. In real-world scenarios, new semantic relations constantly emerge and they typically appear with only a few labeled data. Since a variety of semantic relations exist in multiple heterogeneous graphs, the transferable knowledge can be mined from some existing semantic relations to help predict the new semantic relations with few labeled data. This inspires a novel problem of few-shot semantic relation prediction across heterogeneous graphs. However, the existing methods cannot solve this problem because they not only require a large number of labeled samples as input, but also focus on a single graph with a fixed heterogeneity. Targeting this novel and challenging problem, in this paper, we propose a Meta-learning based Graph neural network for Semantic relation prediction, named MetaGS. Firstly, MetaGS decomposes the graph structure between objects into multiple normalized subgraphs, then adopts a two-view graph neural network to capture local heterogeneous information and global structure information of these subgraphs. Secondly, MetaGS aggregates the information of these subgraphs with a hyper-prototypical network, which can learn from existing semantic relations and adapt to new semantic relations. Thirdly, using the well-initialized two-view graph neural network and hyper-prototypical network, MetaGS can effectively learn new semantic relations from different graphs while overcoming the limitation of few labeled data. Extensive experiments on three real-world datasets have demonstrated the superior performance of MetaGS over the state-of-the-art methods.  ( 3 min )
    A Macrocolumn Architecture Implemented with Temporal (Spiking) Neurons. (arXiv:2207.05081v1 [cs.NE])
    With the long-term goal of reverse-architecting the computational brain from the bottom up, the focus of this document is the macrocolumn abstraction layer. A basic macrocolumn architecture is developed by first describing its operation with a state machine model. Then state machine functions are implemented with spiking neurons that support temporal computation. The neuron model is based on active spiking dendrites and mirrors the Hawkins/Numenta neuron model. The architecture is demonstrated with a research benchmark in which an agent uses a macrocolumn to first learn and then navigate 2-d environments containing randomly placed features. Environments are represented in the macrocolumn as labeled directed graphs where edges connect features and labels indicate the relative displacements between them.  ( 2 min )
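    The environment representation described above, a labeled directed graph whose edges connect features and whose labels give relative displacements, can be sketched with a plain adjacency dictionary. This illustrates only the data structure, not the spiking-neuron implementation; the feature names, displacement values, and the helper displacement_between are hypothetical.

```python
from collections import deque

# Hypothetical 2-d environment: feature -> {neighbouring feature: (dx, dy)}.
environment = {
    "A": {"B": (2, 0)},
    "B": {"C": (0, 3)},
    "C": {"A": (-2, -3)},
}


def displacement_between(graph, start, goal):
    """Breadth-first search from start to goal.

    Returns the summed (dx, dy) along the first path found,
    or None if goal is unreachable from start.
    """
    queue = deque([(start, (0, 0))])
    visited = {start}
    while queue:
        node, (x, y) = queue.popleft()
        if node == goal:
            return (x, y)
        for neighbour, (dx, dy) in graph.get(node, {}).items():
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, (x + dx, y + dy)))
    return None


print(displacement_between(environment, "A", "C"))
```

    Summing edge labels along a path gives the net displacement an agent would need to travel between two learned features.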
    Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation. (arXiv:2207.05250v1 [stat.ML])
    The real-world testing of decisions made using causal machine learning models is an essential prerequisite for their successful application. We focus on evaluating and improving contextual treatment assignment decisions: these are personalised treatments applied to e.g. customers, each with their own contextual information, with the aim of maximising a reward. In this paper we introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making through Bayesian Experimental Design. Specifically, our method is used for the data-efficient evaluation of the regret of past treatment assignments. Unlike approaches such as A/B testing, our method avoids assigning treatments that are known to be highly sub-optimal, whilst engaging in some exploration to gather pertinent information. We achieve this by introducing an information-based design objective, which we optimise end-to-end. Our method applies to discrete and continuous treatments. Comparing our information-theoretic approach to baselines in several simulation studies demonstrates the superior performance of our proposed approach.  ( 2 min )
    Scaling Novel Object Detection with Weakly Supervised Detection Transformers. (arXiv:2207.05205v1 [cs.CV])
    Weakly supervised object detection (WSOD) enables object detectors to be trained using image-level class labels. However, the practical application of current WSOD models is limited, as they operate at small scales and require extensive training and refinement. We propose the Weakly Supervised Detection Transformer, which enables efficient knowledge transfer from a large-scale pretraining dataset to WSOD finetuning on hundreds of novel objects. We leverage pretrained knowledge to improve the multiple instance learning framework used in WSOD, and experiments show our approach outperforms the state-of-the-art on datasets with twice as many novel classes as previously shown.  ( 2 min )
    RUSH: Robust Contrastive Learning via Randomized Smoothing. (arXiv:2207.05127v1 [cs.LG])
    Recently, adversarial training has been incorporated into self-supervised contrastive pre-training to improve label efficiency while providing adversarial robustness. However, this robustness comes at the cost of expensive adversarial training. In this paper, we show the surprising fact that contrastive pre-training has an interesting yet implicit connection with robustness, and that this natural robustness in the pre-trained representation enables us to design a powerful algorithm that is robust against adversarial attacks, RUSH, which combines standard contrastive pre-training with randomized smoothing. It boosts both standard accuracy and robust accuracy, and significantly reduces training costs compared with adversarial training. We use extensive empirical studies to show that RUSH outperforms robust classifiers from adversarial training by a significant margin on common benchmarks (CIFAR-10, CIFAR-100, and STL-10) under first-order attacks. In particular, under a PGD attack with $\ell_{\infty}$-norm perturbations of size 8/255 on CIFAR-10, our model with a ResNet-18 backbone reached 77.8% robust accuracy and 87.9% standard accuracy. This is an improvement of over 15% in robust accuracy and a slight improvement in standard accuracy compared to the state of the art.  ( 2 min )
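The randomized smoothing step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the RUSH implementation: `toy_classifier` is a hypothetical stand-in for a classifier built on a pre-trained encoder, and `sigma`, `n_samples`, and `seed` are illustrative parameters. The sketch only shows the general idea of predicting by majority vote over Gaussian-perturbed copies of the input.

```python
import random
from collections import Counter


def smoothed_predict(base_classifier, x, sigma=0.5, n_samples=1000, seed=0):
    """Return the class that base_classifier predicts most often
    when x is perturbed with isotropic Gaussian noise of std sigma."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        votes[base_classifier(noisy)] += 1
    return votes.most_common(1)[0][0]


def toy_classifier(x):
    # Hypothetical base classifier: class 1 if the coordinates sum to > 0.
    return 1 if sum(x) > 0 else 0


label_pos = smoothed_predict(toy_classifier, [1.0, 1.0])
label_neg = smoothed_predict(toy_classifier, [-1.0, -1.0])
```

A larger `sigma` generally trades standard accuracy for robustness to larger perturbations; a certified variant would additionally bound the vote probabilities, which this sketch omits.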
    Can Language Models perform Abductive Commonsense Reasoning?. (arXiv:2207.05155v1 [cs.AI])
    Abductive reasoning is the task of inferring the most plausible hypothesis given a set of observations. In the literature, the community has approached this challenge by classifying or generating a likely hypothesis that does not contradict a past observation and a future observation. Some of the most well-known benchmarks that tackle this problem are aNLI and aNLG (pronounced alpha-NLI and alpha-NLG). In this report, I review some of the methodologies that have been attempted to solve this challenge, re-implement the baseline models, and analyze some of the weaknesses of current approaches. The code and the re-implemented results are available at this link.  ( 2 min )
    Online Continual Learning of End-to-End Speech Recognition Models. (arXiv:2207.05071v1 [cs.LG])
    Continual Learning, also known as Lifelong Learning, aims to continually learn from new data as it becomes available. While prior research on continual learning in automatic speech recognition has focused on the adaptation of models across multiple different speech recognition tasks, in this paper we propose an experimental setting for online continual learning for automatic speech recognition of a single task. Specifically focusing on the case where additional training data for the same task becomes available incrementally over time, we demonstrate the effectiveness of performing incremental model updates to end-to-end speech recognition models with an online Gradient Episodic Memory (GEM) method. Moreover, we show that with online continual learning and a selective sampling strategy, we can maintain an accuracy that is similar to retraining a model from scratch while requiring significantly lower computation costs. We have also verified our method with self-supervised learning (SSL) features.  ( 2 min )
    Keep your Distance: Determining Sampling and Distance Thresholds in Machine Learning Monitoring. (arXiv:2207.05078v1 [cs.LG])
    Machine Learning (ML) has provided promising results in recent years across different applications and domains. However, in many cases, qualities such as reliability or even safety need to be ensured. To this end, one important aspect is to determine whether or not ML components are deployed in situations that are appropriate for their application scope. For components whose environments are open and variable, for instance those found in autonomous vehicles, it is therefore important to monitor their operational situation to determine its distance from the ML components' trained scope. If that distance is deemed too great, the application may choose to consider the ML component outcome unreliable and switch to alternatives, e.g. using human operator input instead. SafeML is a model-agnostic approach for performing such monitoring, using distance measures based on statistical testing of the training and operational datasets. Limitations in setting SafeML up properly include the lack of a systematic approach for determining, for a given application, how many operational samples are needed to yield reliable distance information as well as to determine an appropriate distance threshold. In this work, we address these limitations by providing a practical approach and demonstrate its use in a well-known traffic sign recognition problem, and on an example using the CARLA open-source automotive simulator.  ( 3 min )
  • Open

    Online Meta-Learning in Adversarial Multi-Armed Bandits. (arXiv:2205.15921v2 [cs.LG] UPDATED)
    We study meta-learning for adversarial multi-armed bandits. We consider the online-within-online setup, in which a player (learner) encounters a sequence of multi-armed bandit episodes. The player's performance is measured as regret against the best arm in each episode, according to the losses generated by an adversary. The difficulty of the problem depends on the empirical distribution of the per-episode best arm chosen by the adversary. We present an algorithm that can leverage the non-uniformity in this empirical distribution, and derive problem-dependent regret bounds. This solution comprises an inner learner that plays each episode separately, and an outer learner that updates the hyper-parameters of the inner algorithm between the episodes. In the case where the best arm distribution is far from uniform, it improves upon the best bound that can be achieved by any online algorithm executed on each episode individually without meta-learning.
    Scalable Bayesian Inference for Detection and Deblending in Astronomical Images. (arXiv:2207.05642v1 [astro-ph.IM])
    We present a new probabilistic method for detecting, deblending, and cataloging astronomical sources called the Bayesian Light Source Separator (BLISS). BLISS is based on deep generative models, which embed neural networks within a Bayesian model. For posterior inference, BLISS uses a new form of variational inference known as Forward Amortized Variational Inference. The BLISS inference routine is fast, requiring a single forward pass of the encoder networks on a GPU once the encoder networks are trained. BLISS can perform fully Bayesian inference on megapixel images in seconds, and produces highly accurate catalogs. BLISS is highly extensible, and has the potential to directly answer downstream scientific questions in addition to producing probabilistic catalogs.
    Improved Rates for Differentially Private Stochastic Convex Optimization with Heavy-Tailed Data. (arXiv:2106.01336v5 [cs.LG] UPDATED)
    We study stochastic convex optimization with heavy-tailed data under the constraint of differential privacy (DP). Most prior work on this problem is restricted to the case where the loss function is Lipschitz. Instead, as introduced by Wang, Xiao, Devadas, and Xu, we study general convex loss functions with the assumption that the distribution of gradients has bounded $k$-th moments. We provide improved upper bounds on the excess population risk under concentrated DP for convex and strongly convex loss functions. Along the way, we derive new algorithms for private mean estimation of heavy-tailed distributions, under both pure and concentrated DP. Finally, we prove nearly-matching lower bounds for private stochastic convex optimization with strongly convex losses and mean estimation, showing new separations between pure and concentrated DP.
    On the Representation of Causal Background Knowledge and its Applications in Causal Inference. (arXiv:2207.05067v1 [cs.AI])
    Causal background knowledge about the existence or the absence of causal edges and paths is frequently encountered in observational studies. The shared directed edges and links of a subclass of Markov equivalent DAGs refined due to background knowledge can be represented by a causal maximally partially directed acyclic graph (MPDAG). In this paper, we first provide a sound and complete graphical characterization of causal MPDAGs and give a minimal representation of a causal MPDAG. Then, we introduce a novel representation called direct causal clause (DCC) to represent all types of causal background knowledge in a unified form. Using DCCs, we study the consistency and equivalency of causal background knowledge and show that any causal background knowledge set can be equivalently decomposed into a causal MPDAG plus a minimal residual set of DCCs. Polynomial-time algorithms are also provided for checking the consistency, equivalency, and finding the decomposed MPDAG and residual DCCs. Finally, with causal background knowledge, we prove a sufficient and necessary condition to identify causal effects and surprisingly find that the identifiability of causal effects only depends on the decomposed MPDAG. We also develop a local IDA-type algorithm to estimate the possible values of an unidentifiable effect. Simulations suggest that causal background knowledge can significantly improve the identifiability of causal effects.
    Shapley Computations Using Surrogate Model-Based Trees. (arXiv:2207.05214v1 [stat.ML])
    Shapley-related techniques have gained attention as both global and local interpretation tools because of their desirable properties. However, their computation using conditional expectations is computationally expensive. Approximation methods suggested in the literature have limitations. This paper proposes the use of a surrogate model-based tree to compute Shapley and SHAP values based on conditional expectation. Simulation studies show that the proposed algorithm improves accuracy and unifies global Shapley and SHAP interpretation, and that the thresholding method provides a way to trade off running time and accuracy.
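To make the cost concern in the abstract above concrete, here is a minimal sketch of the exact Shapley formula for a small cooperative game. It is not the surrogate-tree method the paper proposes: the `value` function is a hypothetical set function supplied by the caller (in an interpretability setting it would be something like a conditional expectation of the model output), and the loop over all subsets is what makes exact computation exponential in the number of players.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values.

    players: list of hashable player (feature) names.
    value:   function mapping a frozenset of players to a number.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


def toy_game(coalition):
    # Hypothetical game: worth 1 only when both "a" and "b" are present.
    return 1.0 if {"a", "b"} <= coalition else 0.0


phi = shapley_values(["a", "b", "c"], toy_game)
```

In this toy game the credit is split evenly between `a` and `b`, while `c` contributes nothing, and the values sum to the worth of the full coalition.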
    Uncertainty-Aware Learning Against Label Noise on Imbalanced Datasets. (arXiv:2207.05471v1 [stat.ML])
    Learning against label noise is a vital topic to guarantee a reliable performance for deep neural networks. Recent research usually relies on dynamic noise modeling with model output probabilities and loss values, and then separates clean and noisy samples. These methods have gained notable success. However, unlike on cherry-picked data, existing approaches often cannot perform well when facing imbalanced datasets, a common scenario in the real world. We thoroughly investigate this phenomenon and point out two major issues that hinder the performance, i.e., inter-class loss distribution discrepancy and misleading predictions due to uncertainty. The first issue is that existing methods often perform class-agnostic noise modeling. However, loss distributions show a significant discrepancy among classes under class imbalance, and class-agnostic noise modeling can easily confuse noisy samples with samples in minority classes. The second issue is that models may output misleading predictions due to epistemic and aleatoric uncertainty, so existing methods that rely solely on the output probabilities may fail to distinguish confident samples. Inspired by our observations, we propose an Uncertainty-aware Label Correction framework (ULC) to handle label noise on imbalanced datasets. First, we perform epistemic uncertainty-aware class-specific noise modeling to identify trustworthy clean samples and refine/discard highly confident true/corrupted labels. Then, we introduce aleatoric uncertainty in the subsequent learning process to prevent noise accumulation in the label noise modeling process. We conduct experiments on several synthetic and real-world datasets. The results demonstrate the effectiveness of the proposed method, especially on imbalanced datasets.
    Collaborative Uncertainty Benefits Multi-Agent Multi-Modal Trajectory Forecasting. (arXiv:2207.05195v1 [cs.CV])
    In multi-modal multi-agent trajectory forecasting, two major challenges have not been fully tackled: 1) how to measure the uncertainty brought by the interaction module that causes correlations among the predicted trajectories of multiple agents; 2) how to rank the multiple predictions and select the optimal predicted trajectory. In order to handle these challenges, this work first proposes a novel concept, collaborative uncertainty (CU), which models the uncertainty resulting from interaction modules. Then we build a general CU-aware regression framework with an original permutation-equivariant uncertainty estimator to do both tasks of regression and uncertainty estimation. Further, we apply the proposed framework to current SOTA multi-agent multi-modal forecasting systems as a plugin module, which enables the SOTA systems to 1) estimate the uncertainty in the multi-agent multi-modal trajectory forecasting task; 2) rank the multiple predictions and select the optimal one based on the estimated uncertainty. We conduct extensive experiments on a synthetic dataset and two public large-scale multi-agent trajectory forecasting benchmarks. Experiments show that: 1) on the synthetic dataset, the CU-aware regression framework allows the model to appropriately approximate the ground-truth Laplace distribution; 2) on the multi-agent trajectory forecasting benchmarks, the CU-aware regression framework steadily helps SOTA systems improve their performance. Specifically, the proposed framework helps VectorNet improve by 262 cm regarding the Final Displacement Error of the chosen optimal prediction on the nuScenes dataset; 3) for multi-agent multi-modal trajectory forecasting systems, prediction uncertainty is positively correlated with future stochasticity; and 4) the estimated CU values are highly related to the interactive information among agents.
    The d-separation criterion in Categorical Probability. (arXiv:2207.05740v1 [math.ST])
    The d-separation criterion detects the compatibility of a joint probability distribution with a directed acyclic graph through certain conditional independences. In this work, we study this problem in the context of categorical probability theory by introducing a categorical definition of causal models, a categorical notion of d-separation, and proving an abstract version of the d-separation criterion. This approach has two main benefits. First, categorical d-separation is a very intuitive criterion based on topological connectedness. Second, our results apply in measure-theoretic probability (with standard Borel spaces), and therefore provide a clean proof of the equivalence of local and global Markov properties with causal compatibility for continuous and mixed variables.
    Unsupervised learning of observation functions in state-space models by nonparametric moment methods. (arXiv:2207.05242v1 [stat.ML])
    We investigate the unsupervised learning of non-invertible observation functions in nonlinear state-space models. Assuming abundant data of the observation process along with the distribution of the state process, we introduce a nonparametric generalized moment method to estimate the observation function via constrained regression. The major challenge comes from the non-invertibility of the observation function and the lack of data pairs between the state and observation. We address the fundamental issue of identifiability from quadratic loss functionals and show that the function space of identifiability is the closure of a RKHS that is intrinsic to the state process. Numerical results show that the first two moments and temporal correlations, along with upper and lower bounds, can identify functions ranging from piecewise polynomials to smooth functions, leading to convergent estimators. The limitations of this method, such as non-identifiability due to symmetry and stationarity, are also discussed.
    Sliced-Wasserstein normalizing flows: beyond maximum likelihood training. (arXiv:2207.05468v1 [stat.ML])
    Despite their advantages, normalizing flows generally suffer from several shortcomings, including their tendency to generate unrealistic data (e.g., images) and their failure to detect out-of-distribution data. One reason for these deficiencies lies in the training strategy, which traditionally exploits a maximum likelihood principle only. This paper proposes a new training paradigm based on a hybrid objective function combining the maximum likelihood principle (MLE) and a sliced-Wasserstein distance. Results obtained on synthetic toy examples and real image data sets show better generative abilities in terms of both likelihood and visual aspects of the generated samples. Reciprocally, the proposed approach leads to a lower likelihood of out-of-distribution data, demonstrating a greater data fidelity of the resulting flows.
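As a rough illustration of the sliced-Wasserstein term mentioned in the abstract above, the sketch below estimates the distance between two equal-sized point clouds by averaging one-dimensional Wasserstein distances over random projection directions. It is an assumption-laden toy, not the paper's training objective: the function name, `n_projections`, `p`, and `seed` are all illustrative, and the hybrid loss with the likelihood term is not shown.

```python
import math
import random


def sliced_wasserstein(xs, ys, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance between
    two equal-sized point clouds (lists of d-dimensional points)."""
    assert len(xs) == len(ys), "this sketch assumes equal sample sizes"
    rng = random.Random(seed)
    d = len(xs[0])
    total = 0.0
    for _ in range(n_projections):
        # Random direction on the unit sphere (normalized Gaussian vector).
        theta = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in theta))
        theta = [t / norm for t in theta]
        # In 1-D, optimal transport simply matches sorted samples.
        px = sorted(sum(t * v for t, v in zip(theta, x)) for x in xs)
        py = sorted(sum(t * v for t, v in zip(theta, y)) for y in ys)
        total += sum(abs(a - b) ** p for a, b in zip(px, py)) / len(px)
    return (total / n_projections) ** (1.0 / p)


cloud = [[0.0, 0.0], [1.0, 0.5], [-0.5, 2.0], [0.3, -1.2]]
shifted = [[x + 3.0, y] for x, y in cloud]
same = sliced_wasserstein(cloud, cloud)
moved = sliced_wasserstein(cloud, shifted)
```

Because each projection only requires a sort, the estimate is cheap compared with solving a full optimal transport problem in `d` dimensions.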
    Grounding Aleatoric Uncertainty in Unsupervised Environment Design. (arXiv:2207.05219v1 [cs.LG])
    Adaptive curricula in reinforcement learning (RL) have proven effective for producing policies robust to discrepancies between the train and test environment. Recently, the Unsupervised Environment Design (UED) framework generalized RL curricula to generating sequences of entire environments, leading to new methods with robust minimax regret properties. Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution. We formalize this phenomenon as curriculum-induced covariate shift (CICS), and describe how its occurrence in aleatoric parameters can lead to suboptimal policies. Directly sampling these parameters from the ground-truth distribution avoids the issue, but thwarts curriculum learning. We propose SAMPLR, a minimax regret UED method that optimizes the ground-truth utility function, even when the underlying training data is biased due to CICS. We prove, and validate on challenging domains, that our approach preserves optimality under the ground-truth distribution, while promoting robustness across the full range of environment settings.
    A Robust and Flexible EM Algorithm for Mixtures of Elliptical Distributions with Missing Data. (arXiv:2201.12020v3 [stat.ML] UPDATED)
    This paper tackles the problem of missing data imputation for noisy and non-Gaussian data. A classical imputation method, the Expectation Maximization (EM) algorithm for Gaussian mixture models, has shown interesting properties when compared to other popular approaches such as those based on k-nearest neighbors or on multiple imputations by chained equations. However, Gaussian mixture models are known to be non-robust to heterogeneous data, which can lead to poor estimation performance when the data is contaminated by outliers or follows non-Gaussian distributions. To overcome this issue, a new EM algorithm is investigated for mixtures of elliptical distributions with the property of handling potential missing data. This paper shows that this problem reduces to the estimation of a mixture of Angular Gaussian distributions under generic assumptions (i.e., each sample is drawn from a mixture of elliptical distributions, which may differ from one sample to another). In that case, the complete-data likelihood associated with mixtures of elliptical distributions is well adapted to the EM framework with missing data thanks to its conditional distribution, which is shown to be a multivariate $t$-distribution. Experimental results on synthetic data demonstrate that the proposed algorithm is robust to outliers and can be used with non-Gaussian data. Furthermore, experiments conducted on real-world datasets show that this algorithm is very competitive when compared to other classical imputation methods.  ( 3 min )
    Capturing Evolution Genes for Time Series Data. (arXiv:1905.05004v2 [cs.LG] UPDATED)
    The modeling of time series is becoming increasingly critical in a wide variety of applications. Overall, data evolves by following different patterns, which are generally caused by different user behaviors. Given a time series, we define the evolution gene to capture the latent user behaviors and to describe how the behaviors lead to the generation of time series. In particular, we propose a uniform framework that recognizes different evolution genes of segments by learning a classifier, and adopt an adversarial generator to implement the evolution gene by estimating the segments' distribution. Experimental results based on a synthetic dataset and five real-world datasets show that our approach not only achieves good prediction results (e.g., an average gain of 10.56% in terms of F1), but is also able to provide explanations of the results.
    Size and depth of monotone neural networks: interpolation and approximation. (arXiv:2207.05275v1 [cs.LG])
    Monotone functions and data sets arise in a variety of applications. We study the interpolation problem for monotone data sets: The input is a monotone data set with $n$ points, and the goal is to find a size and depth efficient monotone neural network, with non-negative parameters and threshold units, that interpolates the data set. We show that there are monotone data sets that cannot be interpolated by a monotone network of depth $2$. On the other hand, we prove that for every monotone data set with $n$ points in $\mathbb{R}^d$, there exists an interpolating monotone network of depth $4$ and size $O(nd)$. Our interpolation result implies that every monotone function over $[0,1]^d$ can be approximated arbitrarily well by a depth-4 monotone network, improving the previous best-known construction of depth $d+1$. Finally, building on results from Boolean circuit complexity, we show that the inductive bias of having positive parameters can lead to a super-polynomial blow-up in the number of neurons when approximating monotone functions.
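The notion of a monotone network with threshold units described in the abstract above can be sketched in a few lines. This is an illustrative toy, not the paper's depth-4 construction: it assumes that only the weights are constrained to be non-negative while biases are free, and the two-layer example network and all names are hypothetical.

```python
def threshold_layer(x, weights, biases):
    """Layer of threshold units: unit j outputs 1 if w_j . x + b_j >= 0, else 0.
    Non-negative weights make every unit non-decreasing in every input."""
    assert all(w >= 0 for row in weights for w in row), "weights must be non-negative"
    return [
        1 if sum(w * xi for w, xi in zip(row, x)) + b >= 0 else 0
        for row, b in zip(weights, biases)
    ]


def monotone_net(x, layers):
    """Apply a list of (weights, biases) threshold layers in sequence."""
    for weights, biases in layers:
        x = threshold_layer(x, weights, biases)
    return x


# Hypothetical example: output 1 iff x1 >= 0.5 and x2 >= 0.5.
layers = [
    ([[1.0, 0.0], [0.0, 1.0]], [-0.5, -0.5]),  # two coordinate-wise threshold tests
    ([[1.0, 1.0]], [-2.0]),                     # AND of the two hidden units
]
out_high = monotone_net([0.6, 0.7], layers)
out_low = monotone_net([0.6, 0.2], layers)
```

Since a composition of non-decreasing maps is non-decreasing, increasing any input coordinate can never switch the output from 1 to 0.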
    High-dimensional Inference for Dynamic Treatment Effects. (arXiv:2110.04924v3 [stat.ME] UPDATED)
    This paper proposes a confidence interval construction for heterogeneous treatment effects in the context of multi-stage experiments with $N$ samples and high-dimensional, $d$, confounders. Our focus is on the case of $d\gg N$, but the results obtained also apply to low-dimensional cases. We showcase that the bias of regularized estimation, unavoidable in high-dimensional covariate spaces, is mitigated with a simple double-robust score. In this way, no additional bias removal is necessary, and we obtain root-$N$ inference results while allowing multi-stage interdependency of the treatments and covariates. The memoryless property is also not assumed; treatment can possibly depend on all previous treatment assignments and all previous multi-stage confounders. Our results rely on certain sparsity assumptions of the underlying dependencies. We discover new product rate conditions necessary for robust inference with dynamic treatments.
    Edge Augmentation on Disconnected Graphs via Eigenvalue Elevation. (arXiv:2207.05301v1 [cs.SI])
    The graph-theoretical task of determining the most likely inter-community edges based on disconnected subgraphs' intra-community connectivity is proposed. An algorithm is developed for this edge augmentation task, based on elevating the zero eigenvalues of the graph's spectrum. Upper bounds for the eigenvalue elevation amplitude and for the corresponding augmented edge density are derived and validated by simulation on random graphs. The algorithm works consistently across synthetic and real networks, yielding desirable performance at connecting graph components. Edge augmentation reverse-engineers graph partition under different community detection methods (Girvan-Newman method, greedy modularity maximization, label propagation, Louvain method, and fluid community), in most cases producing inter-community edges at >50% frequency.  ( 2 min )
    Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders. (arXiv:2203.12742v2 [cs.LG] UPDATED)
    Bayesian optimization (BayesOpt) is a gold standard for query-efficient continuous optimization. However, its adoption for drug design has been hindered by the discrete, high-dimensional nature of the decision variables. We develop a new approach (LaMBO) which jointly trains a denoising autoencoder with a discriminative multi-task Gaussian process head, allowing gradient-based optimization of multi-objective acquisition functions in the latent space of the autoencoder. These acquisition functions allow LaMBO to balance the explore-exploit tradeoff over multiple design rounds, and to balance objective tradeoffs by optimizing sequences at many different points on the Pareto frontier. We evaluate LaMBO on two small-molecule design tasks, and introduce new tasks optimizing in silico and in vitro properties of large-molecule fluorescent proteins. In our experiments LaMBO outperforms genetic optimizers and does not require a large pretraining corpus, demonstrating that BayesOpt is practical and effective for biological sequence design.  ( 2 min )
    On robust risk-based active-learning algorithms for enhanced decision support. (arXiv:2201.02555v2 [cs.LG] UPDATED)
    Classification models are a fundamental component of physical-asset management technologies such as structural health monitoring (SHM) systems and digital twins. Previous work introduced risk-based active learning, an online approach for the development of statistical classifiers that takes into account the decision-support context in which they are applied. Decision-making is considered by preferentially querying data labels according to expected value of perfect information (EVPI). Although several benefits are gained by adopting a risk-based active learning approach, including improved decision-making performance, the algorithms suffer from issues relating to sampling bias as a result of the guided querying process. This sampling bias ultimately manifests as a decline in decision-making performance during the later stages of active learning, which in turn corresponds to lost resource/utility. The current paper proposes two novel approaches to counteract the effects of sampling bias: semi-supervised learning, and discriminative classification models. These approaches are first visualised using a synthetic dataset, then subsequently applied to an experimental case study, specifically, the Z24 Bridge dataset. The semi-supervised learning approach is shown to have variable performance; with robustness to sampling bias dependent on the suitability of the generative distributions selected for the model with respect to each dataset. In contrast, the discriminative classifiers are shown to have excellent robustness to the effects of sampling bias. Moreover, it was found that the number of inspections made during a monitoring campaign, and therefore resource expenditure, could be reduced with the careful selection of the statistical classifiers used within a decision-supporting monitoring system.
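The expected value of perfect information (EVPI) used as the querying criterion in the abstract above has a simple definition for a discrete decision problem: the expected utility when the true state is revealed before acting, minus the best expected utility achievable under the prior alone. The sketch below computes it for a hypothetical two-state maintenance example; the state names, actions, and utilities are invented for illustration and do not come from the paper.

```python
def evpi(prior, utility):
    """Expected value of perfect information for a discrete decision problem.

    prior:   {state: probability}
    utility: {action: {state: utility of taking action in that state}}
    """
    # Best we can do when committing to one action under the prior.
    best_without_info = max(
        sum(prior[s] * u[s] for s in prior) for u in utility.values()
    )
    # Expected utility if the state is revealed before choosing the action.
    expected_with_info = sum(
        prior[s] * max(u[s] for u in utility.values()) for s in prior
    )
    return expected_with_info - best_without_info


# Hypothetical structure that is probably healthy but costly to leave damaged.
prior = {"healthy": 0.9, "damaged": 0.1}
utility = {
    "do_nothing": {"healthy": 0.0, "damaged": -100.0},
    "repair": {"healthy": -10.0, "damaged": -10.0},
}
value_of_label = evpi(prior, utility)
```

A data point whose label would change the chosen action has high EVPI, which is why querying by EVPI concentrates labels near decision boundaries and can introduce the sampling bias the abstract discusses.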
    Conservative SPDEs as fluctuating mean field limits of stochastic gradient descent. (arXiv:2207.05705v1 [math.PR])
    The convergence of stochastic interacting particle systems in the mean-field limit to solutions to conservative stochastic partial differential equations is shown, with optimal rate of convergence. As a second main result, a quantitative central limit theorem for such SPDEs is derived, again with optimal rate of convergence. The results apply in particular to the convergence in the mean-field scaling of stochastic gradient descent dynamics in overparametrized, shallow neural networks to solutions to SPDEs. It is shown that the inclusion of fluctuations in the limiting SPDE improves the rate of convergence, and retains information about the fluctuations of stochastic gradient descent in the continuum limit.  ( 2 min )
    AGBoost: Attention-based Modification of Gradient Boosting Machine. (arXiv:2207.05724v1 [cs.LG])
    A new attention-based model for the gradient boosting machine (GBM) called AGBoost (the attention-based gradient boosting) is proposed for solving regression problems. The main idea behind the proposed AGBoost model is to assign attention weights with trainable parameters to iterations of GBM under the condition that decision trees are the base learners in GBM. Attention weights are determined by applying properties of decision trees and by using Huber's contamination model, which provides an interesting linear dependence between trainable parameters of the attention and the attention weights. This peculiarity allows us to train the attention weights by solving a standard quadratic optimization problem with linear constraints. The attention weights also depend on the discount factor as a tuning parameter, which determines how much the impact of the weight is decreased with the number of iterations. Numerical experiments performed for two types of base learners, original decision trees and extremely randomized trees, with various regression datasets illustrate the proposed model.
    Latent Variable Models for Bayesian Causal Discovery. (arXiv:2207.05723v1 [cs.LG])
    Learning predictors that do not rely on spurious correlations involves building causal representations. However, learning such a representation is very challenging. We, therefore, formulate the problem of learning a causal representation from high dimensional data and study causal recovery with synthetic data. This work introduces a latent variable decoder model, Decoder BCD, for Bayesian causal discovery and performs experiments in mildly supervised and unsupervised settings. We present a series of synthetic experiments to characterize important factors for causal discovery and show that using known intervention targets as labels helps in unsupervised Bayesian inference over structure and parameters of linear Gaussian additive noise latent structural causal models.
    Neural Posterior Estimation with Differentiable Simulators. (arXiv:2207.05636v1 [astro-ph.IM])
    Simulation-Based Inference (SBI) is a promising Bayesian inference framework that alleviates the need for analytic likelihoods to estimate posterior distributions. Recent advances using neural density estimators in SBI algorithms have demonstrated the ability to achieve high-fidelity posteriors, at the expense of a large number of simulations, which makes their application potentially very time-consuming when using complex physical simulations. In this work we focus on boosting the sample-efficiency of posterior density estimation using the gradients of the simulator. We present a new method to perform Neural Posterior Estimation (NPE) with a differentiable simulator. We demonstrate how gradient information helps constrain the shape of the posterior and improves sample-efficiency.
    Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation. (arXiv:2207.05250v1 [stat.ML])
    The real-world testing of decisions made using causal machine learning models is an essential prerequisite for their successful application. We focus on evaluating and improving contextual treatment assignment decisions: these are personalised treatments applied to e.g. customers, each with their own contextual information, with the aim of maximising a reward. In this paper we introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making through Bayesian Experimental Design. Specifically, our method is used for the data-efficient evaluation of the regret of past treatment assignments. Unlike approaches such as A/B testing, our method avoids assigning treatments that are known to be highly sub-optimal, whilst engaging in some exploration to gather pertinent information. We achieve this by introducing an information-based design objective, which we optimise end-to-end. Our method applies to discrete and continuous treatments. Comparing our information-theoretic approach to baselines in several simulation studies demonstrates the superior performance of our proposed approach.
    Robustness and Personalization in Federated Learning: A Unified Approach via Regularization. (arXiv:2009.06303v3 [cs.LG] UPDATED)
    We present a class of methods for robust, personalized federated learning, called Fed+, that unifies many federated learning algorithms. The principal advantage of this class of methods is to better accommodate the real-world characteristics found in federated training, such as the lack of IID data across parties, the need for robustness to outliers or stragglers, and the requirement to perform well on party-specific datasets. We achieve this through a problem formulation that allows the central server to employ robust ways of aggregating the local models while keeping the structure of local computation intact. Without making any statistical assumption on the degree of heterogeneity of local data across parties, we provide convergence guarantees for Fed+ for convex and non-convex loss functions under different (robust) aggregation methods. The Fed+ theory is also equipped to handle heterogeneous computing environments including stragglers without additional assumptions; specifically, the convergence results cover the general setting where the number of local update steps across parties can vary. We demonstrate the benefits of Fed+ through extensive experiments across standard benchmark datasets.
    Log-Euclidean Signatures for Intrinsic Distances Between Unaligned Datasets. (arXiv:2202.01671v2 [stat.ML] UPDATED)
    The need for efficiently comparing and representing datasets with unknown alignment spans various fields, from model analysis and comparison in machine learning to trend discovery in collections of medical datasets. We use manifold learning to compare the intrinsic geometric structures of different datasets by comparing their diffusion operators, symmetric positive-definite (SPD) matrices that relate to approximations of the continuous Laplace-Beltrami operator from discrete samples. Existing methods typically assume known data alignment and compare such operators in a pointwise manner. Instead, we exploit the Riemannian geometry of SPD matrices to compare these operators and define a new theoretically-motivated distance based on a lower bound of the log-Euclidean metric. Our framework facilitates comparison of data manifolds expressed in datasets with different sizes, numbers of features, and measurement modalities. Our log-Euclidean signature (LES) distance recovers meaningful structural differences, outperforming competing methods in various application domains.
    A Newton-CG based barrier method for finding a second-order stationary point of nonconvex conic optimization with complexity guarantees. (arXiv:2207.05697v1 [math.OC])
    In this paper we consider finding an approximate second-order stationary point (SOSP) of nonconvex conic optimization that minimizes a twice differentiable function over the intersection of an affine subspace and a convex cone. In particular, we propose a Newton-conjugate gradient (Newton-CG) based barrier method for finding an $(\epsilon,\sqrt{\epsilon})$-SOSP of this problem. Our method is not only implementable, but also achieves an iteration complexity of ${\cal O}(\epsilon^{-3/2})$, which matches the best known iteration complexity of second-order methods for finding an $(\epsilon,\sqrt{\epsilon})$-SOSP of unconstrained nonconvex optimization. The operation complexity of $\widetilde{\cal O}(\epsilon^{-3/2}\min\{n,\epsilon^{-1/4}\})$, measured by the amount of fundamental operations, is also established for our method.
    Parallel APSM for Fast and Adaptive Digital SIC in Full-Duplex Transceivers with Nonlinearity. (arXiv:2207.05461v1 [eess.SP])
    This paper presents a kernel-based adaptive filter that is applied for the digital domain self-interference cancellation (SIC) in a transceiver operating in full-duplex (FD) mode. In FD, the benefit of simultaneous transmission and receiving of signals comes at the price of strong self-interference (SI). In this work, we are primarily interested in suppressing the SI using an adaptive filter, namely the adaptive projected subgradient method (APSM), in a reproducing kernel Hilbert space (RKHS) of functions. Using the projection concept as a powerful tool, APSM is used to model and consequently remove the SI. A low-complexity and fast-tracking algorithm is provided taking advantage of parallel projections as well as the kernel trick in RKHS. The performance of the proposed method is evaluated on real measurement data. The results show that the proposed adaptive filter performs well compared with popular benchmarks. They demonstrate that the kernel-based algorithm achieves a favorable level of digital SIC while enabling parallel computation-based implementation within a rich and nonlinear function space, thanks to the employed adaptive filtering method.
    The Cosmic Graph: Optimal Information Extraction from Large-Scale Structure using Catalogues. (arXiv:2207.05202v1 [astro-ph.CO])
    We present an implicit likelihood approach to quantifying cosmological information over discrete catalogue data, assembled as graphs. To do so, we explore cosmological inference using mock dark matter halo catalogues. We employ Information Maximising Neural Networks (IMNNs) to quantify Fisher information extraction as a function of graph representation. We a) demonstrate the high sensitivity of modular graph structure to the underlying cosmology in the noise-free limit, b) show that networks automatically combine mass and clustering information through comparisons to traditional statistics, c) demonstrate that graph neural networks can still extract information when catalogues are subject to noisy survey cuts, and d) illustrate how nonlinear IMNN summaries can be used as asymptotically optimal compressed statistics for Bayesian implicit likelihood inference. We reduce the area of joint $\Omega_m, \sigma_8$ parameter constraints with small ($\sim$100 object) halo catalogues by a factor of 42 over the two-point correlation function, and demonstrate that the networks automatically combine mass and clustering information. This work utilises a new IMNN implementation over graph data in Jax, which can take advantage of either numerical or auto-differentiability. We also show that graph IMNNs successfully compress simulations far from the fiducial model at which the network is fitted, indicating a promising alternative to $n$-point statistics in catalogue-based analyses.
    Wasserstein multivariate auto-regressive models for modeling distributional time series and its application in graph learning. (arXiv:2207.05442v1 [stat.ML])
    We propose a new auto-regressive model for the statistical analysis of multivariate distributional time series. The data of interest consist of a collection of multiple series of probability measures supported over a bounded interval of the real line, and that are indexed by distinct time instants. The probability measures are modelled as random objects in the Wasserstein space. We establish the auto-regressive model in the tangent space at the Lebesgue measure by first centering all the raw measures so that their Fréchet means become the Lebesgue measure. Using the theory of iterated random function systems, results on the existence, uniqueness and stationarity of the solution of such a model are provided. We also propose a consistent estimator for the model coefficient. In addition to the analysis of simulated data, the proposed model is illustrated with two real data sets made of observations from age distribution in different countries and bike sharing network in Paris. Finally, due to the positivity and boundedness constraints that we impose on the model coefficients, the proposed estimator, which is learned under these constraints, naturally has a sparse structure. The sparsity furthermore allows the application of the proposed model in learning a graph of temporal dependency from the multivariate distributional time series.
    Markovian Gaussian Process Variational Autoencoders. (arXiv:2207.05543v1 [cs.LG])
    Deep generative models are widely used for modelling high-dimensional time series, such as video animations, audio and climate data. Sequential variational autoencoders have been successfully considered for many applications, with many variant models relying on discrete-time methods and recurrent neural networks (RNNs). On the other hand, continuous-time methods have recently gained traction, especially in the context of irregularly-sampled time series, where they can better handle the data than discrete-time methods. One such class is Gaussian process variational autoencoders (GPVAEs), where the VAE prior is set as a Gaussian process (GP), allowing inductive biases to be explicitly encoded via the kernel function and interpretability of the latent space. However, a major limitation of GPVAEs is that they inherit the same cubic computational cost as GPs. In this work, we leverage the equivalent discrete state space representation of Markovian GPs to enable a linear-time GP solver via Kalman filtering and smoothing. We show via corrupt and missing frames tasks that our method performs favourably, especially on the latter where it outperforms RNN-based models.
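    The linear-time idea in the abstract above can be illustrated with a minimal sketch. It assumes a one-dimensional GP with an exponential (Ornstein-Uhlenbeck) kernel, which is one example of a Markovian GP; the abstract does not say which kernels the paper uses. The function name `ou_kalman_filter` and its parameters are hypothetical, and only the filtering pass is shown (no smoothing, no VAE).

```python
import math


def ou_kalman_filter(times, ys, lengthscale=1.0, variance=1.0, noise=0.1):
    """Filter noisy observations of a GP with an exponential (OU) kernel.

    Assumed kernel: k(t, t') = variance * exp(-|t - t'| / lengthscale).
    `noise` is the observation noise variance. Runs in O(n) time instead
    of the O(n^3) cost of a dense GP solve.
    """
    means, variances = [], []
    m, p = 0.0, variance  # stationary prior at the first time point
    prev_t = None
    for t, y in zip(times, ys):
        if prev_t is not None:
            # Predict: exact discretisation of the OU process over the gap.
            a = math.exp(-(t - prev_t) / lengthscale)
            m = a * m
            p = a * a * p + variance * (1.0 - a * a)
        # Update: condition on y = x + Gaussian noise.
        gain = p / (p + noise)
        m = m + gain * (y - m)
        p = (1.0 - gain) * p
        means.append(m)
        variances.append(p)
        prev_t = t
    return means, variances
```

    Because the transition is recomputed from each time gap, irregularly sampled inputs need no special handling. The filtered mean at the last time point agrees with the dense GP posterior mean at that point.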
    Multi-Model Federated Learning with Provable Guarantees. (arXiv:2207.04330v2 [cs.LG] UPDATED)
    Federated Learning (FL) is a variant of distributed learning where edge devices collaborate to learn a model without sharing their data with the central server or each other. We refer to the process of training multiple independent models simultaneously in a federated setting using a common pool of clients as multi-model FL. In this work, we propose two variants of the popular FedAvg algorithm for multi-model FL, with provable convergence guarantees. We further show that for the same amount of computation, multi-model FL can have better performance than training each model separately. We supplement our theoretical results with experiments in strongly convex, convex, and non-convex settings.
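    As a rough illustration of the setting described above, here is a sketch of one multi-model round built on plain FedAvg aggregation. The abstract does not describe how its two variants assign clients to models, so the assignment is simply taken as given; `multi_model_round`, `assignments` and `local_update` are hypothetical names, not the paper's algorithm.

```python
def fedavg_aggregate(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def multi_model_round(models, assignments, local_update):
    """One round in which each model is trained only by its assigned clients.

    models:       dict model_id -> parameter list
    assignments:  dict model_id -> list of (client_data, n_samples)
    local_update: callable(params, client_data) -> new params (hypothetical)
    """
    new_models = {}
    for model_id, params in models.items():
        clients = assignments.get(model_id, [])
        if not clients:
            new_models[model_id] = params  # no client trained this model
            continue
        updates = [local_update(params, data) for data, _ in clients]
        sizes = [n for _, n in clients]
        new_models[model_id] = fedavg_aggregate(updates, sizes)
    return new_models
```

    Each model keeps its own parameters and its own average, so the models stay independent even though they share one pool of clients.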

  • Open

    Is reinforcement learning the tool for this?
    Help with creating a first reinforcement learning AI. I'm wondering if reinforcement learning is right for a game. In the game you need to pick which objects to move and move them an arbitrary distance to accomplish a desired configuration of objects and their connections. The point of the game is to move a minimal number of objects. I guess my question is: can I use Keras reinforcement learning to create an agent whose action is to pick an object, a direction, and a distance to move the object? It would then take as many actions as it needs to solve the problem, and hopefully learn to solve it in fewer steps than before until it reaches an optimal number of steps. Any feedback would be well and truly appreciated. Thanks in advance! submitted by /u/RollingLSlowly [link] [comments]  ( 85 min )
    CleanRL now has a DDPG + JAX implementation roughly 2.5-4x faster than DDPG + PyTorch
    submitted by /u/vwxyzjn [link] [comments]  ( 84 min )
    Oleh Rybkin, UPenn, on exploration and planning with world models
    Here is a podcast with Oleh Rybkin where we discuss agents that explore and plan (and do yoga), how to learn world models from video, what's missing from current RL research, and much more! submitted by /u/thejashGI [link] [comments]  ( 84 min )
    Are ML conference challenges worth participating in?
    Do industry and academia really value these challenges? Or, what are your thoughts about it? submitted by /u/Blasphemer666 [link] [comments]  ( 84 min )
    Help: has anyone coded a hexagonal maze environment?
    I am looking for maze like environment, each cell in a maze is hexagonal (6 sides), with few sides opened for passage and few sides act as a wall. submitted by /u/kachua26 [link] [comments]  ( 84 min )
    Best libraries to code gym envs simulation for GPU?
    I'm trying to compare the speed of running RL on CPU vs GPU on a simple workstation (user-level high-end PC). My nets are simple (3 layers of 256 units) and the environment I'm trying to test is a drone-like environment (similar to 3D robots without world interactions, only aerial movement physics). I've already run only the training on the GPU (specifically with ray/rllib), but due to the small net and the compute-heavy sim, the speed is almost the same, I think because of the latency of sending the data back and forth. So now I want to run both the training and the simulation on the GPU. So far I've come across Nvidia's Isaac Gym and the Brax simulator, but both rely on GPU-dedicated libraries (like PyTorch or JAX). Are there any other libraries? Which makes it easiest to implement new custom gym envs? submitted by /u/NavirAur [link] [comments]  ( 85 min )
    Is the entropy used in SAC and PPO different?
    Hi, I would like to know whether the implementation of entropy differs between SAC and PPO. If yes, what is the difference? Thanks submitted by /u/4thfever [link] [comments]  ( 84 min )
  • Open

    [N] BigScience Releases their 176 Billion Parameter Open-access Multilingual Language Model
    BigScience recently released their new open-access (with weights) massive 176B language model that looks incredibly promising. The size is comparable to OpenAI's largest GPT-3 model. More info about the model can be found on BigScience's blog. You can play with the model interactively, for free(!) on Huggingface. submitted by /u/MonLiH [link] [comments]  ( 86 min )
    [R] Deep Hierarchical Planning from Pixels ( Director ) - Google 2022
    Paper: https://arxiv.org/pdf/2206.04114.pdf https://ai.googleblog.com/2022/07/deep-hierarchical-planning-from-pixels.html?m=1 Abstract: Intelligent agents need to select long sequences of actions to solve complex tasks. While humans easily break down tasks into subgoals and reach them through millions of muscle commands, current artificial intelligence is limited to tasks with horizons of a few hundred decisions, despite large compute budgets. Research on hierarchical reinforcement learning aims to overcome this limitation but has proven to be challenging, current methods rely on manually specified goal spaces or subtasks, and no general solution exists. We introduce Director, a practical method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model. The high-level policy maximizes task and exploration rewards by selecting latent goals and the low-level policy learns to achieve the goals. Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization. Director outperforms exploration methods on tasks with sparse rewards, including 3D maze traversal with a quadruped robot from an egocentric camera and proprioception, without access to the global position or top-down view that was used by prior work. Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels. 
https://preview.redd.it/lbvp6r7wl7b91.jpg?width=1034&format=pjpg&auto=webp&s=e9a28b2589eb41148de5b5bb6c4700354e795ae4 https://preview.redd.it/kikyu54xl7b91.jpg?width=1041&format=pjpg&auto=webp&s=b893e54790c420780c79819e689a9666ea95bf86 https://preview.redd.it/m5wc4tdxl7b91.jpg?width=1007&format=pjpg&auto=webp&s=17d7edf3cf7021ceabd3327d9408f1c3bd913c03 https://preview.redd.it/9cwsn9oxl7b91.jpg?width=1015&format=pjpg&auto=webp&s=c96348f290e9ff76c7003c51c97ac86705b77068 submitted by /u/Singularian2501 [link] [comments]  ( 86 min )
    [D] Does vector prediction merit using a multivariate output model?
    I am building a framework that predicts a displacement vector for a series of points on a map, using features from those points. There’s evidence of a relationship between the correlation coefficient of vector values (i.e. x and y-displacement) and some of the features. Would this merit using a multivariate output model (likely gradient boosting tree regression) or should I use two univariate output models? If not, what should I be looking into? submitted by /u/Boring-Violinist8291 [link] [comments]  ( 85 min )
    [P] Ensembling with multiple independent time-series
    I'm working on a project in which I have N independent time-series datasets, which can be thought of like prices for different currencies/crypto-coins etc. I've structured my dataset such that for each training batch, the first dimension is the index of the time-series. I have a prediction model based on a couple papers, which takes in a sliding window and outputs a prediction of the time series. Question: What is the best way to build an ensemble of this model, such that predictions for each time-series aren't affected by the others? When I say "aren't affected by other time series", i mean that the average of predictions of two different models trained on two different series might not be as accurate/precise as the predictions by themselves (without averaging)... Should I have N different models for each time series and just average the predictions? Should I have some K number of models with different loss functions and then average those? What would be a good strategy? submitted by /u/takeafuckinsipp [link] [comments]  ( 86 min )
    [R] DiBB: Distributing Black-Box Optimization
    Author here. Just presented this work at GECCO 2022. Quick summary: https://twitter.com/giuse_tweets/status/1546920346015637505 Paper: https://exascale.info/assets/pdf/cuccu2022gecco.pdf Code + tutorials: https://github.com/giuse/dibb Experiments (COCO/BBOB-LS): https://github.com/eXascaleInfolab/dibb_coco Recorded rehearsal of the talk: https://tinyurl.com/dibb-video AMA! submitted by /u/giuse_tweets [link] [comments]  ( 85 min )
    [D]Oleh Rybkin, UPenn, on exploration and planning with world models
    Here is a podcast with Oleh Rybkin where we discuss agents that explore and plan (and do yoga), how to learn world models from video, what's missing from current RL research, and much more! submitted by /u/thejashGI [link] [comments]  ( 85 min )
    [D] Does it make sense to generate text sequences with Transformer-based models and then have a classifier to choose between multiple options.
    Hello, I have a topic for discussion: Are you aware of systems which have a sequence-to-sequence architecture such as a Transformer, generating multiple outputs for a given task, and then another model - a MLP, another Transformer or something else which learns to pick the best option. Is it possible for this extra step to extract more knowledge from given data and increase the performance of the pipeline (even though at the cost of more computing power)? In what contexts does (not) that make sense? submitted by /u/IllustriousCicada603 [link] [comments]  ( 87 min )
    [P] Token-to-Token ViT Implementation in Flax
    ​ https://preview.redd.it/0mh5d00tx5b91.png?width=479&format=png&auto=webp&s=ac8c83e80d058d032e9083512da749216d9a2221 An open-source implementation of the Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet research paper in Google's JAX and Flax. "Transformers, which are popular for language modeling, have been explored for solving vision tasks recently, e.g., the Vision Transformer (ViT) for image classification. The ViT model splits each image into a sequence of tokens with fixed length and then applies multiple Transformer layers to model their global relation for classification. However, ViT achieves inferior performance to CNNs when trained from scratch on a midsize dataset like ImageNet. We find it is because: 1) the simple tokenization of input images fail…  ( 87 min )
    [P] Run transformers model inference in C/C++ and Assembly with the Python C API
    https://preview.redd.it/xjtcha3r35b91.png?width=1298&format=png&auto=webp&s=00873223c1ea0c6afcd5e22c7645521036b7e341 This post presents a way to run transformers models via the Python C API. The referenced notebook loads two txtai workflows, one that translates English to French and another that summarizes a webpage. After loading the models through C code, another example runs the workflows through assembly to show this works with any native code. Full code links: Notebook | GitHub submitted by /u/davidmezzetti [link] [comments]  ( 86 min )
    [P] DALL·E Mini & Mega demo and production API
    Hi all - we've just put out the community DALL·E models on Playgrounds.ai: Mega - https://playgrounds.ai/models/dalle-mega Mini - https://playgrounds.ai/models/dalle-mini You can use these models via API on PipelineCloud here: https://dashboard.pipeline.ai The per-image cost for the models is approx: Mega - $0.0014 (~10s of compute for 4 images) Mini - $0.00062 (~10s of compute for 9 images) This is for people who want to use these models in their apps/products or just play around with the demos and have fun! https://preview.redd.it/zre4tf40a4b91.png?width=3114&format=png&auto=webp&s=68d8c10236cdd23c642e581d479d479b38fede84 submitted by /u/paulcjh [link] [comments]  ( 87 min )
    [R] On the Principles of Parsimony and Self-Consistency for the Emergence of Intelligence
    submitted by /u/hardmaru [link] [comments]  ( 85 min )
    [D] How to choose best model during training if validation loss fluctuates a lot?
    I am training a deep neural network. Unfortunately, I have few samples in my validation set, so the validation loss fluctuates a lot. How can I choose the best model during training? Usually I choose the model with the lowest validation loss, but now random fluctuations can lower the loss by chance. I think the fluctuations arise because I can't use the whole sample: I am using Colab free and I don't have enough RAM. I tried modifying the Train/Train/Vali split by increasing the Vali size, and the oscillations seem a bit smaller, but I would like to maintain the 60/20/20 ratio for a better and more meaningful classification. submitted by /u/imunabletocode [link] [comments]  ( 88 min )
    [P] Helping data scientists access large ML datasets
    I spent so much time building data pipelines which feels like a huge constraint on my time and ability to focus on actual ML tasks. That's why I'm building subtask.net which collects and builds large, constantly updated, ML datasets from across the internet. The goal is to cut out the data collection part of any ML project and make more datasets available beyond the typical open-source datasets provided by the community. submitted by /u/subtask_net [link] [comments]  ( 86 min )
    [P] Building efficient ML applications with Taichi's automatic differentiation
    ​ https://i.redd.it/d66p6f6p23b91.gif Hey guys I am working on an open-source, parallel programming language, Taichi Lang, which I find efficient in differentiable physical simulation and can help speed up the convergence of ML processes. Above is a simple demo supported by Taichi's inbuilt autodiff (automatic differentiation) system. You can move the target as you wish, and the magic fountain always changes its trajectory accordingly to hit the target. So basically, Taichi Lang's Source Code Transformation system generates gradient kernels during compile time, and the lightweight tape in the Python scope records the launched Taichi kernels and replays the gradient kernels in reverse order during backpropagation. Model training is done within 10 optimization iterations. A step-by-step explanation: https://www.reddit.com/user/mingrui-zhang/comments/vx49mz/training_a_magic_fountain_using_taichis_autodiff/ Source code: https://github.com/taichi-dev/taichi/blob/master/python/taichi/examples/autodiff/diff_sph/diff_sph.py submitted by /u/mingrui-zhang [link] [comments]  ( 86 min )
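    The record-then-replay idea in the post above can be sketched in plain Python. This is a toy, not Taichi's implementation: Taichi generates its gradient kernels by source code transformation at compile time, whereas this sketch uses closures, and every name here (`Tape`, `Var`, `mul`, `sub`, `fit`) is hypothetical. It only shows a tape that records operations in launch order and replays their gradients in reverse.

```python
class Tape:
    """Records the gradient function of each forward operation."""

    def __init__(self):
        self.backward_ops = []

    def record(self, backward_fn):
        self.backward_ops.append(backward_fn)

    def backward(self):
        # Replay the recorded gradient functions in reverse launch order.
        for backward_fn in reversed(self.backward_ops):
            backward_fn()


class Var:
    """A scalar value with an accumulated gradient."""

    def __init__(self, value):
        self.value = value
        self.grad = 0.0


def mul(tape, a, b):
    out = Var(a.value * b.value)

    def backward():
        a.grad += b.value * out.grad
        b.grad += a.value * out.grad

    tape.record(backward)
    return out


def sub(tape, a, b):
    out = Var(a.value - b.value)

    def backward():
        a.grad += out.grad
        b.grad -= out.grad

    tape.record(backward)
    return out


def fit(target, steps=10, lr=0.25):
    """Gradient descent on (x - target)^2, using a fresh tape per step."""
    x = Var(0.0)
    for _ in range(steps):
        x.grad = 0.0
        tape = Tape()
        diff = sub(tape, x, Var(target))
        loss = mul(tape, diff, diff)
        loss.grad = 1.0  # seed d(loss)/d(loss)
        tape.backward()
        x.value -= lr * x.grad
    return x.value
```

    Calling `fit(3.0)` moves `x` close to the target within ten steps, loosely mirroring the fountain demo, where a few optimization iterations are enough to hit the target.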
    [D] Understanding how hardware plays a role in creating AI models
    I'm wondering if there's any sort of article/video/reddit post focused on explaining everything there is to know about hardware and its impact on AI (cores, tensor cores, threading, etc.). I have a lot of background on the software side, so code optimization isn't something that I've thought too much about, but I'm currently working on building my own PC so I do need this information (I'm not looking for a guide because that won't help me learn, but I want to learn all this stuff from the ground up). Any recommendations on where I can learn more about this? Thanks! submitted by /u/anacondavibes [link] [comments]  ( 86 min )
    [D] How do you verify the novelty of your research?
    While working on my own research and struggling to find related works it got me thinking. What process do you follow to discover preexisting research similar to your own? With the fast pace of research in the field, and so much overlapping terminology, do you use fancy tools or go beyond just typing queries into google scholar until you get relevant papers to your own? How do you find what you don't know to look for? submitted by /u/ajt9000 [link] [comments]  ( 95 min )
    [D] Efficiently choose good papers in top-tier conferences
    Hey, as a senior PhD student, I still find it tiring to look for and read through the massive number of newly accepted papers in top-tier conferences/journals like neurips/icml/iclr/jmlr/cvpr.... Any suggestions for efficiently selecting good papers? submitted by /u/Ok-Wind-1215 [link] [comments]  ( 88 min )
    [R] Machine Learning Operations (MLOps): Overview, Definition, and Architecture
    Paper: https://arxiv.org/ftp/arxiv/papers/2205/2205.02302.pdf Abstract: The final goal of all industrial machine learning (ML) projects is to develop ML products and rapidly bring them into production. However, it is highly challenging to automate and operationalize ML products and thus many ML endeavors fail to deliver on their expectations. The paradigm of Machine Learning Operations (MLOps) addresses this issue. MLOps includes several aspects, such as best practices, sets of concepts, and development culture. However, MLOps is still a vague term and its consequences for researchers and professionals are ambiguous. To address this gap, we conduct mixed-method research, including a literature review, a tool review, and expert interviews. As a result of these investigations, we provide an aggregated overview of the necessary principles, components, and roles, as well as the associated architecture and workflows. Furthermore, we furnish a definition of MLOps and highlight open challenges in the field. Finally, this work provides guidance for ML researchers and practitioners who want to automate and operate their ML products with a designated set of technologies. https://preview.redd.it/km40o6fce1b91.jpg?width=785&format=pjpg&auto=webp&s=1e1079e839c8230f03df4bcd25b2cc3d58d42049 submitted by /u/Singularian2501 [link] [comments]  ( 86 min )
    "[Project]" Brainchop: In Browser 3D Segmentation. Now 50 and 104 Brain Segmentations. (Follow up).
    https://reddit.com/link/vwxs2u/video/91mo2fnr81b91/player Live Demo: brainchop.org Brainchop is a client-side web application for automatic segmentation of MRI volumes. We make the implementation of Brainchop freely available, releasing its pure JavaScript code as open source. We appreciate your ideas/feedback/comments here or on the discussion board, and please star Brainchop if you like it to keep it going. submitted by /u/Character-Rip-5824 [link] [comments]  ( 85 min )
  • Open

    “Paranoid Android” created on Pixelz.ai by user - Prompt in comments 👇🏽
    submitted by /u/pixelz_ai [link] [comments]  ( 84 min )
    Amazon Rekognition takes over the internet
    submitted by /u/NarcoticSlug [link] [comments]  ( 84 min )
    Alien Architecture Generated By AI
    submitted by /u/Electronic-Dealer-71 [link] [comments]  ( 83 min )
    bonsai-bt: A Behavior Tree library in Rust for creating complex AI logic https://github.com/Sollimann/bonsai
    submitted by /u/Sollimann [link] [comments]  ( 84 min )
    The test that could change everything
    submitted by /u/kbf_ [link] [comments]  ( 84 min )
    Interview with AGI Journalist who covered DeepBlue/Kasparov & AlphaGo/Sedol in person. Interview on interesting insights - subscribe for similar AI content soon! :)
    submitted by /u/joemurray1994 [link] [comments]  ( 84 min )
    Oleh Rybkin, UPenn, on exploration and planning with world models
    Here is a podcast with Oleh Rybkin where we discuss agents that explore and plan (and do yoga), how to learn world models from video, what's missing from current RL research, and much more! submitted by /u/thejashGI [link] [comments]  ( 84 min )
    BigScience AI Researchers Open-Source ‘BLOOM’: An Autoregressive Multilingual Large Language Model Larger Than GPT-3 and OPT-175B
    BigScience Project introduces BLOOM (BigScience Large Open-science Open-access Multilingual Language Model), the first multilingual Large Language Model (LLM) trained in complete transparency by the largest group of AI academics. Unlike the traditional secrecy of industrial AI research laboratories, the project demonstrates the possibility of training promising AI models published by the larger research community responsibly and openly. ✅ Transformers-based LLM ✅ 176B parameters (larger than GPT-3 and OPT-175B) ✅ Trained on 1.6TB text data, the equivalent of 320 times the complete works of Shakespeare Continue reading | Download submitted by /u/ai-lover [link] [comments]  ( 84 min )
    i experimented a bit with ai, that's what i get 😈
    submitted by /u/nalr00n [link] [comments]  ( 84 min )
    Top 10 AI Jobs and The Best Places to Find Them
    This infographic shows the top job roles requiring AI and ML skills as well as the most attractive cities for AI jobs and the best companies in the field to work for. submitted by /u/Emily-joe [link] [comments]  ( 84 min )
    Psalms 34 completely illustrated with MidjourneyAI art - none of these images were post edited in any way, more details about creation in the description of the video
    submitted by /u/Racer_x32 [link] [comments]  ( 86 min )
    Sclera, Iris and Pupil Detector
    submitted by /u/Gloomy_Recognition_4 [link] [comments]  ( 86 min )
    73% of people mistook AI-generated images for human-made artwork
    submitted by /u/KazRainer [link] [comments]  ( 84 min )
    Hard rules in a GAN Neural Network.
    I have a script that can accept/reject outputs of the generator based on a set of rules, and I want to integrate it into the GAN, however, I'm not sure how to do so without breaking the math of the backpropagation and other stuff. What is the correct approach to this problem? submitted by /u/iLoveNintend0 [link] [comments]  ( 84 min )
    Heuristics and Algorithms in AI
    So this might be a more theoretical question and I'm not sure it's related to this sub, but I'll shoot my shot anyway: BFS, DFS, ITERATIVE DEEPENING and UNIFORM COST SEARCH are all algorithms that find a path in our domain of states and make NO USE of heuristics. They are what we call "blind search". Uniform cost search makes use of the weights of the edges between nodes, and the other three just blindly go through the nodes as if each edge had a weight of 1. GREEDY BEST FIRST SEARCH and A* are both algorithms that make use of a heuristic, which is basically a function that estimates the cost from a node n to the target node. I keep getting confused about each of them, so I would like to know if what I wrote above is correct. Thank you for your time. EDIT: I haven't talked about completeness and optimal heuristics because I think I got those down just fine. submitted by /u/Alternative_Shoe2623 [link] [comments]  ( 84 min )
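    The distinction drawn in the post above can be made concrete with a small sketch. It assumes a weighted graph stored as a dict and a heuristic table `H`; all names and the toy graph are hypothetical. One generic best-first loop covers three of the algorithms, and the only thing that changes is the priority: path cost g for uniform cost search, heuristic h for greedy best-first search, and g + h for A*.

```python
import heapq

# Toy graph: node -> list of (neighbour, edge_cost). Hypothetical example.
GRAPH = {
    "S": [("A", 1), ("B", 2)],
    "A": [("G", 5)],
    "B": [("G", 2)],
    "G": [],
}
# Heuristic estimate of the remaining cost to "G" (admissible and consistent here).
H = {"S": 2, "A": 1, "B": 2, "G": 0}


def best_first_search(graph, start, goal, priority):
    """Generic best-first search; priority(g, node) orders the frontier.

    Returns (path, cost), or (None, inf) if the goal is unreachable.
    """
    frontier = [(priority(0, start), 0, start, [start])]
    best_g = {}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue  # already expanded via a path that was at least as cheap
        best_g[node] = g
        for nxt, cost in graph.get(node, []):
            g2 = g + cost
            heapq.heappush(frontier, (priority(g2, nxt), g2, nxt, path + [nxt]))
    return None, float("inf")


def uniform_cost(graph, start, goal):
    return best_first_search(graph, start, goal, lambda g, n: g)  # blind: g only


def greedy(graph, start, goal, h):
    return best_first_search(graph, start, goal, lambda g, n: h[n])  # h only


def a_star(graph, start, goal, h):
    return best_first_search(graph, start, goal, lambda g, n: g + h[n])  # g + h
```

    On this toy graph, greedy search follows the lower heuristic value through A and pays a total cost of 6, while uniform cost search and A* both return the cheaper route through B with cost 4. BFS and DFS are not priority-based in the same way, but BFS behaves like uniform cost search when every edge has weight 1.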
    I programmed Minecraft to control real LEDs when I look at the corresponding color in Minecraft (using computer vision as a real time data collection system)
    submitted by /u/MrDemonFrog [link] [comments]  ( 84 min )
    Wondrous Fairy Escapade | Cinematic 4K 24 FPS (FILM)
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 84 min )
  • Open

    Grand Entrance: Human Horizons Unveils Smart GT Built on NVIDIA DRIVE Orin
    Touring vehicles just became a little more grand. Electric vehicle maker Human Horizons provided a detailed glimpse earlier this month of its latest production model, the GT HiPhi Z. The intelligent EV is poised to redefine the grand tourer category with innovative, software-defined capabilities that bring luxurious cruising to the next level. The vehicle’s marquee Read article > The post Grand Entrance: Human Horizons Unveils Smart GT Built on NVIDIA DRIVE Orin appeared first on NVIDIA Blog.  ( 5 min )
    Merge Ahead: Researcher Takes Software Bridge to Quantum Computing
    Kristel Michielsen was into quantum computing before quantum computing was cool. The computational physicist simulated quantum computers as part of her Ph.D. work in the Netherlands in the early 1990s. Today, she manages one of Europe’s largest facilities for quantum computing, the Jülich Unified Infrastructure for Quantum Computing (JUNIQ) . Her mission is to help Read article > The post Merge Ahead: Researcher Takes Software Bridge to Quantum Computing appeared first on NVIDIA Blog.  ( 6 min )
    Sequences That Stun: Visual Effects Artist Surfaced Studio Arrives ‘In the NVIDIA Studio’
    Visual effects savant Surfaced Studio steps In the NVIDIA Studio this week to share his clever film sequences, Fluid Simulation and Destruction, as well as his creative workflows. These sequences feature quirky visual effects that Surfaced Studio is renowned for demonstrating on his YouTube channel. The post Sequences That Stun: Visual Effects Artist Surfaced Studio Arrives ‘In the NVIDIA Studio’ appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    Artificial intelligence model finds potential drug molecules a thousand times faster
    A geometric deep-learning model is faster and more accurate than state-of-the-art computational models, reducing the chances and costs of drug trial failures.  ( 6 min )
  • Open

    Revisiting Mask Transformer from a Clustering Perspective
    Posted by Qihang Yu, Student Researcher, and Liang-Chieh Chen, Research Scientist, Google Research Panoptic segmentation is a computer vision problem that serves as a core task for many real-world applications. Due to its complexity, previous work often divides panoptic segmentation into semantic segmentation (assigning semantic labels, such as “person” and “sky”, to every pixel in an image) and instance segmentation (identifying and segmenting only countable objects, such as “pedestrians” and “cars”, in an image), and further divides it into several sub-tasks. Each sub-task is processed individually, and extra modules are applied to merge the results from each sub-task stage. This process is not only complex, but it also introduces many hand-designed priors when processing sub-tasks and …  ( 23 min )
  • Open

    Real-Time Apps: Why Node.js is the Ideal Choice
    In a world where technology is evolving at a tremendous pace, it comes as no surprise that there’s an increase in demand for apps that interact with users in real time. And, it is no secret that the development of real-time apps is an extremely popular concept in the global market, thanks to rapid digitalization… Read More »Real-Time Apps: Why Node.js is the Ideal Choice The post Real-Time Apps: Why Node.js is the Ideal Choice appeared first on Data Science Central.  ( 18 min )
    Web Analytics Dashboards Carry a World of Data for Various Purposes
    Web analytics tools offer vital insights into your website’s visitors’ behavior by tracking their real-time activities on the platform from behind. These tools study almost everything – the number of daily and regular visitors, sessions and duration, conversions, and beyond. You can access a comprehensive report covering every aspect and personalize it to focus on… Read More »Web Analytics Dashboards Carry a World of Data for Various Purposes The post Web Analytics Dashboards Carry a World of Data for Various Purposes appeared first on Data Science Central.  ( 18 min )
    Top Picks for Blockchain Certifications
    Blockchain Certifications and cryptocurrency have become popular among many new internet businesses. The security and transparency this technology offers are some of the reasons why cryptocurrency has gained popularity over the past years. Blockchain technology has remained to be the backbone of cryptocurrency. It is particularly helpful in maintaining data related to public transactions. The best… Read More »Top Picks for Blockchain Certifications The post Top Picks for Blockchain Certifications appeared first on Data Science Central.  ( 19 min )
  • Open

    Understanding the Design of a Convolutional Neural Network
    Convolutional neural networks have been found successful in computer vision applications. Various network architectures are proposed and they are neither magical nor hard to understand. In this tutorial, we will make sense of the operation of convolutional layers and their role in a larger convolutional neural network. After finishing this tutorial, you will learn: How […] The post Understanding the Design of a Convolutional Neural Network appeared first on Machine Learning Mastery.  ( 14 min )
  • Open

    How to make awesome datasets fast with Scrapy in Python
    Scrapy is a highly customizable and developer-friendly crawling framework in Python. It can help you build a wonderful crawler in a few lines to…  ( 11 min )
  • Open

    Conway’s factoring trick
    The numbers 152 through 156 have a lot of small prime factors. I’ll be more explicit about that shortly, but take my word for it for now. John Conway [1] took this simple observation and turned it into a technique for mentally factoring integers. Conway’s factoring method To look for factors of a number n, […] Conway’s factoring trick first appeared on John D. Cook.  ( 7 min )
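The opening observation can be checked with a short trial-division sketch. The helper `prime_factors` is a hypothetical name introduced here; the snippet only verifies the claim about 152 through 156 and does not attempt to reproduce Conway's mental method, which the excerpt cuts off before describing.

```python
def prime_factors(n):
    """Return the prime factorization of n as {prime: exponent} by trial division."""
    factors = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] = factors.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:  # whatever remains is itself prime
        factors[n] = factors.get(n, 0) + 1
    return factors

factorizations = {n: prime_factors(n) for n in range(152, 157)}
for n, f in factorizations.items():
    terms = [f"{p}^{e}" if e > 1 else str(p) for p, e in f.items()]
    print(n, "=", " * ".join(terms))

# Distinct primes that divide at least one of the five numbers.
primes_covered = sorted({p for f in factorizations.values() for p in f})
print(primes_covered)
```

Between them, the five consecutive numbers are divisible by every prime up to 19, plus 31.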
  • Open

    Sentiment Analysis of Stocktwits Messages using LSTM in PyTorch
    submitted by /u/Vasilkosturski [link] [comments]  ( 84 min )

  • Open

    [D] How to work with audio data?
    I have to work on an ML model which listens to sounds and classifies them as rat squeaks or not for my college project. I have already created a model using MFCC to convert the audio into float arrays (which are called feature vectors, I think, though I'm not 100% sure what they are). I later changed the sampling frequency every time I took a different audio as input, in order to get the same number of arrays as output of the MFCC (I noticed changing the sampling rate changed the number of arrays outputted; I think the correct term for it is hop_length). I couldn't use librosa as I couldn't install llvmlite after spending like half a day on it. Then I took each and every float in the arrays (61 arrays formed for each sound, each containing 13 values) and used it as a feature and ran RFC (I had 793 different columns at the end). My dataset is also just 159 sounds, most of which come from a machine squeaking sounds dataset in which my teammate manually labelled those which sounded like rat squeaks as yes and the rest as no. Then there are like 15 actual rat sounds mixed in (for which I had to change the hop_length; again, I don't even know what it actually is, but I had to get the array lengths the same. I looked up a lot on the internet but couldn't find any rat sound dataset nor anyone who could explain MFCCs properly). Needless to say, my ML model is quite inaccurate. Anyway, I think there has to be a better method than this to deal with audio data and classify it. Can anyone who has experience with this help me out? Thanks. submitted by /u/Spinner4177 [link] [comments]  ( 87 min )
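The feature layout described in the post (61 frames of 13 coefficients flattened into 793 columns) can be sketched as follows. This is only an illustration under stated assumptions: the frame values are made-up stand-ins for real MFCC output, `fix_length` and `flatten` are hypothetical helpers, and padding or truncating to a fixed number of frames is a swapped-in alternative to changing the sampling rate per clip, not what the poster did.

```python
def fix_length(frames, n_frames=61, n_coeffs=13):
    """Truncate, or pad with all-zero frames, so every clip has n_frames frames."""
    fixed = [list(frame) for frame in frames[:n_frames]]
    fixed += [[0.0] * n_coeffs for _ in range(n_frames - len(fixed))]
    return fixed

def flatten(frames):
    """Concatenate the per-frame coefficient lists into one flat feature row."""
    return [value for frame in frames for value in frame]

# Stand-in data: a short clip with 40 frames and a long clip with 90 frames,
# each frame holding 13 placeholder coefficients.
short_clip = [[0.1] * 13 for _ in range(40)]
long_clip = [[0.2] * 13 for _ in range(90)]

features = [flatten(fix_length(clip)) for clip in (short_clip, long_clip)]
print([len(row) for row in features])
```

Both rows come out with 61 * 13 = 793 values regardless of clip length, so the audio itself does not need to be resampled differently for each file to make the shapes match.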
    [P] Paper Implementation - Extracting Training Data from Large Language Models
    A re-implementation of the famous 2020 paper - "Extracting Training Data from Large Language Models" by Nicholas Carlini, Florian Tramer et al. Code - https://github.com/shreyansh26/Extracting-Training-Data-from-Large-Langauge-Models The official implementation is great and I definitely learned a few things from it. In the re-implementation, I have also included the temperature-decay sampling and the sliding-window-based minimum perplexity metric, which were not present in the official implementation. I checked the extracted samples (refer to the GitHub repo) and they surely contained some memorized information. submitted by /u/shreyansh26 [link] [comments]  ( 85 min )
    [P] ScalableViT Implementation in Flax
    An open-source implementation of the ScalableViT: Rethinking the Context-oriented Generalization of Vision Transformer research paper in Google's JAX and Flax. "The vanilla self-attention mechanism inherently relies on pre-defined and steadfast computational dimensions. Such inflexibility restricts it from possessing context-oriented generalization that can bring more contextual cues and global representations. To mitigate this issue, we propose a Scalable Self-Attention (SSA) mechanism that leverages two scaling factors to release dimensions of query, key, and value matrix while unbinding them with the input. This scalability fetches context-oriented generalization and enhances object sensitivity, which pushes the whole network into a more effective trade-off state between accuracy and cost. Furthermore, we propose an Interactive Window-based Self-Attention (IWSA), which establishes interaction between non-overlapping regions by re-merging independent value tokens and aggregating spatial information from adjacent windows. By stacking the SSA and IWSA alternately, the Scalable Vision Transformer (ScalableViT) achieves state-of-the-art performance in general-purpose vision tasks. For example, ScalableViT-S outperforms Twins-SVT-S by 1.4% and Swin-T by 1.8% on ImageNet-1K classification." - Rui Yang, Hailong Ma, Jie Wu, Yansong Tang, Xuefeng Xiao, Min Zheng, Xiu Li Github repository for the Flax / JAX model: https://github.com/conceptofmind/Scalable-ViT-flax ScalableViT Research Paper: https://arxiv.org/abs/2203.10790 In collaboration with Lucid: https://github.com/lucidrains submitted by /u/EnricoShippole [link] [comments]  ( 86 min )
    [D] Instance segmentation using transformers
    Hi folks! I am looking for beginner-friendly and easy to implement papers on instance segmentation using transformers. Any help will be appreciated!! submitted by /u/cheemsdoge69 [link] [comments]  ( 85 min )
    [D] Speech Enhancement SOTA
    Audio denoising (removing background noise from audio), often referred to as Speech Enhancement, was a mildly popular research field up to 2020. This was due to COVID and the need to filter unwanted noise from calls. However, I'm not sure where we're at today: Music Source Separation has been improved by TikTok's and Deezer's research; Meta's denoiser looks like the most standard, production-ready model, and it implements a 2020 paper. I'd like to search for more alternatives, but I struggle to find some: Googling "Denoising" leads to image noise removal; Papers with Code's "Speech denoising" and "Audio denoising" categories are pretty empty; the "Speech Enhancement" category seems to be the real deal, but the top models don't have any pretrained version available. Is there a model that outperforms Meta's denoiser, while remaining open-source with an available pretrained model? submitted by /u/chaude_patate [link] [comments]  ( 86 min )
    [R] DA-Faster RCNN
    Hello, I have reimplemented DA-Faster RCNN, one of the most important architectures for domain adaptation in object detection, using Detectron2. This implementation is easy to use and can also be used with Google Colab :) Here is the link: https://github.com/GiovanniPasq/DA-Faster-RCNN submitted by /u/CapitalShake3085 [link] [comments]  ( 85 min )
    [D] What is your go-to algorithm for Multiple Object Tracking with possible long time occlusions?
    I'm interested in tracking cars and people with the ability to handle occlusion of objects that might not be moving. Things I've tried are decent but not amazing (DeepSORT, ByteTrack). There are a few recent studies about using transformers for tracking, but those are heavy and not really production material, having deformable convolutions in them (hard or impossible to convert to TorchScript and TensorRT) and all. What's your go-to algorithm for this kind of problem? submitted by /u/InfiniteLife2 [link] [comments]  ( 87 min )
    [D] Modeling Adjacency Matrix
    Let's assume I have some directed adjacency matrix A at time t and another adjacency matrix B at time t+1. I want to learn a mapping from A to B through some model f (suppose f is a neural network). Now, how should I create this model? Should I use just Dense layers, or GNNs, or something else? submitted by /u/Labib666Camp [link] [comments]  ( 86 min )
    [P] Semi-supervised learning for tabular data: VIME
    A lot of recent DL models for tabular data have used some sort of pre-training to increase robustness and performance metrics on smaller/noisy datasets. That's why I've decided to write a deep-dive blog post on the VIME paper, which was one of the first to suggest pre-training tasks specific to tabular data. It comes with an accompanying repo that contains all the code and notebooks. From some personal testing that I've done, pre-training is most valuable, and does improve performance, when we're dealing with very few labels (1-5% of the dataset). Of course, the best solution is to always get more labels lol, but when that's not possible, pre-training schemes like VIME can give you a small boost in performance. Give it a read and let me know what you think! I'll keep covering some interesting deep tabular architectures, so maybe also let me know which one you would want me to cover next! submitted by /u/blessedorcursed [link] [comments]  ( 86 min )
    [R] Closed-Form Diffeomorphic Transformations for Time Series Alignment
    Paper: https://arxiv.org/pdf/2206.08107.pdf Code: https://github.com/imartinezl/difw Abstract: Time series alignment methods call for highly expressive, differentiable and invertible warping functions which preserve temporal topology, i.e., diffeomorphisms. Diffeomorphic warping functions can be generated from the integration of velocity fields governed by an ordinary differential equation (ODE). Gradient-based optimization frameworks containing diffeomorphic transformations require to calculate derivatives to the differential equation's solution with respect to the model parameters, i.e. sensitivity analysis. Unfortunately, deep learning frameworks typically lack automatic-differentiation-compatible sensitivity analysis methods; and implicit functions, such as the solution of ODE, require particular care. Current solutions appeal to adjoint sensitivity methods, ad-hoc numerical solvers or ResNet's Eulerian discretization. In this work, we present a closed-form expression for the ODE solution and its gradient under continuous piecewise-affine (CPA) velocity functions. We present a highly optimized implementation of the results on CPU and GPU. Furthermore, we conduct extensive experiments on several datasets to validate the generalization ability of our model to unseen data for time-series joint alignment. Results show significant improvements both in terms of efficiency and accuracy. submitted by /u/inigomlap [link] [comments]  ( 87 min )
    [R] An awesome collection of Federated learning & Blockchain research papers in the Healthcare domain
    Federated learning, a mechanism of training a shared global model with a central server while keeping all the sensitive data in local institutions where the data belong, shows great promise for connecting fragmented healthcare data sources while preserving privacy. This repo contains a curated list of Federated Learning papers/resources and recent advancements in Healthcare. As of now ~330 papers. PRs welcome: https://github.com/monk1337/Aweome-Heathcare-Federated-Learning submitted by /u/aadityaura [link] [comments]  ( 85 min )
    [P] I used Note System on MNIST; training speed increased by more than two times! You can view this project on my GitHub.
    submitted by /u/7NoteDancing [link] [comments]  ( 85 min )
    [D] Next big thing in the field
    Do you guys have any forecasts of the next big model/algorithm/concept in DL? We had CNNs disrupting the field in ~2015, then GANs became a big deal, RL has grown quite a lot, Transformers trended recently, and now Diffusion models are moving probabilistic ML forward (sorry if I missed something). What other not fully investigated or underestimated concepts with high potential are there? submitted by /u/AdelSexy [link] [comments]  ( 93 min )
    [D] Why are Corgi dogs so popular in machine learning (especially in the image generation community)?
    For example, here's part of OpenAI's GLIDE paper: https://preview.redd.it/b6vkxyb3xua91.png?width=1225&format=png&auto=webp&s=15d56f256e323bb54d22eb9fdc0538644060c4a7 submitted by /u/Azuresonance [link] [comments]  ( 90 min )
  • Open

    I made cursed cartoon characters using Dream by Wombo
    submitted by /u/GetFlappy [link] [comments]  ( 84 min )
    Endless fun with HP creations 🧙🏼Voldemort Sketch on Pixelz.ai
    submitted by /u/mdfnb [link] [comments]  ( 84 min )
    The old and the new
    submitted by /u/deephugs [link] [comments]  ( 83 min )
    Is the brain an AI made of other AIs?
    If the brain can be broken down into multiple single-function specialized parts, what's stopping engineers from designing an AI for each of those parts and having all of them feed the resulting data into one overarching AI that, in turn, eats up that data and outputs...magic? Just a thought. I'm bored and I have 0 competence in AI, just a curious layman. Hope you may indulge my ignorance. Cheers! submitted by /u/TWHreddit [link] [comments]  ( 85 min )
    Weekly China AI News: Meet World's 1st Redstonic Neural Network in Minecraft; Shenzhen Holds Self-Driving Car Drivers Responsible for Crashes; AI Brings Back Decades-Old Concert
    submitted by /u/trcytony [link] [comments]  ( 84 min )
    Paper Implementation - Extracting Training Data from Large Language Models
    A re-implementation of the famous 2020 paper - "Extracting Training Data from Large Language Models" by Nicholas Carlini, Florian Tramer et al. Code - https://github.com/shreyansh26/Extracting-Training-Data-from-Large-Langauge-Models The official implementation is great and I definitely learned a few things from it. In the re-implementation, I have also included the temperature-decay sampling and the sliding-window-based minimum perplexity metric, which were not present in the official implementation. I checked the extracted samples (refer to the GitHub repo) and they surely contained some memorized information. submitted by /u/shreyansh26 [link] [comments]  ( 84 min )
    Anyone Doing Andrew NG's Machine Learning Specialization?
    submitted by /u/biggbrother23 [link] [comments]  ( 83 min )
    New Open-Source Large Language Model 'Bloom' Does 40+ Languages And Has 176 Billion Parameters
    submitted by /u/getrich_or_diemining [link] [comments]  ( 84 min )
    Large language models might reason—if you know how to speak to them
    submitted by /u/bendee983 [link] [comments]  ( 84 min )
    i have used the amazing innovation of frame interpolation to make 60fps memes
    submitted by /u/oliviagolds [link] [comments]  ( 85 min )
    Its time for ai generator corporated!!!
    submitted by /u/GroundbreakingLaw878 [link] [comments]  ( 83 min )
    Ray Kurzweil Wants to Upload Your Brain to the Cloud
    submitted by /u/jormungandrsjig [link] [comments]  ( 85 min )
    Have you ever used an AI-powered photo editor? Could someone give me some advice on using it?
    submitted by /u/Lower_Peanut_9665 [link] [comments]  ( 84 min )
    is there an ai that can make a analog horror by text?
    I'm looking for a bit of a scare, so if there are any, link them to me :D submitted by /u/GroundbreakingLaw878 [link] [comments]  ( 85 min )
    my generated art in nightcafe
    submitted by /u/GroundbreakingLaw878 [link] [comments]  ( 84 min )
  • Open

    AI on the Sky: Stunning New Images From the James Webb Space Telescope To Be Analyzed by, Train, AI
    The unveiling by U.S. President Joe Biden Monday of the first full-color image from the James Webb Space Telescope is already astounding — and delighting — humans around the globe. “We can see possibilities nobody has ever seen before, we can go places nobody has ever gone before,” Biden said during a White House press Read article > The post AI on the Sky: Stunning New Images From the James Webb Space Telescope To Be Analyzed by, Train, AI appeared first on NVIDIA Blog.  ( 5 min )
    Windfall: Omniverse Accelerates Turning Wind Power Into Clean Hydrogen Fuel
    Engineers are using the NVIDIA Omniverse 3D simulation platform as part of a proof of concept that promises to become a model for putting green energy to work around the world. Dubbed Gigastack, the pilot project — led by a consortium that includes Phillips 66 and Denmark-based renewable energy company Ørsted — will create low-emission Read article > The post Windfall: Omniverse Accelerates Turning Wind Power Into Clean Hydrogen Fuel appeared first on NVIDIA Blog.  ( 6 min )
  • Open

    "CausalAgents: A Robustness Benchmark for Motion Forecasting using Causal Relationships", Roelofs et al 2022 {Waymo}
    submitted by /u/gwern [link] [comments]  ( 84 min )
    "Director: Deep Hierarchical Planning from Pixels", Hafner et al 2022 {G} (hierarchical RL over world models)
    submitted by /u/gwern [link] [comments]  ( 84 min )
    "Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning", Fu et al 2022 (effectiveness of policy gradient MARL)
    submitted by /u/gwern [link] [comments]  ( 84 min )
    Visual (pixel based) RL, CNNs & Autoencoders
    There's been a lot of hype around visual RL (using pixel input for the agent's network) ever since Deepmind's DQN back in 2015. However, to the best of my knowledge, it seems like there hasn't been a lot of published work since then that uses images as observations. Therefore I have a few questions/discussion points for the community: Have there been many/any notable image-based RL agents since DQN? If so, could you point me towards some? Are CNNs a good way to approach this type of RL? Could the CNN be trained independently of the agent, so that once the CNN is trained and can extract features and provide them as input to the agent, we can focus on training the agent and tuning its specific hyperparameters? How would one train a CNN independently from the agent? What would the CNN be trying to do? This leads me to think that autoencoders may be a good solution, since one can train them to reconstruct the original image and then use the trained encoder to build a latent space/compact feature representation of the original image during training as the input to the agent. Is this a good/bad idea? Has work been done in this area, if so could you point me towards it? This may seem like a lot but hopefully the evolution of my thoughts makes sense and therefore can start a discussion here :) Looking forward to hearing back from the community! submitted by /u/leozinho2r [link] [comments]  ( 86 min )
    PrefixRL: Optimization Of Parallel Prefix Circuits Using DRL {NVIDIA}
    submitted by /u/yazriel0 [link] [comments]  ( 84 min )
  • Open

    LiDAR 3D point cloud labeling with Velodyne LiDAR sensor in Amazon SageMaker Ground Truth
    LiDAR is a key enabling technology in growing autonomous markets, such as robotics, industrial, infrastructure, and automotive. LiDAR delivers precise 3D data about its environment in real time to provide “vision” for autonomous solutions. For autonomous vehicles (AVs), nearly every carmaker uses LiDAR to augment camera and radar systems for a comprehensive perception stack capable […]  ( 13 min )
  • Open

    A Gentle Introduction to tensorflow.data API
    When we build and train a Keras deep learning model, the training data can be provided in several different ways. Presenting the data as a NumPy array or a TensorFlow tensor is a common one. Making a Python generator function and letting the training loop read data from it is another way. Yet another […] The post A Gentle Introduction to tensorflow.data API appeared first on Machine Learning Mastery.  ( 17 min )
  • Open

    Why We Need to Move From Data-First to a Knowledge-First World
    We live in a data-rich world. Very data rich. Indeed, it’s estimated that roughly 2.5 quintillion bytes of data are created every day. Perhaps because of its ubiquity, there are those who believe the sheer volume of available data means we have all we need to easily and accurately answer any question without delay. If… Read More »Why We Need to Move From Data-First to a Knowledge-First World The post Why We Need to Move From Data-First to a Knowledge-First World appeared first on Data Science Central.  ( 19 min )
    Critical Role of Analytic Profiles in Developing Data Products
    The tech industry is abuzz with hyped up pontifications and bold predictions of the business-changing potential of Data Products.  I could not be happier as it’s a topic I have explored in several blogs (see the end of this blog for a list of my blogs on Data Products…yea, I know, get a life). A… Read More »Critical Role of Analytic Profiles in Developing Data Products The post Critical Role of Analytic Profiles in Developing Data Products appeared first on Data Science Central.  ( 20 min )
    Metaverse use cases – Which industries could the metaverse impact?
    According to the McKinsey report Value Creation in the Metaverse: $120b+ in investment has flowed into the metaverse so far in 2022; 79% of consumers active on the metaverse have made a purchase; >15% of corporate revenue is expected to come from the metaverse in the next 5 years according to 25% of senior… Read More »Metaverse use cases – Which industries could the metaverse impact? The post Metaverse use cases – Which industries could the metaverse impact? appeared first on Data Science Central.  ( 19 min )
    Features of IIoT (Industrial Internet of Things) Seamless Connectivity and Data Acquisition
    Executing Industrial Internet of Things (IIoT) solutions is vital as the most competitive global manufacturing companies are becoming digital enterprises. Industrial Internet of Things (IIoT) solutions and platforms are leading the reshaping and transformation of landscapes. A pre-built Industrial Internet of Things (IIoT) solution offers the benefit of a ready-made “IoT development kit” with the… Read More »Features of IIoT (Industrial Internet of Things) Seamless Connectivity and Data Acquisition The post Features of IIoT (Industrial Internet of Things) Seamless Connectivity and Data Acquisition appeared first on Data Science Central.  ( 18 min )
    Navigating the Costs of Cloud Networks
    Cloud networks have grown from what was seen as a passing trend by some experts, into full-fledged solutions that power some of the most important parts of various industries at this point. Large companies like Google and Microsoft have been investing steadily in the growth of their own solutions. At the same time, more tightly… Read More »Navigating the Costs of Cloud Networks The post Navigating the Costs of Cloud Networks appeared first on Data Science Central.  ( 21 min )
    Data Observability Vs Data Quality: What makes them different?
    Defining Data Observability and Data Quality As companies gather seemingly endless data streams from an increasing number of sources, they start to amass an ecosystem of data storage, would-be end-users, and pipelines. With each additional layer of complexity, opportunities for data downtime, moments when data is partial, erroneous, missing, or otherwise inaccurate, multiply. As a result,… Read More »Data Observability Vs Data Quality: What makes them different? The post Data Observability Vs Data Quality: What makes them different? appeared first on Data Science Central.  ( 19 min )
    Talent Management and Technology-A Perfect Blend
    Most small to medium businesses this year exhibit a desire to expand their scope of operations and increase their employee count over the course of the next year, a sign that the job market is on an upwards rise. That being said, the best and brightest are likely to be picked relatively quickly so competition… Read More »Talent Management and Technology-A Perfect Blend The post Talent Management and Technology-A Perfect Blend appeared first on Data Science Central.  ( 18 min )
    The Two Types of Agility You Need
    If your business is going to survive, you must be able to read and react to changes in your markets and continuously improve your competitive position.  It’s more important now than it’s ever been. SWOT is a model often employed to characterize a company’s competitive position in terms of Strengths, Weaknesses, Opportunities, and Threats.  If… Read More »The Two Types of Agility You Need The post The Two Types of Agility You Need appeared first on Data Science Central.  ( 21 min )
    Webinar Series -The rise of the Modern DataStack and the Modern Data Quality Platform
    On Wednesday, July 13th at 11 am EST, please join DQLabs for an exclusive virtual event, “Defining Data Relevance: The rise of the Modern Data Stack and the Modern Data Quality Platform”. The data producers, consumers, and leaders deserve an ecosystem that delivers the data that is relevant to them – one size fits all approaches… Read More »Webinar Series -The rise of the Modern DataStack and the Modern Data Quality Platform The post Webinar Series -The rise of the Modern DataStack and the Modern Data Quality Platform appeared first on Data Science Central.  ( 17 min )
  • Open

    Your RPA Implementation Must be at Risk! [Here Are 7 Reasons Why]
    IT leaders are running into several RPA failures. Here, we have covered the top 7 reasons why RPA implementations fail and how you can…  ( 12 min )
    Which tool is the best to make a complete dataset?
    Crawling a website is today an essential skill for anyone working in or with the digital industry. Firstly, I will start by clarifying…  ( 10 min )
    Google Imagen: text-to-image AI
    Google Imagen: a machine learning system that can generate graphics from text input.  ( 6 min )
  • Open

    New Open-Source Bloom AI To Challenge OpenAI & Google Deepmind | Breakthrough Chemical AI System
    submitted by /u/tohelpyou88 [link] [comments]  ( 84 min )
    “The use of Neural Networks in predicting share prices”
    So how accurately can neural networks predict the future prices of stocks in the share market? If there are any good resources that could help me learn more about it, could you please share them? submitted by /u/Suspicious_Speed24 [link] [comments]  ( 88 min )
    What math operations are the bottleneck for running inference on the edge? I’m trying to select an edge accelerator for a product and the industry seems to be fairly immature and it’s difficult to compare units.
    To expand, it seems like there are countless accelerators on the market intended to speed up NN inference. Way more than I have the bandwidth to individually set up environments for testing and evaluation. The other factor is that since the academic world is changing fast, it seems challenging to predict which math operations will end up being the standard, kind of like how the CPU world landed on x86 and RISC. I get the feeling that since everything is in flux, the long hardware development cycles mean we are taking a long time to stabilise and agree on the best ASICs to design. So in the meantime, I am seeking some recommendations for which instructions the current state-of-the-art neural networks (let’s say, for computer vision) need. Matrix multiplication? Linear algebra? Fused multiply-add? Is the expectation that this will change over time? Any info would be much appreciated. submitted by /u/meregizzardavowal [link] [comments]  ( 89 min )
  • Open

    NeuralGrasps: Learning Implicit Representations for Grasps of Multiple Robotic Hands. (arXiv:2207.02959v1 [cs.RO] CROSS LISTED)
    We introduce a neural implicit representation for grasps of objects from multiple robotic hands. Different grasps across multiple robotic hands are encoded into a shared latent space. Each latent vector is learned to decode to the 3D shape of an object and the 3D shape of a robotic hand in a grasping pose in terms of the signed distance functions of the two 3D shapes. In addition, the distance metric in the latent space is learned to preserve the similarity between grasps across different robotic hands, where the similarity of grasps is defined according to contact regions of the robotic hands. This property enables our method to transfer grasps between different grippers including a human hand, and grasp transfer has the potential to share grasping skills between robots and enable robots to learn grasping skills from humans. Furthermore, the encoded signed distance functions of objects and grasps in our implicit representation can be used for 6D object pose estimation with grasping contact optimization from partial point clouds, which enables robotic grasping in the real world.  ( 2 min )
    Investigating Generalization by Controlling Normalized Margin. (arXiv:2205.03940v2 [cs.LG] UPDATED)
    Weight norm $\|w\|$ and margin $\gamma$ participate in learning theory via the normalized margin $\gamma/\|w\|$. Since standard neural net optimizers do not control normalized margin, it is hard to test whether this quantity causally relates to generalization. This paper designs a series of experimental studies that explicitly control normalized margin and thereby tackle two central questions. First: does normalized margin always have a causal effect on generalization? The paper finds that no -- networks can be produced where normalized margin has seemingly no relationship with generalization, counter to the theory of Bartlett et al. (2017). Second: does normalized margin ever have a causal effect on generalization? The paper finds that yes -- in a standard training setup, test performance closely tracks normalized margin. The paper suggests a Gaussian process model as a promising explanation for this behavior.  ( 2 min )
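    As a rough illustration of the quantity $\gamma/\|w\|$ discussed above, the following sketch computes a normalized margin for a plain linear classifier without a bias term. The function name and the toy data are assumptions made for illustration; the paper studies neural networks, where the appropriate normalization may differ.

```python
import math

def normalized_margin(w, X, y):
    """Smallest signed margin y_i * (w . x_i), divided by the Euclidean norm of w."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm == 0.0:
        raise ValueError("weight vector must be non-zero")
    margins = [yi * sum(wi * xi for wi, xi in zip(w, x)) for x, yi in zip(X, y)]
    return min(margins) / norm

# Toy data (hypothetical): two points with labels +1 and -1.
X = [(2.0, 0.0), (-1.0, 0.0)]
y = [1, -1]
w = (3.0, 4.0)
w_scaled = tuple(10.0 * wi for wi in w)

# Rescaling w changes the raw margin but not the normalized one.
print(normalized_margin(w, X, y), normalized_margin(w_scaled, X, y))
```

    The rescaling check shows why the raw margin alone is not informative: it can be inflated arbitrarily by scaling the weights, while the normalized margin stays fixed.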
    Spatiotemporal Feature Learning Based on Two-Step LSTM and Transformer for CT Scans. (arXiv:2207.01579v2 [eess.IV] UPDATED)
    Computed tomography (CT) imaging could be very practical for diagnosing various diseases. However, the nature of the CT images is even more diverse since the resolution and number of the slices of a CT scan are determined by the machine and its settings. Conventional deep learning models struggle to tackle such diverse data since the essential requirement of the deep neural network is the consistent shape of the input data. In this paper, we propose a novel, effective, two-step approach to tackle this issue thoroughly for COVID-19 symptom classification. First, the semantic feature embedding of each slice for a CT scan is extracted by conventional backbone networks. Then, we propose a long short-term memory (LSTM) and Transformer-based sub-network to deal with temporal feature learning, leading to spatiotemporal feature representation learning. In this fashion, the proposed two-step LSTM model could prevent overfitting, as well as increase performance. Comprehensive experiments reveal that the proposed two-step models not only show excellent performance but also can compensate for each other. More specifically, the two-step LSTM model has a lower false-negative rate, while the 2-step Swin model has a lower false-positive rate. In summary, it is suggested that the model ensemble could be adopted for more stable and promising performance in real-world applications.  ( 3 min )
    A Structured Span Selector. (arXiv:2205.03977v2 [cs.CL] UPDATED)
    Many natural language processing tasks, e.g., coreference resolution and semantic role labeling, require selecting text spans and making decisions about them. A typical approach to such tasks is to score all possible spans and greedily select spans for task-specific downstream processing. This approach, however, does not incorporate any inductive bias about what sort of spans ought to be selected, e.g., that selected spans tend to be syntactic constituents. In this paper, we propose a novel grammar-based structured span selection model which learns to make use of the partial span-level annotation provided for such problems. Compared to previous approaches, our approach gets rid of the heuristic greedy span selection scheme, allowing us to model the downstream task on an optimal set of spans. We evaluate our model on two popular span prediction tasks: coreference resolution and semantic role labeling. We show empirical improvements on both.  ( 2 min )
    Approximately Solving Mean Field Games via Entropy-Regularized Deep Reinforcement Learning. (arXiv:2102.01585v2 [cs.MA] UPDATED)
    The recent mean field game (MFG) formalism facilitates otherwise intractable computation of approximate Nash equilibria in many-agent settings. In this paper, we consider discrete-time finite MFGs subject to finite-horizon objectives. We show that all discrete-time finite MFGs with non-constant fixed point operators fail to be contractive as typically assumed in existing MFG literature, barring convergence via fixed point iteration. Instead, we incorporate entropy-regularization and Boltzmann policies into the fixed point iteration. As a result, we obtain provable convergence to approximate fixed points where existing methods fail, and reach the original goal of approximate Nash equilibria. All proposed methods are evaluated with respect to their exploitability, on both instructive examples with tractable exact solutions and high-dimensional problems where exact methods become intractable. In high-dimensional scenarios, we apply established deep reinforcement learning methods and empirically combine fictitious play with our approximations.  ( 2 min )
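    To make the notion of a Boltzmann policy mentioned above concrete, here is a minimal sketch of a softmax over action values with a temperature parameter. The function name, the Q-values, and the temperatures are illustrative assumptions; this is not the paper's full entropy-regularized fixed point iteration.

```python
import math

def boltzmann_policy(q_values, temperature):
    """Return action probabilities proportional to exp(Q / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical action values for a single state.
q = [1.0, 2.0, 0.5]
print(boltzmann_policy(q, temperature=1.0))   # smooth preference for action 1
print(boltzmann_policy(q, temperature=0.01))  # nearly greedy
```

    A high temperature pushes the policy toward uniform and a low one toward the greedy action, which is the smoothing that a regularized iteration can exploit.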
    Online Learning in Budget-Constrained Dynamic Colonel Blotto Games. (arXiv:2103.12833v3 [cs.LG] UPDATED)
    In this paper, we study the strategic allocation of limited resources using a Colonel Blotto game (CBG) under a dynamic setting and analyze the problem using an online learning approach. In this model, one of the players is the learner who has limited troops to allocate over a finite time horizon, and the other player is an adversary. At each stage, the learner plays a Colonel Blotto game with the adversary and strategically determines the distribution of troops among battlefields based on past observations. The adversary chooses its allocation strategy randomly from some fixed distribution that is unknown to the learner. The learner's objective is to minimize its regret, which is the difference between the payoff of the best mixed strategy and the realized payoff by following a learning algorithm while not violating the budget constraint. The learning in dynamic CBG is analyzed under the framework of combinatorial bandit and bandit with knapsacks. We first convert the budget-constrained dynamic CBG to a path planning problem on a directed graph. We then devise an efficient algorithm that combines a special combinatorial bandit algorithm Edge for the path planning problem and a bandit with knapsack algorithm LagrangeBwK to cope with the budget constraint. The theoretical analysis shows that the learner's regret is bounded by a term sublinear in time horizon and polynomial in other parameters. Finally, we justify our theoretical results by performing simulations for various scenarios.  ( 3 min )
    Distributed Saddle-Point Problems: Lower Bounds, Near-Optimal and Robust Algorithms. (arXiv:2010.13112v8 [cs.LG] UPDATED)
    This paper focuses on the distributed optimization of stochastic saddle point problems. The first part of the paper is devoted to lower bounds for the centralized and decentralized distributed methods for smooth (strongly) convex-(strongly) concave saddle-point problems as well as the near-optimal algorithms by which these bounds are achieved. Next, we present a new federated algorithm for centralized distributed saddle point problems - Extra Step Local SGD. Theoretical analysis of the new method is carried out for strongly convex-strongly concave and non-convex-non-concave problems. In the experimental part of the paper, we show the effectiveness of our method in practice. In particular, we train GANs in a distributed manner.  ( 2 min )
    Risk aversion in learning algorithms and recommendation systems. (arXiv:2205.04619v2 [cs.LG] UPDATED)
    Consider online learning algorithms that simultaneously make decisions and learn from feedback. Such algorithms are widely deployed in recommendation systems for products and digital content. This article exhibits a bias of online learning algorithms towards less risky alternatives, and how it shapes demand on recommendation systems. First, we consider $k$-armed bandits. We prove that $\varepsilon$-Greedy chooses a riskless arm over a risky arm of equal expected reward with probability arbitrarily close to one. This is a consequence of undersampling of arms with bad reward estimates. Through experiments, we show that other online learning algorithms exhibit risk aversion as well. In a recommendation system environment we show that content that yields less noisy reward from users is favored by the algorithm. Combined with equilibrium forces driving strategic content creators towards content of similar expected quality, the advantage for content that is not necessarily better, just less volatile, is exaggerated.  ( 2 min )
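    The two-armed setting described above can be explored with a small simulation. The sketch below is an assumption-laden toy, not the paper's experimental setup: arm 0 always pays 0.5, arm 1 pays 0 or 1 with equal probability, estimates are running sample means, and ties are broken at random. The function name and parameters are hypothetical.

```python
import random

def run_epsilon_greedy(steps, epsilon, seed):
    """Simulate epsilon-greedy on a riskless arm (0) and a risky arm (1) of equal mean."""
    rng = random.Random(seed)

    def pull(arm):
        # Arm 0 is riskless (always 0.5); arm 1 is risky (0 or 1, mean 0.5).
        return 0.5 if arm == 0 else float(rng.random() < 0.5)

    counts = [0, 0]
    means = [0.0, 0.0]
    for t in range(steps):
        if t < 2:
            arm = t  # pull each arm once to initialise the estimates
        elif rng.random() < epsilon or means[0] == means[1]:
            arm = rng.randrange(2)  # explore, or break an exact tie at random
        else:
            arm = 0 if means[0] > means[1] else 1  # exploit the better estimate
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts

counts = run_epsilon_greedy(steps=10_000, epsilon=0.1, seed=0)
print("share of pulls on the riskless arm:", counts[0] / sum(counts))
```

    Running this over several seeds and horizons is one way to inspect how often the riskless arm ends up favored; the sketch makes no claim about the exact share.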
    Sparsity and Heterogeneous Dropout for Continual Learning in the Null Space of Neural Activations. (arXiv:2203.06514v2 [cs.LG] UPDATED)
    Continual/lifelong learning from a non-stationary input data stream is a cornerstone of intelligence. Despite their phenomenal performance in a wide variety of applications, deep neural networks are prone to forgetting their previously learned information upon learning new ones. This phenomenon is called "catastrophic forgetting" and is deeply rooted in the stability-plasticity dilemma. Overcoming catastrophic forgetting in deep neural networks has become an active field of research in recent years. In particular, gradient projection-based methods have recently shown exceptional performance at overcoming catastrophic forgetting. This paper proposes two biologically-inspired mechanisms based on sparsity and heterogeneous dropout that significantly increase a continual learner's performance over a long sequence of tasks. Our proposed approach builds on the Gradient Projection Memory (GPM) framework. We leverage k-winner activations in each layer of a neural network to enforce layer-wise sparse activations for each task, together with a between-task heterogeneous dropout that encourages the network to use non-overlapping activation patterns between different tasks. In addition, we introduce two new benchmarks for continual learning under distributional shift, namely Continual Swiss Roll and ImageNet SuperDog-40. Lastly, we provide an in-depth analysis of our proposed method and demonstrate a significant performance boost on various benchmark continual learning problems.  ( 3 min )
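    The k-winner activation mentioned above can be sketched in a few lines: keep the k largest activations in a layer and zero out the rest. This is a generic top-k illustration with a hypothetical function name; it does not reproduce the paper's heterogeneous dropout or its integration with GPM.

```python
def k_winners(activations, k):
    """Keep the k largest activations and set all others to zero."""
    if not 0 <= k <= len(activations):
        raise ValueError("k must be between 0 and the number of units")
    # Indices of the k largest values; the stable sort keeps earlier units on ties.
    order = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    winners = set(order[:k])
    return [a if i in winners else 0.0 for i, a in enumerate(activations)]

layer_output = [0.1, 0.9, -0.3, 0.5]
print(k_winners(layer_output, k=2))  # [0.0, 0.9, 0.0, 0.5]
```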
    Bayesian Optimization Over Iterative Learners with Structured Responses: A Budget-aware Planning Approach. (arXiv:2206.12708v2 [cs.LG] UPDATED)
    The growing size of deep neural networks (DNNs) and datasets motivates the need for efficient solutions for simultaneous model selection and training. Many methods for hyperparameter optimization (HPO) of iterative learners including DNNs attempt to solve this problem by querying and learning a response surface while searching for the optimum of that surface. However, many of these methods make myopic queries, do not consider prior knowledge about the response structure, and/or perform biased cost-aware search, all of which make it harder to identify the best-performing model when a total cost budget is specified. This paper proposes a novel approach referred to as Budget-Aware Planning for Iterative Learners (BAPI) to solve HPO problems under a constrained cost budget. BAPI is an efficient non-myopic Bayesian optimization solution that accounts for the budget and leverages the prior knowledge about the objective function and cost function to select better configurations and to make more informed decisions during the evaluation (training). Experiments on diverse HPO benchmarks for iterative learners show that BAPI performs better than state-of-the-art baselines in most of the cases.  ( 2 min )
    Flexible Group Fairness Metrics for Survival Analysis. (arXiv:2206.03256v2 [cs.CY] UPDATED)
    Algorithmic fairness is an increasingly important field concerned with detecting and mitigating biases in machine learning models. There has been a wealth of literature on algorithmic fairness in regression and classification; however, there has been little exploration of the field for survival analysis. Survival analysis is the prediction task in which one attempts to predict the probability of an event occurring over time. Survival predictions are particularly important in sensitive settings such as when utilising machine learning for diagnosis and prognosis of patients. In this paper we explore how to utilise existing survival metrics to measure bias with group fairness metrics. We explore this in an empirical experiment with 29 survival datasets and 8 measures. We find that measures of discrimination are able to capture bias well whereas there is less clarity with measures of calibration and scoring rules. We suggest further areas for research including prediction-based fairness metrics for distribution predictions.  ( 2 min )
    Data-driven Numerical Invariant Synthesis with Automatic Generation of Attributes. (arXiv:2205.14943v3 [cs.PL] UPDATED)
    We propose a data-driven algorithm for numerical invariant synthesis and verification. The algorithm is based on the ICE-DT schema for learning decision trees from samples of positive and negative states and implications corresponding to program transitions. The main issue we address is the discovery of relevant attributes to be used in the learning process of numerical invariants. We define a method for solving this problem guided by the data sample. It is based on the construction of a separator that covers positive states and excludes negative ones, consistent with the implications. The separator is constructed using an abstract domain representation of convex sets. The generalization mechanism of the decision tree learning from the constraints of the separator allows the inference of general invariants, accurate enough for proving the targeted property. We implemented our algorithm and showed its efficiency.  ( 2 min )
    Rich Feature Construction for the Optimization-Generalization Dilemma. (arXiv:2203.15516v2 [cs.LG] UPDATED)
    There is often a dilemma between ease of optimization and robust out-of-distribution (OoD) generalization. For instance, many OoD methods rely on penalty terms whose optimization is challenging. They are either too strong to optimize reliably or too weak to achieve their goals. We propose to initialize the networks with a rich representation containing a palette of potentially useful features, ready to be used by even simple models. On the one hand, a rich representation provides a good initialization for the optimizer. On the other hand, it also provides an inductive bias that helps OoD generalization. Such a representation is constructed with the Rich Feature Construction (RFC) algorithm, also called the Bonsai algorithm, which consists of a succession of training episodes. During discovery episodes, we craft a multi-objective optimization criterion and its associated datasets in a manner that prevents the network from using the features constructed in the previous iterations. During synthesis episodes, we use knowledge distillation to force the network to simultaneously represent all the previously discovered features. Initializing the networks with Bonsai representations consistently helps six OoD methods achieve top performance on the ColoredMNIST benchmark. The same technique substantially outperforms comparable results on the Wilds Camelyon17 task, eliminates the high result variance that plagues other methods, and makes hyperparameter tuning and model selection more reliable.  ( 3 min )
    Your Policy Regularizer is Secretly an Adversary. (arXiv:2203.12592v4 [cs.LG] UPDATED)
    Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy. In this paper, we show how this robustness arises from hedging against worst-case perturbations of the reward function, which are chosen from a limited set by an imagined adversary. Using convex duality, we characterize this robust set of adversarial reward perturbations under KL and alpha-divergence regularization, which includes Shannon and Tsallis entropy regularization as special cases. Importantly, generalization guarantees can be given within this robust set. We provide detailed discussion of the worst-case reward perturbations, and present intuitive empirical examples to illustrate this robustness and its relationship with generalization. Finally, we discuss how our analysis complements and extends previous results on adversarial reward robustness and path consistency optimality conditions.  ( 2 min )
    Neuro-Inspired Deep Neural Networks with Sparse, Strong Activations. (arXiv:2202.13074v3 [cs.NE] UPDATED)
    While end-to-end training of Deep Neural Networks (DNNs) yields state of the art performance in an increasing array of applications, it does not provide insight into, or control over, the features being extracted. We report here on a promising neuro-inspired approach to DNNs with sparser and stronger activations. We use standard stochastic gradient training, supplementing the end-to-end discriminative cost function with layer-wise costs promoting Hebbian ("fire together," "wire together") updates for highly active neurons, and anti-Hebbian updates for the remaining neurons. Instead of batch norm, we use divisive normalization of activations (suppressing weak outputs using strong outputs), along with implicit $\ell_2$ normalization of neuronal weights. Experiments with standard image classification tasks on CIFAR-10 demonstrate that, relative to baseline end-to-end trained architectures, our proposed architecture (a) leads to sparser activations (with only a slight compromise on accuracy), (b) exhibits more robustness to noise (without being trained on noisy data), (c) exhibits more robustness to adversarial perturbations (without adversarial training).  ( 2 min )
    Standard Vs Uniform Binary Search and Their Variants in Learned Static Indexing: The Case of the Searching on Sorted Data Benchmarking Software Platform. (arXiv:2201.01554v2 [cs.DS] UPDATED)
    Learned Indexes are a novel approach to search in a sorted table. A model is used to predict an interval in which to search and a Binary Search routine is used to finalize the search. They are quite effective. For the final stage, usually, the lower_bound routine of the Standard C++ library is used, although this is more of a natural choice than a requirement. However, recent studies that do not use Machine Learning predictions indicate that other implementations of Binary Search or variants, namely k-ary Search, are better suited to take advantage of the features offered by modern computer architectures. With the use of the Searching on Sorted Sets SOSD Learned Indexing benchmarking software, we investigate how to choose a Search routine for the final stage of searching in a Learned Index. Our results provide indications that better choices than the lower_bound routine can be made. We also highlight how such a choice may be dependent on the computer architecture that is to be used. Overall, our findings provide new and much-needed guidelines for the selection of the Search routine within the Learned Indexing framework.  ( 3 min )
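    The two-stage lookup described above (predict an interval, then finish with a binary search) can be sketched as follows. The model is replaced by a hypothetical predicted position and error bound passed in as arguments, and the lower_bound here is a plain Python re-implementation of the usual semantics, not the C++ library routine or the benchmark's code.

```python
def lower_bound(table, key, lo, hi):
    """First index i in [lo, hi) with table[i] >= key, or hi if there is none."""
    while lo < hi:
        mid = (lo + hi) // 2
        if table[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo

def learned_lookup(table, key, predicted_pos, max_error):
    """Search only the window around the predicted position.

    Assumes the true lower-bound position lies within max_error of
    predicted_pos; otherwise the result is only correct within the window.
    """
    lo = max(0, predicted_pos - max_error)
    hi = min(len(table), predicted_pos + max_error + 1)
    return lower_bound(table, key, lo, hi)

table = [2, 3, 5, 7, 11, 13, 17, 19]
# Hypothetical model output: position 5 with an error bound of 2.
print(learned_lookup(table, 11, predicted_pos=5, max_error=2))  # 4
```

    Swapping lower_bound for another routine (for example a k-ary search) leaves learned_lookup unchanged, which is the kind of substitution the abstract investigates.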
    Optimal sizing of a holdout set for safe predictive model updating. (arXiv:2202.06374v3 [stat.ML] UPDATED)
    Predictive risk scores are increasingly used to guide clinical or other interventions in complex settings, particularly healthcare. Directly updating a risk score used to guide interventions leads to biased risk estimates. We propose updating using a `holdout set' -- a subset of the population that does not receive risk-score-guided interventions -- to prevent this. Since samples in the holdout set do not benefit from risk predictions, its size must trade off performance of the updated risk score whilst minimising the number of held out samples. We prove that this approach outperforms simple alternatives, and by defining a general loss function describe conditions under which an optimal holdout size (OHS) can be readily identified. We introduce parametric and semi-parametric algorithms for OHS estimation and demonstrate their use on a recent risk score for pre-eclampsia. Based on these results, we argue that a holdout set is a safe, viable and easily implemented means to safely update predictive risk scores.  ( 2 min )
    High Throughput Multi-Channel Parallelized Diffraction Convolutional Neural Network Accelerator. (arXiv:2112.12297v2 [cs.LG] UPDATED)
    Convolutional neural networks are paramount in image and signal processing, including the relevant classification and training tasks, and account for the majority of machine learning compute demand today. With convolution operations being computationally intensive, next-generation hardware accelerators need to offer parallelization and algorithmic-hardware homomorphism. Fortunately, diffractive display optics is capable of million-channel parallel data processing at low latency; however, it has thus far shown only slow (tens of Hertz) single-image and single-kernel capability, thereby significantly underdelivering on its performance potential. Here, we demonstrate an operation-parallelized high-throughput Fourier optic convolutional neural network accelerator. For the first time, simultaneous processing of multiple kernels in the Fourier domain, enabled by optical diffraction, has been achieved alongside the input parallelism that is already conventional in the field. Additionally, we show a system speed-up of about one hundred times over existing optical diffraction-based processors, and this demonstration rivals the performance of modern electronic solutions. Therefore, this system is capable of processing large-scale matrices about ten times faster than state-of-the-art electronic systems.  ( 2 min )
    Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond. (arXiv:2204.08200v2 [cs.LG] UPDATED)
    The vast majority of existing algorithms for unsupervised domain adaptation (UDA) focus on adapting from a labeled source domain to an unlabeled target domain directly in a one-off way. Gradual domain adaptation (GDA), on the other hand, assumes a path of $(T-1)$ unlabeled intermediate domains bridging the source and target, and aims to provide better generalization in the target domain by leveraging the intermediate ones. Under certain assumptions, Kumar et al. (2020) proposed a simple algorithm, Gradual Self-Training, along with a generalization bound in the order of $e^{O(T)} \left(\varepsilon_0+O\left(\sqrt{\log(T)/n}\right)\right)$ for the target domain error, where $\varepsilon_0$ is the source domain error and $n$ is the data size of each domain. Due to the exponential factor, this upper bound becomes vacuous when $T$ is only moderately large. In this work, we analyze gradual self-training under more general and relaxed assumptions, and prove a significantly improved generalization bound as $\varepsilon_0+ O \left(T\Delta + T/\sqrt{n}\right) + \widetilde{O}\left(1/\sqrt{nT}\right)$, where $\Delta$ is the average distributional distance between consecutive domains. Compared with the existing bound with an exponential dependency on $T$ as a multiplicative factor, our bound only depends on $T$ linearly and additively. Perhaps more interestingly, our result implies the existence of an optimal choice of $T$ that minimizes the generalization error, and it also naturally suggests an optimal way to construct the path of intermediate domains so as to minimize the accumulative path length $T\Delta$ between the source and target. To corroborate the implications of our theory, we examine gradual self-training on multiple semi-synthetic and real datasets, which confirms our findings. We believe our insights provide a path forward toward the design of future GDA algorithms.  ( 3 min )
    The Importance of Non-Markovianity in Maximum State Entropy Exploration. (arXiv:2202.03060v2 [cs.LG] UPDATED)
    In the maximum state entropy exploration framework, an agent interacts with a reward-free environment to learn a policy that maximizes the entropy of the expected state visitations it is inducing. Hazan et al. (2019) noted that the class of Markovian stochastic policies is sufficient for the maximum state entropy objective, and exploiting non-Markovianity is generally considered pointless in this setting. In this paper, we argue that non-Markovianity is instead paramount for maximum state entropy exploration in a finite-sample regime. Specifically, we recast the objective to target the expected entropy of the induced state visitations in a single trial. Then, we show that the class of non-Markovian deterministic policies is sufficient for the introduced objective, while Markovian policies suffer non-zero regret in general. However, we prove that the problem of finding an optimal non-Markovian policy is NP-hard. Despite this negative result, we discuss avenues to address the problem in a tractable way and how non-Markovian exploration could benefit the sample efficiency of online reinforcement learning in future work.  ( 2 min )
    Towards Effective and Robust Neural Trojan Defenses via Input Filtering. (arXiv:2202.12154v4 [cs.CR] UPDATED)
    Trojan attacks on deep neural networks are both dangerous and surreptitious. Over the past few years, Trojan attacks have advanced from using only a single input-agnostic trigger and targeting only one class to using multiple, input-specific triggers and targeting multiple classes. However, Trojan defenses have not caught up with this development. Most defense methods still make inadequate assumptions about Trojan triggers and target classes, and thus can be easily circumvented by modern Trojan attacks. To deal with this problem, we propose two novel "filtering" defenses called Variational Input Filtering (VIF) and Adversarial Input Filtering (AIF) which leverage lossy data compression and adversarial learning respectively to effectively purify potential Trojan triggers in the input at run time without making assumptions about the number of triggers/target classes or the input dependence property of triggers. In addition, we introduce a new defense mechanism called "Filtering-then-Contrasting" (FtC) which helps avoid the drop in classification accuracy on clean data caused by "filtering", and combine it with VIF/AIF to derive new defenses of this kind. Extensive experimental results and ablation studies show that our proposed defenses significantly outperform well-known baseline defenses in mitigating five advanced Trojan attacks, including two recent state-of-the-art attacks, while being quite robust to small amounts of training data and large-norm triggers.  ( 3 min )
    Evaluating Causal Inference Methods. (arXiv:2202.04208v3 [stat.ME] UPDATED)
    The fundamental challenge of drawing causal inference is that counterfactual outcomes are not fully observed for any unit. Furthermore, in observational studies, treatment assignment is likely to be confounded. Many statistical methods have emerged for causal inference under unconfoundedness conditions given pre-treatment covariates, including propensity score-based methods, prognostic score-based methods, and doubly robust methods. Unfortunately for applied researchers, there is no `one-size-fits-all' causal method that can perform optimally universally. In practice, causal methods are primarily evaluated quantitatively on handcrafted simulated data. Such data-generative procedures can be of limited value because they are typically stylized models of reality. They are simplified for tractability and lack the complexities of real-world data. For applied researchers, it is critical to understand how well a method performs for the data at hand. Our work introduces a deep generative model-based framework, Credence, to validate causal inference methods. The framework's novelty stems from its ability to generate synthetic data anchored at the empirical distribution for the observed sample, and therefore virtually indistinguishable from the latter. The approach allows the user to specify ground truth for the form and magnitude of causal effects and confounding bias as functions of covariates. Thus simulated data sets are used to evaluate the potential performance of various causal estimation methods when applied to data similar to the observed sample. We demonstrate Credence's ability to accurately assess the relative performance of causal estimation techniques in an extensive simulation study and two real-world data applications from Lalonde and Project STAR studies.  ( 3 min )
    Invariant Ancestry Search. (arXiv:2202.00913v2 [stat.ME] UPDATED)
    Recently, methods have been proposed that exploit the invariance of prediction models with respect to changing environments to infer subsets of the causal parents of a response variable. If the environments influence only few of the underlying mechanisms, the subset identified by invariant causal prediction (ICP), for example, may be small, or even empty. We introduce the concept of minimal invariance and propose invariant ancestry search (IAS). In its population version, IAS outputs a set which contains only ancestors of the response and is a superset of the output of ICP. When applied to data, corresponding guarantees hold asymptotically if the underlying test for invariance has asymptotic level and power. We develop scalable algorithms and perform experiments on simulated and real data.  ( 2 min )
    Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations. (arXiv:2202.10638v2 [stat.ML] UPDATED)
    Data augmentation is commonly applied to improve performance of deep learning by enforcing the knowledge that certain transformations on the input preserve the output. Currently, the data augmentation used is chosen by human effort and costly cross-validation, which makes it cumbersome to apply to new datasets. We develop a convenient gradient-based method for selecting the data augmentation without validation data and during training of a deep neural network. Our approach relies on phrasing data augmentation as an invariance in the prior distribution and learning it using Bayesian model selection, which has been shown to work in Gaussian processes, but not yet for deep neural networks. We propose a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective, which can be optimised without human supervision or validation data. We show that our method can successfully recover invariances present in the data, and that this improves generalisation and data efficiency on image datasets.  ( 2 min )
    GraphWorld: Fake Graphs Bring Real Insights for GNNs. (arXiv:2203.00112v2 [cs.LG] UPDATED)
    Despite advances in the field of Graph Neural Networks (GNNs), only a small number (~5) of datasets are currently used to evaluate new models. This continued reliance on a handful of datasets provides minimal insight into the performance differences between models, and is especially challenging for industrial practitioners who are likely to have datasets which look very different from those used as academic benchmarks. In the course of our work on GNN infrastructure and open-source software at Google, we have sought to develop improved benchmarks that are robust, tunable, scalable, and generalizable. In this work we introduce GraphWorld, a novel methodology and system for benchmarking GNN models on an arbitrarily-large population of synthetic graphs for any conceivable GNN task. GraphWorld allows a user to efficiently generate a world with millions of statistically diverse datasets. It is accessible, scalable, and easy to use. GraphWorld can be run on a single machine without specialized hardware, or it can be easily scaled up to run on arbitrary clusters or cloud frameworks. Using GraphWorld, a user has fine-grained control over graph generator parameters, and can benchmark arbitrary GNN models with built-in hyperparameter tuning. We present insights from GraphWorld experiments regarding the performance characteristics of tens of thousands of GNN models over millions of benchmark datasets. We further show that GraphWorld efficiently explores regions of benchmark dataset space uncovered by standard benchmarks, revealing comparisons between models that have not been historically obtainable. Using GraphWorld, we are also able to study in detail the relationship between graph properties and task performance metrics, which is nearly impossible with the classic collection of real-world benchmarks.  ( 3 min )
    Architecture Agnostic Federated Learning for Neural Networks. (arXiv:2202.07757v3 [cs.LG] UPDATED)
    With growing concerns regarding data privacy and rapid increase in data volume, Federated Learning (FL) has become an important learning paradigm. However, jointly learning a deep neural network model in an FL setting proves to be a non-trivial task because of the complexities associated with the neural networks, such as varied architectures across clients, permutation invariance of the neurons, and presence of non-linear transformations in each layer. This work introduces a novel Federated Heterogeneous Neural Networks (FedHeNN) framework that allows each client to build a personalised model without enforcing a common architecture across clients. This allows each client to optimize with respect to local data and compute constraints, while still benefiting from the learnings of other (potentially more powerful) clients. The key idea of FedHeNN is to use the instance-level representations obtained from peer clients to guide the simultaneous training on each client. The extensive experimental results demonstrate that the FedHeNN framework is capable of learning better performing models on clients in both the settings of homogeneous and heterogeneous architectures across clients.  ( 2 min )
    Multi-Task Learning as a Bargaining Game. (arXiv:2202.01017v2 [cs.LG] UPDATED)
    In multi-task learning (MTL), a joint model is trained to simultaneously make predictions for several tasks. Joint training reduces computation costs and improves data efficiency; however, since the gradients of these different tasks may conflict, training a joint model for MTL often yields lower performance than its corresponding single-task counterparts. A common method for alleviating this issue is to combine per-task gradients into a joint update direction using a particular heuristic. In this paper, we propose viewing the gradient combination step as a bargaining game, where tasks negotiate to reach an agreement on a joint direction of parameter update. Under certain assumptions, the bargaining problem has a unique solution, known as the Nash Bargaining Solution, which we propose to use as a principled approach to multi-task learning. We describe a new MTL optimization procedure, Nash-MTL, and derive theoretical guarantees for its convergence. Empirically, we show that Nash-MTL achieves state-of-the-art results on multiple MTL benchmarks in various domains.  ( 2 min )
    TEA: A Sequential Recommendation Framework via Temporally Evolving Aggregations. (arXiv:2111.07378v2 [cs.IR] UPDATED)
    Sequential recommendation aims to choose the most suitable items for a user at a specific timestamp given historical behaviors. Existing methods usually model the user behavior sequence based on the transition-based methods like Markov Chain. However, these methods also implicitly assume that the users are independent of each other without considering the influence between users. In fact, this influence plays an important role in sequence recommendation since the behavior of a user is easily affected by others. Therefore, it is desirable to aggregate both user behaviors and the influence between users, which are evolved temporally and involved in the heterogeneous graph of users and items. In this paper, we incorporate dynamic user-item heterogeneous graphs to propose a novel sequential recommendation framework. As a result, the historical behaviors as well as the influence between users can be taken into consideration. To achieve this, we firstly formalize sequential recommendation as a problem to estimate conditional probability given temporal dynamic heterogeneous graphs and user behavior sequences. After that, we exploit the conditional random field to aggregate the heterogeneous graphs and user behaviors for probability estimation, and employ the pseudo-likelihood approach to derive a tractable objective function. Finally, we provide scalable and flexible implementations of the proposed framework. Experimental results on three real-world datasets not only demonstrate the effectiveness of our proposed method but also provide some insightful discoveries on sequential recommendation.  ( 3 min )
    Test Sample Accuracy Scales with Training Sample Density in Neural Networks. (arXiv:2106.08365v6 [cs.LG] UPDATED)
    Intuitively, one would expect accuracy of a trained neural network's prediction on test samples to correlate with how densely the samples are surrounded by seen training samples in representation space. We find that a bound on empirical training error smoothed across linear activation regions scales inversely with training sample density in representation space. Empirically, we verify this bound is a strong predictor of the inaccuracy of the network's prediction on test samples. For unseen test sets, including those with out-of-distribution samples, ranking test samples by their local region's error bound and discarding samples with the highest bounds raises prediction accuracy by up to 20% in absolute terms for image classification datasets, on average over thresholds.  ( 2 min )
    A methodology for training homomorphic encryption friendly neural networks. (arXiv:2111.03362v3 [cs.CR] UPDATED)
    Privacy-preserving deep neural network (DNN) inference is a necessity in different regulated industries such as healthcare, finance and retail. Recently, homomorphic encryption (HE) has been used as a method to enable analytics while addressing privacy concerns. HE enables secure predictions over encrypted data. However, there are several challenges related to the use of HE, including DNN size limitations and the lack of support for some operation types. Most notably, the commonly used ReLU activation is not supported under some HE schemes. We propose a structured methodology to replace ReLU with a quadratic polynomial activation. To address the accuracy degradation issue, we use a pre-trained model that trains another HE-friendly model, using techniques such as trainable activation functions and knowledge distillation. We demonstrate our methodology on the AlexNet architecture, using the chest X-Ray and CT datasets for COVID-19 detection. Experiments using our approach reduced the gap between the F1 score and accuracy of the models trained with ReLU and the HE-friendly model to within a mere 0.32-5.3 percent degradation. We also demonstrate our methodology using the SqueezeNet architecture, for which we observed 7 percent accuracy and F1 improvements over training similar networks with other HE-friendly training methods.  ( 3 min )
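    The ReLU replacement described above can be illustrated with a small sketch. A quadratic polynomial needs only additions and multiplications, which is why it suits schemes where ReLU is unsupported. The coefficients below are an assumption made for illustration (a least-squares fit of ReLU on the interval [-1, 1]); the abstract does not give the trained coefficients, and the function names are hypothetical.

```python
def relu(x: float) -> float:
    """Standard ReLU, shown only as the reference to approximate."""
    return max(0.0, x)


# Illustrative coefficients (assumption): least-squares fit of ReLU on [-1, 1].
# In a trainable-activation setup these would be learned parameters instead.
A, B, C = 15.0 / 32.0, 0.5, 3.0 / 32.0


def quadratic_activation(x: float, a: float = A, b: float = B, c: float = C) -> float:
    """Quadratic polynomial activation a*x^2 + b*x + c (additions and multiplications only)."""
    return a * x * x + b * x + c


def max_abs_error(lo: float = -1.0, hi: float = 1.0, steps: int = 2001) -> float:
    """Largest gap between ReLU and the quadratic on an evenly spaced grid over [lo, hi]."""
    worst = 0.0
    for i in range(steps):
        x = lo + (hi - lo) * i / (steps - 1)
        worst = max(worst, abs(relu(x) - quadratic_activation(x)))
    return worst


if __name__ == "__main__":
    print("max error on [-1, 1]:", max_abs_error())
    print("error at x = 4:", abs(relu(4.0) - quadratic_activation(4.0)))
```

    The approximation is only close inside the fitted interval and degrades quickly outside it, which is one plausible reason the accuracy gap needs to be addressed with additional training techniques.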
    DeepSplit: Scalable Verification of Deep Neural Networks via Operator Splitting. (arXiv:2106.09117v3 [cs.LG] UPDATED)
    Analyzing the worst-case performance of deep neural networks against input perturbations amounts to solving a large-scale non-convex optimization problem, for which several past works have proposed convex relaxations as a promising alternative. However, even for reasonably-sized neural networks, these relaxations are not tractable, and so must be replaced by even weaker relaxations in practice. In this work, we propose a novel operator splitting method that can directly solve a convex relaxation of the problem to high accuracy, by splitting it into smaller sub-problems that often have analytical solutions. The method is modular, scales to very large problem instances, and comprises operations that are amenable to fast parallelization with GPU acceleration. We demonstrate our method in bounding the worst-case performance of large convolutional networks in image classification and reinforcement learning settings, and in reachability analysis of neural network dynamical systems.  ( 2 min )
    Learning from Guided Play: A Scheduled Hierarchical Approach for Improving Exploration in Adversarial Imitation Learning. (arXiv:2112.08932v2 [cs.LG] UPDATED)
    Effective exploration continues to be a significant challenge that prevents the deployment of reinforcement learning for many physical systems. This is particularly true for systems with continuous and high-dimensional state and action spaces, such as robotic manipulators. The challenge is accentuated in the sparse rewards setting, where the low-level state information required for the design of dense rewards is unavailable. Adversarial imitation learning (AIL) can partially overcome this barrier by leveraging expert-generated demonstrations of optimal behaviour and providing, essentially, a replacement for dense reward information. Unfortunately, the availability of expert demonstrations does not necessarily improve an agent's capability to explore effectively and, as we empirically show, can lead to inefficient or stagnated learning. We present Learning from Guided Play (LfGP), a framework in which we leverage expert demonstrations of, in addition to a main task, multiple auxiliary tasks. Subsequently, a hierarchical model is used to learn each task reward and policy through a modified AIL procedure, in which exploration of all tasks is enforced via a scheduler composing different tasks together. This affords many benefits: learning efficiency is improved for main tasks with challenging bottleneck transitions, expert data becomes reusable between tasks, and transfer learning through the reuse of learned auxiliary task models becomes possible. Our experimental results in a challenging multitask robotic manipulation domain indicate that our method compares favourably to supervised imitation learning and to a state-of-the-art AIL method. Code is available at https://github.com/utiasSTARS/lfgp.  ( 3 min )
    Deep Neural Networks for Rank-Consistent Ordinal Regression Based On Conditional Probabilities. (arXiv:2111.08851v3 [cs.LG] UPDATED)
    In recent times, deep neural networks achieved outstanding predictive performance on various classification and pattern recognition tasks. However, many real-world prediction problems have ordinal response variables, and this ordering information is ignored by conventional classification losses such as the multi-category cross-entropy. Ordinal regression methods for deep neural networks address this. One such method is the CORAL method, which is based on an earlier binary label extension framework and achieves rank consistency among its output layer tasks by imposing a weight-sharing constraint. However, while earlier experiments showed that CORAL's rank consistency is beneficial for performance, it is limited by a weight-sharing constraint in a neural network's fully connected output layer. We propose a new method for rank-consistent ordinal regression without this limitation. Our rank-consistent ordinal regression framework (CORN) achieves rank consistency by a novel training scheme. This training scheme uses conditional training sets to obtain the unconditional rank probabilities through applying the chain rule for conditional probability distributions. Experiments on various datasets demonstrate the efficacy of the proposed method to utilize the ordinal target information, and the absence of the weight-sharing restriction improves the performance substantially compared to the CORAL reference approach.  ( 3 min )
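    The chain-rule step mentioned above can be sketched in a few lines. The sketch assumes each output task k predicts a conditional probability P(y > r_k | y > r_{k-1}); multiplying these gives the unconditional P(y > r_k). The function names and the 0.5 decision threshold are illustrative assumptions, not details taken from the abstract.

```python
from typing import List


def corn_rank_probabilities(conditional_probs: List[float]) -> List[float]:
    """Convert conditional probabilities P(y > r_k | y > r_{k-1}) into
    unconditional probabilities P(y > r_k) with the chain rule."""
    unconditional = []
    running = 1.0
    for p in conditional_probs:
        running *= p  # P(y > r_k) = P(y > r_k | y > r_{k-1}) * P(y > r_{k-1})
        unconditional.append(running)
    return unconditional


def predict_rank(conditional_probs: List[float], threshold: float = 0.5) -> int:
    """Predicted rank index: number of thresholds the sample is judged to exceed.
    The 0.5 cut-off is an assumption made for this sketch."""
    return sum(1 for p in corn_rank_probabilities(conditional_probs) if p > threshold)


if __name__ == "__main__":
    conditional = [0.9, 0.8, 0.3]
    print(corn_rank_probabilities(conditional))
    print(predict_rank(conditional))
```

    Because every factor lies in [0, 1], the resulting unconditional probabilities can never increase with k, so rank consistency follows from the product itself rather than from a constraint on the weights.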
    Supervising the Decoder of Variational Autoencoders to Improve Scientific Utility. (arXiv:2109.04561v3 [stat.ML] UPDATED)
    Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provide an accurate representation of the input data and yield a latent space that effectively predicts outcomes relevant to the scientific question. Supervised Variational Autoencoders (SVAEs) have previously been used for this purpose, where a carefully designed decoder can be used as an interpretable generative model while the supervised objective ensures a predictive latent representation. Unfortunately, the supervised objective forces the encoder to learn a biased approximation to the generative posterior distribution, which renders the generative parameters unreliable when used in scientific models. This issue has remained undetected as reconstruction losses commonly used to evaluate model performance do not detect bias in the encoder. We address this previously-unreported issue by developing a second order supervision framework (SOS-VAE) that influences the decoder to induce a predictive latent representation. This ensures that the associated encoder maintains a reliable generative interpretation. We extend this technique to allow the user to trade-off some bias in the generative parameters for improved predictive performance, acting as an intermediate option between SVAEs and our new SOS-VAE. We also use this methodology to address missing data issues that often arise when combining recordings from multiple scientific experiments. We demonstrate the effectiveness of these developments using synthetic data and electrophysiological recordings with an emphasis on how our learned representations can be used to design scientific experiments.  ( 3 min )
    On Improving the Performance of Glitch Classification for Gravitational Wave Detection by using Generative Adversarial Networks. (arXiv:2207.04001v1 [astro-ph.HE])
    Spectrogram classification plays an important role in analyzing gravitational wave data. In this paper, we propose a framework to improve the classification performance by using Generative Adversarial Networks (GANs). As substantial efforts and expertise are required to annotate spectrograms, the number of training examples is very limited. However, it is well known that deep networks can perform well only when the sample size of the training set is sufficiently large. Furthermore, the imbalanced sample sizes in different classes can also hamper the performance. In order to tackle these problems, we propose a GAN-based data augmentation framework. While standard data augmentation methods for conventional images cannot be applied on spectrograms, we found that a variant of GANs, ProGAN, is capable of generating high-resolution spectrograms which are consistent with the quality of the high-resolution original images and provide a desirable diversity. We have validated our framework by classifying glitches in the Gravity Spy dataset with the GAN-generated spectrograms for training. We show that the proposed method can provide an alternative to transfer learning for the classification of spectrograms using deep networks, i.e. using a high-resolution GAN for data augmentation instead. Furthermore, fluctuations in classification performance with small sample sizes for training and evaluation can be greatly reduced. Using the trained network in our framework, we have also examined the spectrograms with label anomalies in Gravity Spy.  ( 3 min )
    BF++: a language for general-purpose program synthesis. (arXiv:2101.09571v6 [cs.AI] UPDATED)
    Most state of the art decision systems based on Reinforcement Learning (RL) are data-driven black-box neural models, where it is often difficult to incorporate expert knowledge into the models or let experts review and validate the learned decision mechanisms. Knowledge-insertion and model review are important requirements in many applications involving human health and safety. One way to bridge the gap between data and knowledge driven systems is program synthesis: replacing a neural network that outputs decisions with a symbolic program generated by a neural network or by means of genetic programming. We propose a new programming language, BF++, designed specifically for automatic programming of agents in a Partially Observable Markov Decision Process (POMDP) setting and apply neural program synthesis to solve standard OpenAI Gym benchmarks.  ( 2 min )
    Fair Exploration via Axiomatic Bargaining. (arXiv:2106.02553v2 [cs.LG] UPDATED)
    Exploration is often necessary in online learning to maximize long-term reward, but it comes at the cost of short-term 'regret'. We study how this cost of exploration is shared across multiple groups. For example, in a clinical trial setting, patients who are assigned a sub-optimal treatment effectively incur the cost of exploration. When patients are associated with natural groups on the basis of, say, race or age, it is natural to ask whether the cost of exploration borne by any single group is 'fair'. So motivated, we introduce the 'grouped' bandit model. We leverage the theory of axiomatic bargaining, and the Nash bargaining solution in particular, to formalize what might constitute a fair division of the cost of exploration across groups. On the one hand, we show that any regret-optimal policy strikingly results in the least fair outcome: such policies will perversely leverage the most 'disadvantaged' groups when they can. More constructively, we derive policies that are optimally fair and simultaneously enjoy a small 'price of fairness'. We illustrate the relative merits of our algorithmic framework with a case study on contextual bandits for warfarin dosing where we are concerned with the cost of exploration across multiple races and age groups.  ( 3 min )
    Seeing All the Angles: Learning Multiview Manipulation Policies for Contact-Rich Tasks from Demonstrations. (arXiv:2104.13907v3 [cs.RO] UPDATED)
    Learned visuomotor policies have shown considerable success as an alternative to traditional, hand-crafted frameworks for robotic manipulation. Surprisingly, an extension of these methods to the multiview domain is relatively unexplored. A successful multiview policy could be deployed on a mobile manipulation platform, allowing the robot to complete a task regardless of its view of the scene. In this work, we demonstrate that a multiview policy can be found through imitation learning by collecting data from a variety of viewpoints. We illustrate the general applicability of the method by learning to complete several challenging multi-stage and contact-rich tasks, from numerous viewpoints, both in a simulated environment and on a real mobile manipulation platform. Furthermore, we analyze our policies to determine the benefits of learning from multiview data compared to learning with data collected from a fixed perspective. We show that learning from multiview data results in little, if any, penalty to performance for a fixed-view task compared to learning with an equivalent amount of fixed-view data. Finally, we examine the visual features learned by the multiview and fixed-view policies. Our results indicate that multiview policies implicitly learn to identify spatially correlated features.  ( 3 min )
    Greedy Bayesian Posterior Approximation with Deep Ensembles. (arXiv:2105.14275v4 [cs.LG] UPDATED)
    Ensembles of independently trained neural networks are a state-of-the-art approach to estimate predictive uncertainty in Deep Learning, and can be interpreted as an approximation of the posterior distribution via a mixture of delta functions. The training of ensembles relies on non-convexity of the loss landscape and random initialization of their individual members, making the resulting posterior approximation uncontrolled. This paper proposes a novel and principled method to tackle this limitation, minimizing an $f$-divergence between the true posterior and a kernel density estimator (KDE) in a function space. We analyze this objective from a combinatorial point of view, and show that it is submodular with respect to mixture components for any $f$. Subsequently, we consider the problem of greedy ensemble construction. From the marginal gain on the negative $f$-divergence, which quantifies an improvement in posterior approximation yielded by adding a new component into the KDE, we derive a novel diversity term for ensemble methods. The performance of our approach is demonstrated on computer vision out-of-distribution detection benchmarks in a range of architectures trained on multiple datasets. The source code of our method is made publicly available at https://github.com/Oulu-IMEDS/greedy_ensembles_training.  ( 3 min )
    Feature Selection Methods for Uplift Modeling and Heterogeneous Treatment Effect. (arXiv:2005.03447v2 [cs.LG] UPDATED)
    Uplift modeling is a causal learning technique that estimates subgroup-level treatment effects. It is commonly used in industry and elsewhere for tasks such as targeting ads. In a typical setting, uplift models can take thousands of features as inputs, which is costly and results in problems such as overfitting and poor model interpretability. Consequently, there is a need to select a subset of the most important features for modeling. However, traditional methods for doing feature selection are not fit for the task because they are designed for standard machine learning models whose target is importantly different from uplift models. To address this, we introduce a set of feature selection methods explicitly designed for uplift modeling, drawing inspiration from statistics and information theory. We conduct empirical evaluations on the proposed methods on publicly available datasets, demonstrating the advantages of the proposed methods compared to traditional feature selection. We make the proposed methods publicly available as a part of the CausalML open-source package.  ( 2 min )
    Unpaired Single-Image Depth Synthesis with cycle-consistent Wasserstein GANs. (arXiv:2103.16938v3 [cs.CV] UPDATED)
    Real-time estimation of actual environment depth is an essential module for various autonomous system tasks such as localization, obstacle detection and pose estimation. During the last decade of machine learning, extensive deployment of deep learning methods to computer vision tasks yielded successful approaches for realistic depth synthesis out of a simple RGB modality. While most of these models rest on paired depth data or availability of video sequences and stereo images, there is a lack of methods facing single-image depth synthesis in an unsupervised manner. Therefore, in this study, the latest advancements in the field of generative neural networks are leveraged for fully unsupervised single-image depth synthesis. To be more exact, two cycle-consistent generators for RGB-to-depth and depth-to-RGB transfer are implemented and simultaneously optimized using the Wasserstein-1 distance. To ensure plausibility of the proposed method, we apply the models to a self-acquired industrial data set as well as to the renowned NYU Depth v2 data set, which allows comparison with existing approaches. The observed success in this study suggests high potential for unpaired single-image depth estimation in real world applications.  ( 3 min )
    Combining Machine Learning and Effective Feature Selection for Real-time Stock Trading in Variable Time-frames. (arXiv:2107.13148v2 [q-fin.TR] UPDATED)
    The unpredictability and volatility of the stock market render it challenging to make a substantial profit using any generalised scheme. Many previous studies tried different techniques to build a machine learning model, which can make a significant profit in the US stock market by performing live trading. However, very few studies have focused on the importance of finding the best features for a particular trading period. Our top approach used the performance to narrow down the features from a total of 148 to about 30. Furthermore, the top 25 features were dynamically selected each time before training our machine learning model. It uses ensemble learning with four classifiers: Gaussian Naive Bayes, Decision Tree, Logistic Regression with L1 regularization, and Stochastic Gradient Descent, to decide whether to go long or short on a particular stock. Our best model performed daily trade between July 2011 and January 2019, generating 54.35% profit. Finally, our work showcased that mixtures of weighted classifiers perform better than any individual predictor for making trading decisions in the stock market.  ( 3 min )
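    A weighted mixture of classifiers deciding between long and short can be sketched as a weighted soft vote. This is a minimal illustration only: the probabilities, the weights, the 0.5 cut-off and the function name are all hypothetical, and the abstract does not say how the paper combines or weights its four classifiers.

```python
from typing import Sequence


def weighted_vote(long_probs: Sequence[float], weights: Sequence[float]) -> str:
    """Combine per-classifier probabilities of 'long' into a single decision.

    long_probs[i] is classifier i's estimated probability that going long is right;
    weights[i] is the (hypothetical) trust placed in classifier i.
    """
    if not weights or len(long_probs) != len(weights):
        raise ValueError("long_probs and weights must be non-empty and of equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(long_probs, weights)) / total
    return "long" if score > 0.5 else "short"


if __name__ == "__main__":
    # Made-up outputs for four classifiers, in the order:
    # Gaussian Naive Bayes, Decision Tree, Logistic Regression (L1), SGD.
    probs = [0.7, 0.4, 0.6, 0.55]
    print(weighted_vote(probs, [1, 1, 2, 1]))  # more trust in the third classifier
    print(weighted_vote(probs, [1, 5, 1, 1]))  # more trust in the second classifier
```

    Changing the weights alone can flip the decision, which is why the choice of weighting matters as much as the choice of base classifiers.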
    Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical Guarantees and Implementation Details. (arXiv:2108.11000v2 [stat.ML] UPDATED)
    Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies. Although several works have studied theoretical and numerical properties of sparse neural architectures, they have primarily focused on the edge selection. Sparsity through edge selection might be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Instead pruning excessive nodes leads to a structurally sparse network with significant computational speedup during inference. To this end, we propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for automatic node selection during training. The use of spike-and-slab prior alleviates the need of an ad-hoc thresholding rule for pruning. In addition, we adopt a variational Bayes approach to circumvent the computational challenges of traditional Markov Chain Monte Carlo (MCMC) implementation. In the context of node selection, we establish the fundamental result of variational posterior consistency together with the characterization of prior parameters. In contrast to the previous works, our theoretical development relaxes the assumptions of the equal number of nodes and uniform bounds on all network weights, thereby accommodating sparse networks with layer-dependent node structures or coefficient bounds. With a layer-wise characterization of prior inclusion probabilities, we discuss the optimal contraction rates of the variational posterior. We empirically demonstrate that our proposed approach outperforms the edge selection method in computational complexity with similar or better predictive performance. Our experimental evidence further substantiates that our theoretical work facilitates layer-wise optimal node recovery.  ( 3 min )
    Neighbors From Hell: Voltage Attacks Against Deep Learning Accelerators on Multi-Tenant FPGAs. (arXiv:2012.07242v2 [cs.CR] UPDATED)
    Field-programmable gate arrays (FPGAs) are becoming widely used accelerators for a myriad of datacenter applications due to their flexibility and energy efficiency. Among these applications, FPGAs have shown promising results in accelerating low-latency real-time deep learning (DL) inference, which is becoming an indispensable component of many end-user applications. With the emerging research direction towards virtualized cloud FPGAs that can be shared by multiple users, the security aspect of FPGA-based DL accelerators requires careful consideration. In this work, we evaluate the security of DL accelerators against voltage-based integrity attacks in a multitenant FPGA scenario. We first demonstrate the feasibility of such attacks on a state-of-the-art Stratix 10 card using different attacker circuits that are logically and physically isolated in a separate attacker role, and cannot be flagged as malicious circuits by conventional bitstream checkers. We show that aggressive clock gating, an effective power-saving technique, can also be a potential security threat in modern FPGAs. Then, we carry out the attack on a DL accelerator running ImageNet classification in the victim role to evaluate the inherent resilience of DL models against timing faults induced by the adversary. We find that even when using the strongest attacker circuit, the prediction accuracy of the DL accelerator is not compromised when running at its safe operating frequency. Furthermore, we can achieve 1.18-1.31x higher inference performance by over-clocking the DL accelerator without affecting its prediction accuracy.  ( 3 min )
    On the representation and learning of monotone triangular transport maps. (arXiv:2009.10303v2 [stat.ML] UPDATED)
    Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps, approximations of the Knothe–Rosenblatt (KR) rearrangement, are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on properties of the optimization problem that arises in learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima; and we show for target distributions satisfying certain tail conditions that the unique global minimizer corresponds to the KR map. Given a sample from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.  ( 3 min )
    Interlocking Backpropagation: Improving depthwise model-parallelism. (arXiv:2010.04116v3 [cs.LG] UPDATED)
    The number of parameters in state of the art neural networks has drastically increased in recent years. This surge of interest in large scale neural networks has motivated the development of new distributed training strategies enabling such models. One such strategy is model-parallel distributed training. Unfortunately, model-parallelism can suffer from poor resource utilisation, which leads to wasted resources. In this work, we improve upon recent developments in an idealised model-parallel optimisation setting: local learning. Motivated by poor resource utilisation in the global setting and poor task performance in the local setting, we introduce a class of intermediary strategies between local and global learning referred to as interlocking backpropagation. These strategies preserve many of the compute-efficiency advantages of local optimisation, while recovering much of the task performance achieved by global optimisation. We assess our strategies on both image classification ResNets and Transformer language models, finding that our strategy consistently out-performs local learning in terms of task performance, and out-performs global learning in training efficiency.  ( 2 min )
    Bayesian Quantile and Expectile Optimisation. (arXiv:2001.04833v2 [stat.ML] UPDATED)
    Bayesian optimisation (BO) is widely used to optimise stochastic black box functions. While most BO approaches focus on optimising conditional expectations, many applications require risk-averse strategies and alternative criteria accounting for the distribution tails need to be considered. In this paper, we propose new variational models for Bayesian quantile and expectile regression that are well-suited for heteroscedastic noise settings. Our models consist of two latent Gaussian processes accounting respectively for the conditional quantile (or expectile) and the scale parameter of an asymmetric likelihood function. Furthermore, we propose two BO strategies based on max-value entropy search and Thompson sampling, that are tailored to such models and that can accommodate large batches of points. Contrary to existing BO approaches for risk-averse optimisation, our strategies can directly optimise for the quantile and expectile, without requiring replicating observations or assuming a parametric form for the noise. As illustrated in the experimental section, the proposed approach clearly outperforms the state of the art in the heteroscedastic, non-Gaussian case.  ( 2 min )
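    Quantile and expectile regression both rest on asymmetric losses, which a short sketch can make concrete. The code below uses the standard pinball loss (whose minimiser is a quantile) and the standard asymmetric squared loss (whose minimiser is an expectile). It illustrates the loss functions only; it is not the paper's variational Gaussian process model, and the data and helper names are made up for the example.

```python
from typing import Callable, Sequence


def pinball_loss(y: float, q: float, tau: float) -> float:
    """Pinball (quantile) loss: residuals above q are weighted by tau, below by 1 - tau."""
    u = y - q
    return tau * u if u >= 0 else (tau - 1.0) * u


def expectile_loss(y: float, m: float, tau: float) -> float:
    """Asymmetric squared loss whose minimiser is the tau-expectile."""
    u = y - m
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u


def mean_loss(loss: Callable[[float, float, float], float],
              data: Sequence[float], value: float, tau: float) -> float:
    """Average loss of a constant prediction `value` over the sample."""
    return sum(loss(y, value, tau) for y in data) / len(data)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 100.0]  # made-up sample with one heavy-tail point
    # Search the observed values for the best constant prediction under each tau.
    median = min(data, key=lambda q: mean_loss(pinball_loss, data, q, 0.5))
    upper = min(data, key=lambda q: mean_loss(pinball_loss, data, q, 0.9))
    print("tau=0.5 minimiser:", median)
    print("tau=0.9 minimiser:", upper)
    print("expectile loss at the mean:", mean_loss(expectile_loss, data, 22.0, 0.5))
```

    With tau = 0.5 the pinball minimiser is the median and the expectile minimiser is the mean, so the outlier moves the expectile but not the quantile; raising tau shifts both toward the upper tail, which is the behaviour a risk-averse criterion needs.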
    ElectroLens: Understanding Atomistic Simulations Through Spatially-resolved Visualization of High-dimensional Features. (arXiv:1908.08381v3 [cs.HC] UPDATED)
    In recent years, machine learning (ML) has gained significant popularity in the field of chemical informatics and electronic structure theory. These techniques often require researchers to engineer abstract "features" that encode chemical concepts into a mathematical form compatible with the input to machine-learning models. However, there is no existing tool to connect these abstract features back to the actual chemical system, making it difficult to diagnose failures and to build intuition about the meaning of the features. We present ElectroLens, a new visualization tool for high-dimensional spatially-resolved features to tackle this problem. The tool visualizes high-dimensional data sets for atomistic and electron environment features by a series of linked 3D views and 2D plots. The tool is able to connect different derived features and their corresponding regions in 3D via interactive selection. It is built to be scalable and to integrate with existing infrastructure.  ( 2 min )
    The Harvard USPTO Patent Dataset: A Large-Scale, Well-Structured, and Multi-Purpose Corpus of Patent Applications. (arXiv:2207.04043v1 [cs.CL])
    Innovation is a major driver of economic and social development, and information about many kinds of innovation is embedded in semi-structured data from patents and patent applications. Although the impact and novelty of innovations expressed in patent data are difficult to measure through traditional means, ML offers a promising set of techniques for evaluating novelty, summarizing contributions, and embedding semantics. In this paper, we introduce the Harvard USPTO Patent Dataset (HUPD), a large-scale, well-structured, and multi-purpose corpus of English-language patent applications filed to the United States Patent and Trademark Office (USPTO) between 2004 and 2018. With more than 4.5 million patent documents, HUPD is two to three times larger than comparable corpora. Unlike previously proposed patent datasets in NLP, HUPD contains the inventor-submitted versions of patent applications--not the final versions of granted patents--thereby allowing us to study patentability at the time of filing using NLP methods for the first time. It is also novel in its inclusion of rich structured metadata alongside the text of patent filings: By providing each application's metadata along with all of its text fields, the dataset enables researchers to perform new sets of NLP tasks that leverage variation in structured covariates. As a case study on the types of research HUPD makes possible, we introduce a new task to the NLP community--namely, binary classification of patent decisions. We additionally show the structured metadata provided in the dataset enables us to conduct explicit studies of concept shifts for this task. Finally, we demonstrate how HUPD can be used for three additional tasks: multi-class classification of patent subject areas, language modeling, and summarization.  ( 3 min )
    MACFE: A Meta-learning and Causality Based Feature Engineering Framework. (arXiv:2207.04010v1 [cs.LG])
    Feature engineering has become one of the most important steps to improve model prediction performance, and to produce quality datasets. However, this process requires non-trivial domain knowledge and is time-consuming. Consequently, automating this process has become an active area of research and of interest in industrial applications. In this paper, a novel method, called Meta-learning and Causality Based Feature Engineering (MACFE), is proposed; our method is based on the use of meta-learning, feature distribution encoding, and causality feature selection. In MACFE, meta-learning is used to find the best transformations, then the search is accelerated by pre-selecting "original" features given their causal relevance. Experimental evaluations on popular classification datasets show that MACFE can improve the prediction performance across eight classifiers, outperforms the current state-of-the-art methods on average by at least 6.54%, and obtains an improvement of 2.71% over the best previous works.  ( 2 min )
    Implicit Bias of Gradient Descent on Reparametrized Models: On Equivalence to Mirror Descent. (arXiv:2207.04036v1 [cs.LG])
    As part of the effort to understand implicit bias of gradient descent in overparametrized models, several results have shown how the training trajectory on the overparametrized model can be understood as mirror descent on a different objective. The main result here is a characterization of this phenomenon under a notion termed commuting parametrization, which encompasses all the previous results in this setting. It is shown that gradient flow with any commuting parametrization is equivalent to continuous mirror descent with a related Legendre function. Conversely, continuous mirror descent with any Legendre function can be viewed as gradient flow with a related commuting parametrization. The latter result relies upon Nash's embedding theorem.  ( 2 min )
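A standard textbook instance of this equivalence, not taken from the abstract itself, is the elementwise squared parametrization $w = u \odot u$ with $w > 0$. The symbols $L$ (loss), $u$ (parameters), $w$ (reparametrized weights) and $R$ (Legendre function) are assumptions introduced here for illustration.

```latex
% Gradient flow on u for the loss u \mapsto L(u \odot u):
\dot{u} = -\nabla_u L(u \odot u) = -2\, u \odot \nabla L(w)
% Induced dynamics on w = u \odot u:
\dot{w} = 2\, u \odot \dot{u} = -4\, w \odot \nabla L(w)
% Mirror flow with potential R:  \frac{d}{dt} \nabla R(w) = -\nabla L(w),
% i.e.  \dot{w} = -\left(\nabla^2 R(w)\right)^{-1} \nabla L(w).
% Matching the two requires \nabla^2 R(w) = \operatorname{diag}\!\left(\tfrac{1}{4 w}\right), which holds for
R(w) = \tfrac{1}{4} \sum_i \left( w_i \log w_i - w_i \right)
```

Here $\nabla R(w) = \tfrac{1}{4}\log w$, so gradient flow in $u$ coincides with continuous mirror descent in $w$ under an entropy-like potential.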
    Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with Inexact Prox. (arXiv:2207.03957v1 [cs.LG])
    Inspired by a recent breakthrough of Mishchenko et al (2022), who for the first time showed that local gradient steps can lead to provable communication acceleration, we propose an alternative algorithm which obtains the same communication acceleration as their method (ProxSkip). Our approach is very different, however: it is based on the celebrated method of Chambolle and Pock (2011), with several nontrivial modifications: i) we allow for an inexact computation of the prox operator of a certain smooth strongly convex function via a suitable gradient-based method (e.g., GD, Fast GD or FSFOM), ii) we perform a careful modification of the dual update step in order to retain linear convergence. Our general results offer the new state-of-the-art rates for the class of strongly convex-concave saddle-point problems with bilinear coupling characterized by the absence of smoothness in the dual function. When applied to federated learning, we obtain a theoretically better alternative to ProxSkip: our method requires fewer local steps ($O(\kappa^{1/3})$ or $O(\kappa^{1/4})$, compared to $O(\kappa^{1/2})$ of ProxSkip), and performs a deterministic number of local steps instead. Like ProxSkip, our method can be applied to optimization over a connected network, and we obtain theoretical improvements here as well.  ( 3 min )
    Predicting Opinion Dynamics via Sociologically-Informed Neural Networks. (arXiv:2207.03990v1 [cs.SI])
    Opinion formation and propagation are crucial phenomena in social networks and have been extensively studied across several disciplines. Traditionally, theoretical models of opinion dynamics have been proposed to describe the interactions between individuals (i.e., social interaction) and their impact on the evolution of collective opinions. Although these models can incorporate sociological and psychological knowledge on the mechanisms of social interaction, they demand extensive calibration with real data to make reliable predictions, requiring much time and effort. Recently, the widespread use of social media platforms provides new paradigms to learn deep learning models from a large volume of social media data. However, these methods ignore any scientific knowledge about the mechanism of social interaction. In this work, we present the first hybrid method called Sociologically-Informed Neural Network (SINN), which integrates theoretical models and social media data by transporting the concepts of physics-informed neural networks (PINNs) from natural science (i.e., physics) into social science (i.e., sociology and social psychology). In particular, we recast theoretical models as ordinary differential equations (ODEs). Then we train a neural network that simultaneously approximates the data and conforms to the ODEs that represent the social scientific knowledge. In addition, we extend PINNs by integrating matrix factorization and a language model to incorporate rich side information (e.g., user profiles) and structural knowledge (e.g., cluster structure of the social interaction network). Moreover, we develop an end-to-end training procedure for SINN, which involves Gumbel-Softmax approximation to include stochastic mechanisms of social interaction. Extensive experiments on real-world and synthetic datasets show SINN outperforms six baseline methods in predicting opinion dynamics.  ( 3 min )
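The Gumbel-Softmax approximation mentioned above can be sketched in a few lines. This is a generic illustration of the relaxation, not SINN's training code; the function name, the clamping constant and the use of Python's `random.Random` are assumptions made for this example.

```python
import math
import random


def gumbel_softmax_sample(logits, temperature, rng):
    """Draw one relaxed one-hot sample from a categorical distribution.

    logits      -- unnormalised log-probabilities
    temperature -- positive scalar; smaller values give samples closer to one-hot
    rng         -- a random.Random instance (assumed interface for this sketch)
    """
    perturbed = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        perturbed.append((logit + gumbel_noise) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    top = max(perturbed)
    exps = [math.exp(p - top) for p in perturbed]
    total = sum(exps)
    return [e / total for e in exps]
```

The largest entry of each sample marks a category drawn from the softmax of `logits` (the Gumbel-max trick). The soft output stays differentiable with respect to the logits in an autodiff framework, which is what allows a stochastic choice to sit inside an end-to-end trained model.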
    A law of adversarial risk, interpolation, and label noise. (arXiv:2207.03933v1 [stat.ML])
    In supervised learning, it has been shown that label noise in the data can be interpolated without penalties on test accuracy under many circumstances. We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem showing the dependence of label noise and adversarial risk in terms of the data distribution. Our results are almost sharp without accounting for the inductive bias of the learning algorithm. We also show that inductive bias makes the effect of label noise much stronger.  ( 2 min )
    Active Learning-based Isolation Forest (ALIF): Enhancing Anomaly Detection in Decision Support Systems. (arXiv:2207.03934v1 [cs.LG])
    The detection of anomalous behaviours is an emerging need in many applications, particularly in contexts where security and reliability are critical aspects. While the definition of anomaly strictly depends on the domain framework, it is often impractical or too time consuming to obtain a fully labelled dataset. The use of unsupervised models to overcome the lack of labels often fails to catch domain specific anomalies as they rely on general definitions of outlier. This paper suggests a new active learning based approach, ALIF, to solve this problem by reducing the number of required labels and tuning the detector towards the definition of anomaly provided by the user. The proposed approach is particularly appealing in the presence of a Decision Support System (DSS), a case that is increasingly popular in real-world scenarios. While it is common for DSS with embedded anomaly detection capabilities to rely on unsupervised models, they have no way to improve their performance: ALIF is able to enhance the capabilities of DSS by exploiting the user feedback during common operations. ALIF is a lightweight modification of the popular Isolation Forest that showed superior performance compared with other state-of-the-art algorithms on a multitude of real anomaly detection datasets.  ( 2 min )
    Generalization-Memorization Machines. (arXiv:2207.03976v1 [cs.LG])
    Classifying the training data correctly without over-fitting is one of the goals in machine learning. In this paper, we propose a generalization-memorization mechanism, including a generalization-memorization decision and a memory modeling principle. Under this mechanism, error-based learning machines improve their memorization abilities of training data without over-fitting. Specifically, the generalization-memorization machines (GMM) are proposed by applying this mechanism. The optimization problems in GMM are quadratic programming problems and could be solved efficiently. It should be noted that the recently proposed generalization-memorization kernel and the corresponding support vector machines are the special cases of our GMM. Experimental results show the effectiveness of the proposed GMM both on memorization and generalization.  ( 2 min )
    Black and Gray Box Learning of Amplitude Equations: Application to Phase Field Systems. (arXiv:2207.03954v1 [stat.ML])
    We present a data-driven approach to learning surrogate models for amplitude equations, and illustrate its application to interfacial dynamics of phase field systems. In particular, we demonstrate learning effective partial differential equations describing the evolution of phase field interfaces from full phase field data. We illustrate this on a model phase field system, where analytical approximate equations for the dynamics of the phase field interface (a higher order eikonal equation and its approximation, the Kardar-Parisi-Zhang (KPZ) equation) are known. For this system, we discuss data-driven approaches for the identification of equations that accurately describe the front interface dynamics. When the analytical approximate models mentioned above become inaccurate, as we move beyond the region of validity of the underlying assumptions, the data-driven equations outperform them. In these regimes, going beyond black-box identification, we explore different approaches to learn data-driven corrections to the analytically approximate models, leading to effective gray box partial differential equations.  ( 2 min )
    Learning with Muscles: Benefits for Data-Efficiency and Robustness in Anthropomorphic Tasks. (arXiv:2207.03952v1 [cs.RO])
    Humans are able to outperform robots in terms of robustness, versatility, and learning of new tasks in a wide variety of movements. We hypothesize that highly nonlinear muscle dynamics play a large role in providing inherent stability, which is favorable to learning. While recent advances have been made in applying modern learning techniques to muscle-actuated systems both in simulation as well as in robotics, so far, no detailed analysis has been performed to show the benefits of muscles in this setting. Our study closes this gap by investigating core robotics challenges and comparing the performance of different actuator morphologies in terms of data-efficiency, hyperparameter sensitivity, and robustness.  ( 2 min )
    High Performance Simulation for Scalable Multi-Agent Reinforcement Learning. (arXiv:2207.03945v1 [cs.MA])
    Multi-agent reinforcement learning experiments and open-source training environments are typically limited in scale, supporting tens or sometimes up to hundreds of interacting agents. In this paper we demonstrate the use of Vogue, a high performance agent based model (ABM) framework. Vogue serves as a multi-agent training environment, supporting thousands to tens of thousands of interacting agents while maintaining high training throughput by running both the environment and reinforcement learning (RL) agents on the GPU. High performance multi-agent environments at this scale have the potential to enable the learning of robust and flexible policies for use in ABMs and simulations of complex systems. We demonstrate training performance with two newly developed, large scale multi-agent training environments. Moreover, we show that these environments can train shared RL policies on time-scales of minutes and hours.  ( 2 min )
    ControlBurn: Nonlinear Feature Selection with Sparse Tree Ensembles. (arXiv:2207.03935v1 [stat.ML])
    ControlBurn is a Python package to construct feature-sparse tree ensembles that support nonlinear feature selection and interpretable machine learning. The algorithms in this package first build large tree ensembles that prioritize basis functions with few features and then select a feature-sparse subset of these basis functions using a weighted lasso optimization criterion. The package includes visualizations to analyze the features selected by the ensemble and their impact on predictions. Hence ControlBurn offers the accuracy and flexibility of tree-ensemble models and the interpretability of sparse generalized additive models. ControlBurn is scalable and flexible: for example, it can use warm-start continuation to compute the regularization path (prediction error for any number of selected features) for a dataset with tens of thousands of samples and hundreds of features in seconds. For larger datasets, the runtime scales linearly in the number of samples and features (up to a log factor), and the package supports acceleration using sketching. Moreover, the ControlBurn framework accommodates feature costs, feature groupings, and $\ell_0$-based regularizers. The package is user-friendly and open-source: its documentation and source code appear on https://pypi.org/project/ControlBurn/ and https://github.com/udellgroup/controlburn/.  ( 2 min )
    Memory-free Online Change-point Detection: A Novel Neural Network Approach. (arXiv:2207.03932v1 [cs.LG])
    Change-point detection (CPD), which detects abrupt changes in the data distribution, is recognized as one of the most significant tasks in time series analysis. Despite the extensive literature on offline CPD, unsupervised online CPD still suffers from major challenges, including scalability, hyperparameter tuning, and learning constraints. To mitigate some of these challenges, in this paper, we propose a novel deep learning approach for unsupervised online CPD from multi-dimensional time series, named Adaptive LSTM-Autoencoder Change-Point Detection (ALACPD). ALACPD exploits an LSTM-autoencoder-based neural network to perform unsupervised online CPD. It continuously adapts to the incoming samples without keeping the previously received input, thus being memory-free. We perform an extensive evaluation on several real-world time series CPD benchmarks. We show that ALACPD, on average, ranks first among state-of-the-art CPD algorithms in terms of quality of the time series segmentation, and it is on par with the best performer in terms of the accuracy of the estimated change-points. The implementation of ALACPD is available online on GitHub (https://github.com/zahraatashgahi/ALACPD).  ( 2 min )
    Generative Adversarial Networks and Other Generative Models. (arXiv:2207.03887v1 [cs.CV])
    Generative networks are fundamentally different in their aim and methods compared to CNNs for classification, segmentation, or object detection. They were not initially meant to be an image analysis tool, but to produce natural-looking images. The adversarial training paradigm has been proposed to stabilize generative methods, and has proven to be highly successful, though by no means from the first attempt. This chapter gives a basic introduction to the motivation for Generative Adversarial Networks (GANs) and traces the path of their success by abstracting the basic task and working mechanism, and deriving the difficulty of early practical approaches. Methods for more stable training will be shown, along with typical signs of poor convergence and their causes. Though this chapter focuses on GANs that are meant for image generation and image analysis, the adversarial training paradigm itself is not specific to images, and also generalizes to tasks in image analysis. Examples of architectures for image semantic segmentation and abnormality detection will be presented, before contrasting GANs with further generative modeling approaches that have recently entered the scene. This will allow a contextualized view of the limits but also the benefits of GANs.  ( 2 min )
    Storehouse: a Reinforcement Learning Environment for Optimizing Warehouse Management. (arXiv:2207.03851v1 [cs.LG])
    Warehouse Management Systems have been evolving and improving thanks to new Data Intelligence techniques. However, many current optimizations have been applied to specific cases or are in great need of manual interaction. Here is where Reinforcement Learning techniques come into play, providing automatization and adaptability to current optimization policies. In this paper, we present Storehouse, a customizable environment that generalizes the definition of warehouse simulations for Reinforcement Learning. We also validate this environment against state-of-the-art reinforcement learning algorithms and compare these results to human and random policies.  ( 2 min )
    Towards Semantic Communication Protocols: A Probabilistic Logic Perspective. (arXiv:2207.03920v1 [cs.IT])
    Classical medium access control (MAC) protocols are interpretable, yet their task-agnostic control signaling messages (CMs) are ill-suited for emerging mission-critical applications. By contrast, neural network (NN) based protocol models (NPMs) learn to generate task-specific CMs, but their rationale and impact lack interpretability. To fill this void, in this article we propose, for the first time, a semantic protocol model (SPM) constructed by transforming an NPM into an interpretable symbolic graph written in the probabilistic logic programming language (ProbLog). This transformation is viable by extracting and merging common CMs and their connections while treating the NPM as a CM generator. By extensive simulations, we corroborate that the SPM tightly approximates its original NPM while occupying only 0.02% memory. By leveraging its interpretability and memory-efficiency, we demonstrate several SPM-enabled applications such as SPM reconfiguration for collision-avoidance, as well as comparing different SPMs via semantic entropy calculation and storing multiple SPMs to cope with non-stationary environments.  ( 2 min )
    Constrained Training of Neural Networks via Theorem Proving. (arXiv:2207.03880v1 [cs.AI])
    We introduce a theorem proving approach to the specification and generation of temporal logical constraints for training neural networks. We formalise a deep embedding of linear temporal logic over finite traces (LTL$_f$) and an associated evaluation function characterising its semantics within the higher-order logic of the Isabelle theorem prover. We then proceed to formalise a loss function $\mathcal{L}$ that we formally prove to be sound, and differentiable to a function $d\mathcal{L}$. We subsequently use Isabelle's automatic code generation mechanism to produce OCaml versions of LTL$_f$, $\mathcal{L}$ and $d\mathcal{L}$ that we integrate with PyTorch via OCaml bindings for Python. We show that, when used for training in an existing deep learning framework for dynamic movement, our approach produces expected results for common movement specification patterns such as obstacle avoidance and patrolling. The distinctive benefit of our approach is the fully rigorous method for constrained training, eliminating many of the risks inherent to ad-hoc implementations of logical aspects directly in an "unsafe" programming language such as Python.  ( 2 min )
    Ensemble random forest filter: An alternative to the ensemble Kalman filter for inverse modeling. (arXiv:2207.03909v1 [cs.LG])
    The ensemble random forest filter (ERFF) is presented as an alternative to the ensemble Kalman filter (EnKF) for the purpose of inverse modeling. The EnKF is a data assimilation approach that forecasts and updates parameter estimates sequentially in time as observations are being collected. The updating step is based on the experimental covariances computed from an ensemble of realizations and the updates are given as linear combinations of the differences between observations and forecasted system state values. The ERFF replaces the linear combination in the update step with a non-linear function represented by a random forest. In this way, the non-linear relationships between the parameters to be updated and the observations can be captured and a better update produced. The ERFF is demonstrated for the purpose of log-conductivity identification from piezometric head observations in a number of scenarios with varying degrees of heterogeneity (log-conductivity variances going from 1 up to 6.25 (ln m/d)$^2$), number of realizations in the ensemble (50 or 100), and number of piezometric head observations (18 or 36). In all scenarios, the ERFF works well, being able to reconstruct the log-conductivity spatial heterogeneity while matching the observed piezometric heads at selected control points. For benchmarking purposes the ERFF is compared to the restart EnKF to find that the ERFF is superior to the EnKF for the number of ensemble realizations used (small in typical EnKF applications). Only when the number of realizations grows to 500 is the restart EnKF able to match the performance of the ERFF, albeit at triple the computational cost.  ( 3 min )
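The linear EnKF update described above can be sketched for the simplest possible case: one scalar parameter and one scalar observation. This is a deliberately reduced illustration, not the restart EnKF or the ERFF from the paper; the function name and argument layout are assumptions, and the random-forest replacement of the linear step is not shown.

```python
from statistics import mean


def enkf_update(params, predicted, perturbed_obs, obs_var):
    """One ensemble Kalman update for a scalar parameter and a scalar observation.

    params        -- ensemble of parameter values before the update
    predicted     -- forecasted observation for each ensemble member
    perturbed_obs -- observation assigned to each member (possibly noise-perturbed)
    obs_var       -- observation error variance
    """
    n = len(params)
    p_bar, y_bar = mean(params), mean(predicted)
    # Experimental (sample) covariance between parameters and forecasts.
    cov_py = sum((p - p_bar) * (y - y_bar) for p, y in zip(params, predicted)) / (n - 1)
    # Sample variance of the forecasted observations.
    var_y = sum((y - y_bar) ** 2 for y in predicted) / (n - 1)
    gain = cov_py / (var_y + obs_var)
    # Each member moves linearly with its observation-minus-forecast mismatch.
    return [p + gain * (d - y) for p, y, d in zip(params, predicted, perturbed_obs)]
```

For an exactly linear forward model and zero observation error, for example `enkf_update([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [8.0, 8.0, 8.0], 0.0)`, every member collapses onto the parameter value that reproduces the observation. When the parameter-observation relationship is non-linear this linear gain is only an approximation, which is the limitation the ERFF targets.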
    GT4SD: Generative Toolkit for Scientific Discovery. (arXiv:2207.03928v1 [cs.LG])
    With the growing availability of data within various scientific domains, generative models hold enormous potential to accelerate scientific discovery at every step of the scientific method. Perhaps their most valuable application lies in the speeding up of what has traditionally been the slowest and most challenging step of coming up with a hypothesis. Powerful representations are now being learned from large volumes of data to generate novel hypotheses, which is making a big impact on scientific discovery applications ranging from material design to drug discovery. The GT4SD (https://github.com/GT4SD/gt4sd-core) is an extensible open-source library that enables scientists, developers and researchers to train and use state-of-the-art generative models for hypothesis generation in scientific discovery. GT4SD supports a variety of uses of generative models across material science and drug discovery, including molecule discovery and design based on properties related to target proteins, omic profiles, scaffold distances, binding energies and more.  ( 2 min )
    Interaction Pattern Disentangling for Multi-Agent Reinforcement Learning. (arXiv:2207.03902v1 [cs.LG])
    Deep cooperative multi-agent reinforcement learning has demonstrated its remarkable success over a wide spectrum of complex control tasks. However, recent advances in multi-agent learning mainly focus on value decomposition while leaving entity interactions still intertwined, which easily leads to over-fitting on noisy interactions between entities. In this work, we introduce a novel interactiOn Pattern disenTangling (OPT) method, to disentangle not only the joint value function into agent-wise value functions for decentralized execution, but also the entity interactions into interaction prototypes, each of which represents an underlying interaction pattern within a sub-group of the entities. OPT facilitates filtering the noisy interactions between irrelevant entities and thus significantly improves generalizability as well as interpretability. Specifically, OPT introduces a sparse disagreement mechanism to encourage sparsity and diversity among discovered interaction prototypes. Then the model selectively restructures these prototypes into a compact interaction pattern by an aggregator with learnable weights. To alleviate the training instability issue caused by partial observability, we propose to maximize the mutual information between the aggregation weights and the history behaviors of each agent. Experiments on both single-task and multi-task benchmarks demonstrate that the proposed method yields results superior to the state-of-the-art counterparts. Our code will be made publicly available.  ( 2 min )
    UDRN: Unified Dimensional Reduction Neural Network for Feature Selection and Feature Projection. (arXiv:2207.03809v1 [cs.LG])
    Dimensional reduction (DR) maps high-dimensional data into a lower-dimensional latent space by minimizing defined optimization objectives. DR methods usually fall into feature selection (FS) and feature projection (FP). FS focuses on selecting a critical subset of dimensions but risks destroying the data distribution (structure). On the other hand, FP combines all the input features into a lower-dimensional space, aiming to maintain the data structure, but lacks interpretability and sparsity. FS and FP are traditionally incompatible categories; thus, they have not been unified into an amicable framework. We propose that the ideal DR approach combines both FS and FP into a unified end-to-end manifold learning framework, simultaneously performing fundamental feature discovery while maintaining the intrinsic relationships between data samples in the latent space. In this work, we develop a unified framework, Unified Dimensional Reduction Neural-network (UDRN), that integrates FS and FP in a compatible, end-to-end way. We improve the neural network structure by implementing FS and FP tasks separately using two stacked sub-networks. In addition, we designed data augmentation of the DR process to improve the generalization ability of the method when dealing with extensive feature datasets and designed loss functions that can cooperate with the data augmentation. Extensive experimental results on four image and four biological datasets, including very high-dimensional data, demonstrate the advantages of UDRN over existing methods (FS, FP, and FS&FP pipeline), especially in downstream tasks such as classification and visualization.  ( 3 min )
    NExG: Provable and Guided State Space Exploration of Neural Network Control Systems using Sensitivity Approximation. (arXiv:2207.03884v1 [eess.SY])
    We propose a new technique for performing state space exploration of closed loop control systems with neural network feedback controllers. Our approach involves approximating the sensitivity of the trajectories of the closed loop dynamics. Using such an approximator and the system simulator, we present a guided state space exploration method that can generate trajectories visiting the neighborhood of a target state at a specified time. We present a theoretical framework which establishes that our method will produce a sequence of trajectories that will reach a suitable neighborhood of the target state. We provide a thorough evaluation of our approach on various systems with neural network feedback controllers of different configurations. We outperform earlier state space exploration techniques and achieve significant improvement in both the quality (explainability) and performance (convergence rate). Finally, we adopt our algorithm for the falsification of a class of temporal logic specifications, assess its performance against a state-of-the-art falsification tool, and show its potential in supplementing existing falsification algorithms.  ( 2 min )
    BAST: Binaural Audio Spectrogram Transformer for Binaural Sound Localization. (arXiv:2207.03927v1 [cs.SD])
    Accurate sound localization in a reverberation environment is essential for human auditory perception. Recently, Convolutional Neural Networks (CNNs) have been utilized to model the binaural human auditory pathway. However, CNNs have difficulty capturing global acoustic features. To address this issue, we propose a novel end-to-end Binaural Audio Spectrogram Transformer (BAST) model to predict the sound azimuth in both anechoic and reverberation environments. Two modes of implementation, i.e. BAST-SP and BAST-NSP, corresponding to the BAST model with shared and non-shared parameters respectively, are explored. Our model with subtraction interaural integration and hybrid loss achieves an angular distance of 1.29 degrees and a Mean Square Error of 1e-3 at all azimuths, significantly surpassing the CNN-based model. The exploratory analysis of BAST's performance on the left-right hemifields and in anechoic and reverberation environments shows its generalization ability as well as the feasibility of binaural Transformers in sound localization. Furthermore, an analysis of the attention maps is provided to give additional insights into the interpretation of the localization process in a natural reverberant environment.  ( 2 min )
    Tightening Discretization-based MILP Models for the Pooling Problem using Upper Bounds on Bilinear Terms. (arXiv:2207.03699v1 [math.OC])
    Discretization-based methods have been proposed for solving nonconvex optimization problems with bilinear terms. These methods convert the original nonconvex optimization problems into mixed-integer linear programs (MILPs). Compared to a wide range of studies related to methods to convert nonconvex optimization problems into MILPs, research on tightening the resulting MILP models is limited. In this paper, we present tightening constraints for the discretization-based MILP models for the pooling problem. Specifically, we study tightening constraints that are derived from upper bounds on bilinear terms and that exploit the structures resulting from the discretization. We demonstrate the effectiveness of our constraints, showing computational results for MILP models derived from different formulations for (1) the pooling problem and (2) discretization-based pooling models. Computational results show that our methods reduce the computational time for MILP models on CPLEX 12.10. Finally, we note that while our methods are presented in the context of the pooling problem, they can be extended to address other nonconvex optimization problems with upper bounds on bilinear terms.  ( 2 min )
    Big Learning: A Universal Machine Learning Paradigm?. (arXiv:2207.03899v1 [cs.LG])
    Recent breakthroughs based on big/foundation models reveal a vague avenue for artificial intelligence, that is, big data, big/foundation models, big learning, $\cdots$. Following that avenue, here we elaborate on the newly introduced big learning. Specifically, big learning comprehensively exploits the available information inherent in large-scale complete/incomplete data, by simultaneously learning to model many-to-all joint/conditional/marginal data distributions (thus named big learning) with one universal foundation model. We reveal that big learning is what existing foundation models are implicitly doing; accordingly, our big learning provides high-level guidance for flexible design and improvements of foundation models, accelerating the true self-learning on the Internet. Besides, big learning ($i$) is equipped with marvelous flexibility for both training data and training-task customization; ($ii$) potentially delivers all joint/conditional/marginal data capabilities after training; ($iii$) significantly reduces the training-test gap with improved model generalization; and ($iv$) unifies conventional machine learning paradigms e.g. supervised learning, unsupervised learning, generative learning, etc. and enables their flexible cooperation, manifesting a universal learning paradigm.  ( 2 min )
    Product Segmentation Newsvendor Problems: A Robust Learning Approach. (arXiv:2207.03801v1 [cs.LG])
    We propose and analyze a product segmentation newsvendor problem, which generalizes the phenomenon of segmentation sales of a class of perishable items. The product segmentation newsvendor problem is a new variant of the newsvendor problem, reflecting that sellers maximize profits by determining the inventory of the whole item in the context of uncertain demand for sub-items. We derive the closed-form robust ordering decision by assuming that the means and covariance matrix of stochastic demand are available but not the distributions. However, robust approaches that always optimize for the worst-case demand scenario raise concerns about solution conservatism; thus, traditional robust schemes offer unsatisfactory performance. In this paper, we integrate robust and deep reinforcement learning (DRL) techniques and propose a new paradigm termed robust learning to increase the attractiveness of robust policies. Notably, we take the robust decision as human domain knowledge and implement it into the training process of DRL by designing a full-process human-machine collaborative mechanism of teaching experience, normative decision, and regularization return. Simulation results confirm that our approach effectively improves robust performance and can generalize to various problems that require robust but less conservative solutions. At the same time, fewer training episodes, increased training stability, and interpretability of behavior may facilitate the deployment of DRL algorithms in operational practice. Furthermore, the successful attempt of RLDQN to solve the 1000-dimensional demand scenarios reveals that the algorithm provides a path to solve complex operational problems through human-machine collaboration and may have potential significance for solving other complex operational management problems.  ( 3 min )
    Variational Inference of overparameterized Bayesian Neural Networks: a theoretical and empirical study. (arXiv:2207.03859v1 [stat.ML])
    This paper studies the Variational Inference (VI) used for training Bayesian Neural Networks (BNN) in the overparameterized regime, i.e., when the number of neurons tends to infinity. More specifically, we consider overparameterized two-layer BNN and point out a critical issue in the mean-field VI training. This problem arises from the decomposition of the lower bound on the evidence (ELBO) into two terms: one corresponding to the likelihood function of the model and the second to the Kullback-Leibler (KL) divergence between the prior distribution and the variational posterior. In particular, we show both theoretically and empirically that there is a trade-off between these two terms in the overparameterized regime only when the KL is appropriately re-scaled with respect to the ratio between the number of observations and neurons. We also illustrate our theoretical results with numerical experiments that highlight the critical choice of this ratio.  ( 2 min )
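For orientation, the two-term decomposition described in this abstract can be written out as follows. This is a generic sketch, not the paper's own notation: the symbols $q$ (variational posterior), $p_0$ (prior), $\mathcal{D}$ (data) and the scaling factor $\eta$ are assumptions introduced here, and the abstract does not state the exact form of the re-scaling.

```latex
% Standard ELBO: expected log-likelihood minus a KL penalty (eta = 1).
% Re-scaled variant: the KL term is multiplied by a factor eta > 0,
% which per the abstract should depend on the ratio between the
% number of observations and the number of neurons.
\mathcal{L}_{\eta}(q)
  = \underbrace{\mathbb{E}_{w \sim q}\bigl[\log p(\mathcal{D} \mid w)\bigr]}_{\text{likelihood term}}
  \;-\; \eta \, \underbrace{\mathrm{KL}\bigl(q \,\|\, p_0\bigr)}_{\text{KL term}},
\qquad \eta = 1 \text{ recovers the usual ELBO.}
```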
    Encoding NetFlows for State-Machine Learning. (arXiv:2207.03890v1 [cs.LG])
    NetFlow data is a well-known network log format used by many network analysts and researchers. The advantages of using this format compared to pcap are that it contains fewer data, is less privacy intrusive, and is easier to collect and process. However, having less data does mean that this format might not be able to capture important network behaviour as all information is summarised into statistics. Much research aims to overcome this disadvantage through the use of machine learning, for instance, to detect attacks within a network. Many approaches can be used to pre-process the NetFlow data before it is used to train the machine learning algorithms. However, many of these approaches simply apply existing methods to the data, not considering the specific properties of network data. We argue that for data originating from software systems, such as NetFlow or software logs, similarities in frequency and contexts of feature values are more important than similarities in the value itself. In this work, we, therefore, propose an encoding algorithm that directly takes the frequency and the context of the feature values into account when the data is being processed. Different types of network behaviours can be clustered using this encoding, thus aiding the process of detecting anomalies within the network. From windows of these clusters obtained from monitoring a clean system, we learn state machine behavioural models for anomaly detection. These models are very well-suited to modelling the cyclic and repetitive patterns present in NetFlow data. We evaluate our encoding on a new dataset that we created for detecting problems in Kubernetes clusters and on two well-known public NetFlow datasets. The obtained performance results of the state machine models are comparable to existing works that use many more features and require both clean and infected data as training input.  ( 3 min )
    Convolutional Neural Networks for Time-dependent Classification of Variable-length Time Series. (arXiv:2207.03718v1 [cs.LG])
    Time series data are often obtained only within a limited time range due to interruptions during the observation process. To classify such partial time series, we need to account for 1) the variable-length data drawn from 2) different timestamps. To address the first problem, existing convolutional neural networks use global pooling after convolutional layers to cancel the length differences. This architecture suffers from the trade-off between incorporating entire temporal correlations in long data and avoiding feature collapse for short data. To resolve this trade-off, we propose Adaptive Multi-scale Pooling, which aggregates features from an adaptive number of layers, i.e., only the first few layers for short data and more layers for long data. Furthermore, to address the second problem, we introduce Temporal Encoding, which embeds the observation timestamps into the intermediate features. Experiments on our private dataset and the UCR/UEA time series archive show that our modules improve classification accuracy, especially on short data obtained as partial time series.  ( 2 min )
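The baseline this abstract starts from, global pooling after convolution to cancel length differences, can be illustrated with a minimal pure-Python sketch. It is not the paper's Adaptive Multi-scale Pooling; the kernels, function names and toy series below are all hypothetical and only show why the pooled feature vector has the same size for any input length.

```python
def conv1d_valid(series, kernel):
    """1-D cross-correlation without padding ('valid' mode)."""
    k = len(kernel)
    if len(series) < k:
        raise ValueError("series is shorter than the kernel")
    return [
        sum(series[i + j] * kernel[j] for j in range(k))
        for i in range(len(series) - k + 1)
    ]


def global_average_pool(feature_maps):
    """Collapse each variable-length feature map to a single number."""
    return [sum(fm) / len(fm) for fm in feature_maps]


def extract_features(series, kernels):
    """One convolution per kernel, then global average pooling."""
    maps = [conv1d_valid(series, kernel) for kernel in kernels]
    return global_average_pool(maps)


# Hypothetical fixed kernels standing in for learned filters.
KERNELS = [[1.0, -1.0], [0.5, 0.5], [1.0, 0.0, -1.0]]

short_series = [0.0, 1.0, 2.0, 3.0]
long_series = [float(t) for t in range(50)]

short_features = extract_features(short_series, KERNELS)
long_features = extract_features(long_series, KERNELS)

# Both inputs yield one feature per kernel, regardless of length.
print(len(short_features), len(long_features))
```

The output dimension depends only on the number of kernels, which is what lets a single classifier head consume series of different lengths.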
    A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning. (arXiv:2207.03784v1 [cs.LG])
    Proxy-based Deep Metric Learning (DML) learns deep representations by embedding images close to their class representatives (proxies), commonly with respect to the angle between them. However, this disregards the embedding norm, which can carry additional beneficial context such as class- or image-intrinsic uncertainty. In addition, proxy-based DML struggles to learn class-internal structures. To address both issues at once, we introduce non-isotropic probabilistic proxy-based DML. We model images as directional von Mises-Fisher (vMF) distributions on the hypersphere that can reflect image-intrinsic uncertainties. Further, we derive non-isotropic von Mises-Fisher (nivMF) distributions for class proxies to better represent complex class-specific variances. To measure the proxy-to-image distance between these models, we develop and investigate multiple distribution-to-point and distribution-to-distribution metrics. Each framework choice is motivated by a set of ablational studies, which showcase beneficial properties of our probabilistic approach to proxy-based DML, such as uncertainty-awareness, better-behaved gradients during training, and overall improved generalization performance. The latter is especially reflected in the competitive performance on the standard DML benchmarks, where our approach compares favorably, suggesting that existing proxy-based DML can significantly benefit from a more probabilistic treatment. Code is available at github.com/ExplainableML/Probabilistic_Deep_Metric_Learning.  ( 2 min )
    Combining Deep Learning with Good Old-Fashioned Machine Learning. (arXiv:2207.03757v1 [cs.LG])
    We present a comprehensive, stacking-based framework for combining deep learning with good old-fashioned machine learning, called Deep GOld. Our framework involves ensemble selection from 51 retrained pretrained deep networks as first-level models, and 10 machine-learning algorithms as second-level models. Enabled by today's state-of-the-art software tools and hardware platforms, Deep GOld delivers consistent improvement when tested on four image-classification datasets: Fashion MNIST, CIFAR10, CIFAR100, and Tiny ImageNet. Of 120 experiments, in all but 10 Deep GOld improved the original networks' performance.  ( 2 min )
    Safe reinforcement learning for multi-energy management systems with known constraint functions. (arXiv:2207.03830v1 [eess.SY])
    Reinforcement learning (RL) is a promising optimal control technique for multi-energy management systems. It does not require a model a priori - reducing the upfront and ongoing project-specific engineering effort - and is capable of learning better representations of the underlying system dynamics. However, vanilla RL does not provide constraint satisfaction guarantees - resulting in various unsafe interactions within its safety-critical environment. In this paper, we present two novel safe RL methods, namely SafeFallback and GiveSafe, where the safety constraint formulation is decoupled from the RL formulation and which provide hard-constraint satisfaction guarantees both during training (exploration) and exploitation of the (close-to) optimal policy. In a simulated multi-energy systems case study we have shown that both methods start with a significantly higher utility (i.e. useful policy) compared to a vanilla RL benchmark (94.6% and 82.8% compared to 35.5%) and that the proposed SafeFallback method can even outperform the vanilla RL benchmark (102.9% to 100%). We conclude that both methods are viable safety constraint handling techniques applicable beyond RL, as demonstrated with random agents while still providing hard-constraint guarantees. Finally, we propose fundamental future work to, among other things, improve the constraint functions themselves as more data becomes available.  ( 3 min )
    On the Subspace Structure of Gradient-Based Meta-Learning. (arXiv:2207.03804v1 [cs.LG])
    In this work we provide an analysis of the distribution of the post-adaptation parameters of Gradient-Based Meta-Learning (GBML) methods. Previous work has noticed how, for the case of image-classification, this adaptation only takes place on the last layers of the network. We propose the more general notion that parameters are updated over a low-dimensional subspace of the same dimensionality as the task-space and show that this holds for regression as well. Furthermore, the induced subspace structure provides a method to estimate the intrinsic dimension of the space of tasks of common few-shot learning datasets.  ( 2 min )
    A Survey on Participant Selection for Federated Learning in Mobile Networks. (arXiv:2207.03681v1 [cs.DC])
    Federated Learning (FL) is an efficient distributed machine learning paradigm that employs private datasets in a privacy-preserving manner. The main challenges of FL are that end devices usually possess various computation and communication capabilities and that their training data are not independent and identically distributed (non-IID). Due to limited communication bandwidth and unstable availability of such devices in a mobile network, only a fraction of end devices (also referred to as the participants or clients in an FL process) can be selected in each round. Hence, it is of paramount importance to utilize an efficient participant selection scheme to maximize the performance of FL, including final model accuracy and training time. In this paper, we provide a review of participant selection techniques for FL. First, we introduce FL and highlight the main challenges during participant selection. Then, we review the existing studies and categorize them based on their solutions. Finally, we provide some future directions on participant selection for FL based on our analysis of the state-of-the-art in this topic area.  ( 2 min )
    Private independence testing across two parties. (arXiv:2207.03652v1 [math.ST])
    We introduce $\pi$-test, a privacy-preserving algorithm for testing statistical independence between data distributed across multiple parties. Our algorithm relies on privately estimating the distance correlation between datasets, a quantitative measure of independence introduced in Székely et al. [2007]. We establish both additive and multiplicative error bounds on the utility of our differentially private test, which we believe will find applications in a variety of distributed hypothesis testing settings involving sensitive data.  ( 2 min )
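The quantity this abstract builds on, the sample distance correlation of Székely et al., can be sketched for one-dimensional samples as below. This is the plain, non-private statistic only; it does not implement the $\pi$-test or any differential-privacy mechanism, and the function names are hypothetical.

```python
import math


def _centered_distances(values):
    """Pairwise |a - b| distances, double-centered (row, column, grand mean)."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    """Average of the element-wise product of two n x n matrices."""
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation of two equal-length 1-D samples, in [0, 1]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    a = _centered_distances(x)
    b = _centered_distances(y)
    dcov2 = _mean_product(a, b)   # squared sample distance covariance
    dvar_x = _mean_product(a, a)  # squared sample distance variance of x
    dvar_y = _mean_product(b, b)
    denom = math.sqrt(dvar_x * dvar_y)
    if denom == 0.0:
        return 0.0                # a constant sample carries no dependence signal
    return math.sqrt(max(dcov2, 0.0) / denom)


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y_linear = [2.0 * v + 1.0 for v in x]
y_constant = [7.0] * len(x)

print(distance_correlation(x, y_linear))
print(distance_correlation(x, y_constant))
```

A value near 1 indicates strong dependence (here, an exact linear relationship), while a constant sample yields 0 by the convention chosen in this sketch.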
    Stability of Aggregation Graph Neural Networks. (arXiv:2207.03678v1 [cs.LG])
    In this paper we study the stability properties of aggregation graph neural networks (Agg-GNNs) considering perturbations of the underlying graph. An Agg-GNN is a hybrid architecture where information is defined on the nodes of a graph, but it is processed block-wise by Euclidean CNNs on the nodes after several diffusions on the graph shift operator. We derive stability bounds for the mapping operator associated to a generic Agg-GNN, and we specify conditions under which such operators can be stable to deformations. We prove that the stability bounds are defined by the properties of the filters in the first layer of the CNN that acts on each node. Additionally, we show that there is a close relationship between the number of aggregations, the filter's selectivity, and the size of the stability constants. We also conclude that in Agg-GNNs the selectivity of the mapping operators is tied to the properties of the filters only in the first layer of the CNN stage. This shows a substantial difference with respect to the stability properties of selection GNNs, where the selectivity of the filters in all layers is constrained by their stability. We provide numerical evidence corroborating the results derived, testing the behavior of Agg-GNNs in real life application scenarios considering perturbations of different magnitude.  ( 2 min )
    Tackling Data Heterogeneity: A New Unified Framework for Decentralized SGD with Sample-induced Topology. (arXiv:2207.03730v1 [math.OC])
    We develop a general framework unifying several gradient-based stochastic optimization methods for empirical risk minimization problems both in centralized and distributed scenarios. The framework hinges on the introduction of an augmented graph consisting of nodes modeling the samples and edges modeling both the inter-device communication and intra-device stochastic gradient computation. By designing properly the topology of the augmented graph, we are able to recover as special cases the renowned Local-SGD and DSGD algorithms, and provide a unified perspective for variance-reduction (VR) and gradient-tracking (GT) methods such as SAGA, Local-SVRG and GT-SAGA. We also provide a unified convergence analysis for smooth and (strongly) convex objectives relying on a proper structured Lyapunov function, and the obtained rate can recover the best known results for many existing algorithms. The rate results further reveal that VR and GT methods can effectively eliminate data heterogeneity within and across devices, respectively, enabling the exact convergence of the algorithm to the optimal solution. Numerical experiments confirm the findings in this paper.  ( 2 min )
    Predicting Li-ion Battery Cycle Life with LSTM RNN. (arXiv:2207.03687v1 [cs.LG])
    Efficient and accurate remaining useful life prediction is a key factor for reliable and safe usage of lithium-ion batteries. This work trains a long short-term memory recurrent neural network model to learn from sequential data of discharge capacities at various cycles and voltages and to work as a cycle life predictor for battery cells cycled under different conditions. Using experimental data from the first 60-80 cycles, our model achieves promising prediction accuracy on test sets of around 80 samples.  ( 2 min )
    Deep Learning for Anomaly Detection in Log Data: A Survey. (arXiv:2207.03820v1 [cs.LG])
    Automatic log file analysis enables early detection of relevant incidents such as system failures. In particular, self-learning anomaly detection techniques capture patterns in log data and subsequently report unexpected log event occurrences to system operators without the need to provide or manually model anomalous scenarios in advance. Recently, an increasing number of approaches leveraging deep learning neural networks for this purpose have been presented. These approaches have demonstrated superior detection performance in comparison to conventional machine learning techniques and simultaneously resolve issues with unstable data formats. However, there exist many different architectures for deep learning and it is non-trivial to encode raw and unstructured log data to be analyzed by neural networks. We therefore carry out a systematic literature review that provides an overview of deployed models, data pre-processing mechanisms, anomaly detection techniques, and evaluations. The survey does not quantitatively compare existing approaches but instead aims to help readers understand relevant aspects of different model architectures and emphasizes open issues for future work.  ( 2 min )
    Nonparametric Embeddings of Sparse High-Order Interaction Events. (arXiv:2207.03639v1 [cs.LG])
    High-order interaction events are common in real-world applications. Learning embeddings that encode the complex relationships of the participants from these events is of great importance in knowledge mining and predictive tasks. Despite the success of existing approaches, e.g. Poisson tensor factorization, they ignore the sparse structure underlying the data, namely the occurred interactions are far less than the possible interactions among all the participants. In this paper, we propose Nonparametric Embeddings of Sparse High-order interaction events (NESH). We hybridize a sparse hypergraph (tensor) process and a matrix Gaussian process to capture both the asymptotic structural sparsity within the interactions and nonlinear temporal relationships between the participants. We prove strong asymptotic bounds (including both a lower and an upper bound) of the sparsity ratio, which reveals the asymptotic properties of the sampled structure. We use batch-normalization, stick-breaking construction, and sparse variational GP approximations to develop an efficient, scalable model inference algorithm. We demonstrate the advantage of our approach in several real-world applications.  ( 2 min )
    The Power of Transfer Learning in Agricultural Applications: AgriNet. (arXiv:2207.03881v1 [cs.CV])
    Advances in deep learning and transfer learning have paved the way for various automation classification tasks in agriculture, including plant diseases, pests, weeds, and plant species detection. However, agriculture automation still faces various challenges, such as the limited size of datasets and the absence of plant-domain-specific pretrained models. Domain-specific pretrained models have shown state-of-the-art performance in various computer vision tasks including face recognition and medical imaging diagnosis. In this paper, we propose the AgriNet dataset, a collection of 160k agricultural images from more than 19 geographical locations, several image-capturing devices, and more than 423 classes of plant species and diseases. We also introduce AgriNet models, a set of pretrained models on five ImageNet architectures: VGG16, VGG19, Inception-v3, InceptionResNet-v2, and Xception. AgriNet-VGG19 achieved the highest classification accuracy of 94% and the highest F1-score of 92%. Additionally, all proposed models were found to accurately classify the 423 classes of plant species, diseases, pests, and weeds with a minimum accuracy of 87% for the Inception-v3 model. Finally, experiments to evaluate the superiority of AgriNet models compared to ImageNet models were conducted on two external datasets: a pest and plant diseases dataset from Bangladesh and a plant diseases dataset from Kashmir.  ( 2 min )
    End-to-End Binaural Speech Synthesis. (arXiv:2207.03697v1 [cs.SD])
    In this work, we present an end-to-end binaural speech synthesis system that combines a low-bitrate audio codec with a powerful binaural decoder that is capable of accurate speech binauralization while faithfully reconstructing environmental factors like ambient noise or reverb. The network is a modified vector-quantized variational autoencoder, trained with several carefully designed objectives, including an adversarial loss. We evaluate the proposed system on an internal binaural dataset with objective metrics and a perceptual study. Results show that the proposed approach matches the ground truth data more closely than previous methods. In particular, we demonstrate the capability of the adversarial loss in capturing environment effects needed to create an authentic auditory scene.  ( 2 min )
    GCN-based Multi-task Representation Learning for Anomaly Detection in Attributed Networks. (arXiv:2207.03688v1 [cs.LG])
    Anomaly detection in attributed networks has received considerable attention in recent years due to its applications in a wide range of domains such as finance, network security, and medicine. Traditional approaches cannot be adopted in attributed network settings to solve the problem of anomaly detection. The main limitation of such approaches is that they inherently ignore the relational information between data features. With a rapid explosion in deep learning- and graph neural networks-based techniques, spotting rare objects on attributed networks has significantly stepped forward owing to the potential of deep techniques in extracting complex relationships. In this paper, we propose a new architecture for anomaly detection. The main goal of designing such an architecture is to utilize multi-task learning, which would enhance the detection performance. Multi-task learning-based anomaly detection is still in its infancy and only a few studies in the existing literature have addressed it. We incorporate both community detection and multi-view representation learning techniques for extracting distinct and complementary information from attributed networks and subsequently fuse the captured information to achieve a better detection result. The mutual collaboration between the two main components employed in this architecture, i.e., community-specific learning and multi-view representation learning, offers a promising way to reach more effective results.  ( 3 min )
    Video Dialog as Conversation about Objects Living in Space-Time. (arXiv:2207.03656v1 [cs.CV])
    It would be a technological feat to be able to create a system that can hold a meaningful conversation with humans about what they watch. A setup toward that goal is presented as a video dialog task, where the system is asked to generate natural utterances in response to a question in an ongoing dialog. The task poses great visual, linguistic, and reasoning challenges that cannot be easily overcome without an appropriate representation scheme over video and dialog that supports high-level reasoning. To tackle these challenges we present a new object-centric framework for video dialog that supports neural reasoning dubbed COST - which stands for Conversation about Objects in Space-Time. Here dynamic space-time visual content in videos is first parsed into object trajectories. Given this video abstraction, COST maintains and tracks object-associated dialog states, which are updated upon receiving new questions. Object interactions are dynamically and conditionally inferred for each question, and these serve as the basis for relational reasoning among them. COST also maintains a history of previous answers, and this allows retrieval of relevant object-centric information to enrich the answer forming process. Language production then proceeds in a step-wise manner, taking into account the context of the current utterance, the existing dialog, and the current question. We evaluate COST on the DSTC7 and DSTC8 benchmarks, demonstrating its competitiveness against state-of-the-art methods.  ( 3 min )
    Guiding the retraining of convolutional neural networks against adversarial inputs. (arXiv:2207.03689v1 [cs.SE])
    Background: When using deep learning models, there are many possible vulnerabilities and some of the most worrying are the adversarial inputs, which can cause wrong decisions with minor perturbations. Therefore, it becomes necessary to retrain these models against adversarial inputs, as part of the software testing process addressing the vulnerability to these inputs. Furthermore, for an energy efficient testing and retraining, data scientists need support on which are the best guidance metrics and optimal dataset configurations. Aims: We examined four guidance metrics for retraining convolutional neural networks and three retraining configurations. Our goal is to improve the models against adversarial inputs regarding accuracy, resource utilization and time from the point of view of a data scientist in the context of image classification. Method: We conducted an empirical study in two datasets for image classification. We explore: (a) the accuracy, resource utilization and time of retraining convolutional neural networks by ordering new training set by four different guidance metrics (neuron coverage, likelihood-based surprise adequacy, distance-based surprise adequacy and random), (b) the accuracy and resource utilization of retraining convolutional neural networks with three different configurations (from scratch and augmented dataset, using weights and augmented dataset, and using weights and only adversarial inputs). Results: We reveal that retraining with adversarial inputs from original weights and by ordering with surprise adequacy metrics gives the best model w.r.t. the used metrics. Conclusions: Although more studies are necessary, we recommend data scientists to use the above configuration and metrics to deal with the vulnerability to adversarial inputs of deep learning models, as they can improve their models against adversarial inputs without using many inputs.  ( 3 min )
    Balanced Self-Paced Learning for AUC Maximization. (arXiv:2207.03650v1 [cs.LG])
    Learning to improve AUC performance is an important topic in machine learning. However, AUC maximization algorithms may suffer decreased generalization performance due to noisy data. Self-paced learning is an effective method for handling noisy data. However, existing self-paced learning methods are limited to pointwise learning, while AUC maximization is a pairwise learning problem. To solve this challenging problem, we innovatively propose a balanced self-paced AUC maximization algorithm (BSPAUC). Specifically, we first provide a statistical objective for self-paced AUC. Based on this, we propose our self-paced AUC maximization formulation, where a novel balanced self-paced regularization term is embedded to ensure that the selected positive and negative samples have proper proportions. Notably, the sub-problem with respect to all weight variables may be non-convex in our formulation, whereas it is normally convex in existing self-paced problems. To address this, we propose a doubly cyclic block coordinate descent method. More importantly, we prove that the sub-problem with respect to all weight variables converges to a stationary point on the basis of closed-form solutions, and our BSPAUC converges to a stationary point of our fixed optimization objective under a mild assumption. Considering both the deep learning and kernel-based implementations, experimental results on several large-scale datasets demonstrate that our BSPAUC has a better generalization performance than existing state-of-the-art AUC maximization methods.  ( 2 min )
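The statement that AUC maximization is a pairwise problem can be made concrete with a short sketch: the empirical AUC is the fraction of (positive, negative) pairs that the scores rank correctly. The code below shows only this standard definition, not the BSPAUC algorithm; the function name and toy data are hypothetical.

```python
def pairwise_auc(scores, labels):
    """Empirical AUC as the share of correctly ordered (positive, negative) pairs.

    Ties between a positive and a negative score count as half a win.
    Labels are expected to be 1 (positive) or 0 (negative).
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


scores = [0.9, 0.8, 0.3, 0.1]
labels = [1, 0, 1, 0]
print(pairwise_auc(scores, labels))  # 3 of the 4 pairs are ordered correctly
```

Because every term involves one positive and one negative sample, a single mislabeled example affects many pairs at once, which is one way to see why noisy data is a concern in this setting.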
    Information-Gathering in Latent Bandits. (arXiv:2207.03635v1 [cs.LG])
    In the latent bandit problem, the learner has access to reward distributions and -- for the non-stationary variant -- transition models of the environment. The reward distributions are conditioned on the arm and unknown latent states. The goal is to use the reward history to identify the latent state, allowing for the optimal choice of arms in the future. The latent bandit setting lends itself to many practical applications, such as recommender and decision support systems, where rich data allows the offline estimation of environment models with online learning remaining a critical component. Previous solutions in this setting always choose the highest reward arm according to the agent's beliefs about the state, not explicitly considering the value of information-gathering arms. Such information-gathering arms do not necessarily provide the highest reward, thus may never be chosen by an agent that chooses the highest reward arms at all times. In this paper, we present a method for information-gathering in latent bandits. Given particular reward structures and transition matrices, we show that choosing the best arm given the agent's beliefs about the states incurs higher regret. Furthermore, we show that by choosing arms carefully, we obtain an improved estimation of the state distribution, and thus lower the cumulative regret through better arm choices in the future. We evaluate our method on both synthetic and real-world data sets, showing significant improvement in regret over state-of-the-art methods.  ( 3 min )
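The belief-tracking step that this abstract relies on (using observed rewards to identify the latent state) can be sketched with a Bayes update, assuming Bernoulli rewards with known means per state and arm. The reward table, function names and the greedy rule are hypothetical illustrations, not the paper's method; the example is chosen so that one arm is informative and the other is not.

```python
def update_belief(belief, reward_means, arm, reward):
    """Posterior over latent states after observing a 0/1 reward from `arm`.

    reward_means[s][a] is the assumed-known Bernoulli mean of arm a in state s.
    """
    likelihoods = [
        means[arm] if reward == 1 else 1.0 - means[arm]
        for means in reward_means
    ]
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return [u / total for u in unnormalized]


def greedy_arm(belief, reward_means):
    """Arm with the highest expected reward under the current belief."""
    n_arms = len(reward_means[0])
    expected = [
        sum(b * means[a] for b, means in zip(belief, reward_means))
        for a in range(n_arms)
    ]
    return max(range(n_arms), key=lambda a: expected[a])


# Hypothetical environment: arm 0 separates the two states, arm 1 does not.
REWARD_MEANS = [
    [0.9, 0.5],  # latent state 0
    [0.1, 0.5],  # latent state 1
]
prior = [0.5, 0.5]

after_arm0 = update_belief(prior, REWARD_MEANS, arm=0, reward=1)
after_arm1 = update_belief(prior, REWARD_MEANS, arm=1, reward=1)
print(after_arm0)  # belief shifts strongly toward state 0
print(after_arm1)  # belief is unchanged: this arm reveals nothing
```

Pulling an arm whose reward distribution is identical across states leaves the belief where it was, which is why the informativeness of an arm can matter independently of its expected reward.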
    Getting BART to Ride the Idiomatic Train: Learning to Represent Idiomatic Expressions. (arXiv:2207.03679v1 [cs.CL])
    Idiomatic expressions (IEs), characterized by their non-compositionality, are an important part of natural language. They have been a classical challenge to NLP, including pre-trained language models that drive today's state-of-the-art. Prior work has identified deficiencies in their contextualized representation stemming from the underlying compositional paradigm of representation. In this work, we take a first-principles approach to build idiomaticity into BART using an adapter as a lightweight non-compositional language expert trained on idiomatic sentences. The improved capability over baselines (e.g., BART) is seen via intrinsic and extrinsic methods, where idiom embeddings score 0.19 points higher in homogeneity score for embedding clustering, and up to 25% higher sequence accuracy on the idiom processing tasks of IE sense disambiguation and span detection.  ( 2 min )
    Abs-CAM: A Gradient Optimization Interpretable Approach for Explanation of Convolutional Neural Networks. (arXiv:2207.03648v1 [cs.CV])
    The black-box nature of Deep Neural Networks (DNNs) severely hinders their performance improvement and application in specific scenes. In recent years, class activation mapping-based methods have been widely used to interpret the internal decisions of models in computer vision tasks. However, when these methods use backpropagation to obtain gradients, they introduce noise into the saliency map, and may even locate features that are irrelevant to decisions. In this paper, we propose an Absolute value Class Activation Mapping-based (Abs-CAM) method, which optimizes the gradients derived from the backpropagation and turns all of them into positive gradients to enhance the visual features of output neurons' activation, and improve the localization ability of the saliency map. The framework of Abs-CAM is divided into two phases: generating the initial saliency map and generating the final saliency map. The first phase improves the localization ability of the saliency map by optimizing the gradient, and the second phase linearly combines the initial saliency map with the original image to enhance the semantic information of the saliency map. We conduct qualitative and quantitative evaluation of the proposed method, including Deletion, Insertion, and Pointing Game. The experimental results show that Abs-CAM clearly eliminates noise in the saliency map, better locates the features related to decisions, and is superior to previous methods in recognition and localization tasks.  ( 3 min )
    SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning. (arXiv:2207.03677v1 [cs.CV])
    Neural architecture search (NAS) has demonstrated amazing success in searching for efficient deep neural networks (DNNs) from a given supernet. In parallel, the lottery ticket hypothesis has shown that DNNs contain small subnetworks that can be trained from scratch to achieve a comparable or higher accuracy than original DNNs. As such, it is currently a common practice to develop efficient DNNs via a pipeline of first search and then prune. Nevertheless, doing so often requires a search-train-prune-retrain process and thus prohibitive computational cost. In this paper, we discover for the first time that both efficient DNNs and their lottery subnetworks (i.e., lottery tickets) can be directly identified from a supernet, which we term SuperTickets, via a two-in-one training scheme with joint architecture searching and parameter pruning. Moreover, we develop a progressive and unified SuperTickets identification strategy that allows the connectivity of subnetworks to change during supernet training, achieving better accuracy and efficiency trade-offs than conventional sparse training. Finally, we evaluate whether such identified SuperTickets drawn from one task can transfer well to other tasks, validating their potential of handling multiple tasks simultaneously. Extensive experiments and ablation studies on three tasks and four benchmark datasets validate that our proposed SuperTickets achieve better accuracy and efficiency trade-offs than both typical NAS and pruning pipelines, with or without retraining. Codes and pretrained models are available at https://github.com/RICE-EIC/SuperTickets.  ( 3 min )
    A Support Vector Model of Pruning Trees Evaluation Based on OTSU Algorithm. (arXiv:2207.03638v1 [cs.CV])
    The tree pruning process is the key to promoting fruit growth and improving production, due to its effects on the photosynthesis efficiency of fruits and nutrition transportation in branches. Currently, pruning is still highly dependent on human labor. The workers' experience strongly affects the robustness of the pruning performance. Thus, it is a challenge for workers and farmers to evaluate the pruning performance. As a better solution to the problem, this paper presents a novel pruning classification strategy model called "OTSU-SVM" to evaluate the pruning performance based on the shadows of branches and leaves. This model considers not only the available illuminated area of the tree but also the uniformity of the illuminated area of the tree. More importantly, our group incorporates the OTSU algorithm into the model, which strongly reinforces the robustness of the model's evaluation. In addition, data from pear trees in the Yuhang District, Hangzhou is used in the experiment. In this experiment, we show that OTSU-SVM achieves an accuracy of 80% and high performance in the evaluation of the pruning for the pear trees. It can provide more successful pruning if applied in the orchard. A successful pruning can broaden the illuminated area of individual fruit, and increase nutrition transportation from the target branch, dramatically elevating the weights and production of the fruits.  ( 3 min )
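    The OTSU step named in the abstract above is a standard thresholding technique, and a minimal sketch can make it concrete. The code below is an illustration only, not the authors' implementation: the function names `otsu_threshold` and `illuminated_fraction` are hypothetical, the input is assumed to be a flat list of 8-bit gray levels, and how the paper turns the thresholded image into SVM features is not stated in the abstract.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form one class (e.g. shadow), the rest the other.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of intensities at or below the candidate threshold
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue          # no pixels in the lower class yet
        w1 = total - w0
        if w1 == 0:
            break             # upper class is empty from here on
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def illuminated_fraction(pixels):
    """Hypothetical feature: share of pixels brighter than the Otsu threshold."""
    t = otsu_threshold(pixels)
    return sum(1 for p in pixels if p > t) / len(pixels)
```

    A feature such as `illuminated_fraction` is one plausible input to a downstream classifier; the uniformity measure mentioned in the abstract would need a separate definition that the abstract does not give.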
    Generalization Guarantee of Training Graph Convolutional Networks with Graph Topology Sampling. (arXiv:2207.03584v1 [cs.LG])
    Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data. To address its scalability issue due to the recursive embedding of neighboring features, graph topology sampling has been proposed to reduce the memory and computational cost of training GCNs, and it has achieved comparable test performance to those without topology sampling in many empirical studies. To the best of our knowledge, this paper provides the first theoretical justification of graph topology sampling in training (up to) three-layer GCNs for semi-supervised node classification. We formally characterize some sufficient conditions on graph topology sampling such that GCN training leads to a diminishing generalization error. Moreover, our method tackles the nonconvex interaction of weights across layers, which is under-explored in the existing theoretical analyses of GCNs. This paper characterizes the impact of graph structures and topology sampling on the generalization performance and sample complexity explicitly, and the theoretical findings are also justified through numerical experiments.  ( 2 min )
    Individual Preference Stability for Clustering. (arXiv:2207.03600v1 [cs.LG])
    In this paper, we propose a natural notion of individual preference (IP) stability for clustering, which asks that every data point, on average, is closer to the points in its own cluster than to the points in any other cluster. Our notion can be motivated from several perspectives, including game theory and algorithmic fairness. We study several questions related to our proposed notion. We first show that deciding whether a given data set allows for an IP-stable clustering in general is NP-hard. As a result, we explore the design of efficient algorithms for finding IP-stable clusterings in some restricted metric spaces. We present a polytime algorithm to find a clustering satisfying exact IP-stability on the real line, and an efficient algorithm to find an IP-stable 2-clustering for a tree metric. We also consider relaxing the stability constraint, i.e., every data point should not be too far from its own cluster compared to any other cluster. For this case, we provide polytime algorithms with different guarantees. We evaluate some of our algorithms and several standard clustering approaches on real data sets.  ( 2 min )
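    The stability notion described above can be checked directly for a given clustering. The sketch below follows the abstract's wording (every point is, on average, at least as close to its own cluster as to any other cluster); the function name `is_ip_stable`, the use of Euclidean distance, and the choice to treat singleton clusters as trivially stable are assumptions made for illustration.

```python
import math


def is_ip_stable(points, labels):
    """Check exact IP-stability of a clustering under Euclidean distance."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue  # assumption: a singleton cluster is trivially stable
        own_avg = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        for other_label, members in clusters.items():
            if other_label == label:
                continue
            other_avg = sum(math.dist(points[i], points[j]) for j in members) / len(members)
            if own_avg > other_avg:
                return False  # point i is on average closer to another cluster
    return True
```

    For four points on the real line at 0, 1, 10, and 11, the clustering {0, 1}, {10, 11} passes the check, while {0, 10}, {1, 11} fails it.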
    Pruning Early Exit Networks. (arXiv:2207.03644v1 [cs.LG])
    Deep learning models that perform well often have high computational costs. In this paper, we combine two approaches that try to reduce the computational cost while keeping the model performance high: pruning and early exit networks. We evaluate two approaches of pruning early exit networks: (1) pruning the entire network at once, (2) pruning the base network and additional linear classifiers in an ordered fashion. Experimental results show that pruning the entire network at once is a better strategy in general. However, at high accuracy rates, the two approaches have a similar performance, which implies that the processes of pruning and early exit can be separated without loss of optimality.  ( 2 min )
    PoseGU: 3D Human Pose Estimation with Novel Human Pose Generator and Unbiased Learning. (arXiv:2207.03618v1 [cs.CV])
    3D pose estimation has recently gained substantial interest in the computer vision domain. Existing 3D pose estimation methods rely strongly on large, well-annotated 3D pose datasets, and they suffer from poor model generalization on unseen poses due to the limited diversity of 3D poses in training sets. In this work, we propose PoseGU, a novel human pose generator that generates diverse poses with access only to a small number of seed samples, while equipping the Counterfactual Risk Minimization to pursue an unbiased evaluation objective. Extensive experiments demonstrate that PoseGU outperforms almost all the state-of-the-art 3D human pose methods under consideration on three popular benchmark datasets. Empirical analysis also shows that PoseGU generates 3D poses with improved data diversity and better generalization ability.  ( 2 min )
    Robustness Evaluation of Deep Unsupervised Learning Algorithms for Intrusion Detection Systems. (arXiv:2207.03576v1 [cs.CR])
    Recently, advances in deep learning have been observed in various fields, including computer vision, natural language processing, and cybersecurity. Machine learning (ML) has demonstrated its ability as a potential tool for anomaly detection-based intrusion detection systems to build secure computer networks. Increasingly, ML approaches are more widely adopted than heuristic approaches for cybersecurity because they learn directly from data. Data is critical for the development of ML systems, and becomes a potential target for attackers. Data poisoning or contamination is one of the most common techniques used to fool ML models through data. This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data. Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination and reveal the importance of self-defense against data perturbation when developing novel models, especially for intrusion detection systems.  ( 2 min )
    One for All: Simultaneous Metric and Preference Learning over Multiple Users. (arXiv:2207.03609v1 [stat.ML])
    This paper investigates simultaneous preference and metric learning from a crowd of respondents. A set of items represented by $d$-dimensional feature vectors and paired comparisons of the form ``item $i$ is preferable to item $j$'' made by each user is given. Our model jointly learns a distance metric that characterizes the crowd's general measure of item similarities along with a latent ideal point for each user reflecting their individual preferences. This model has the flexibility to capture individual preferences, while enjoying a metric learning sample cost that is amortized over the crowd. We first study this problem in a noiseless, continuous response setting (i.e., responses equal to differences of item distances) to understand the fundamental limits of learning. Next, we establish prediction error guarantees for noisy, binary measurements such as may be collected from human respondents, and show how the sample complexity improves when the underlying metric is low-rank. Finally, we establish recovery guarantees under assumptions on the response distribution. We demonstrate the performance of our model on both simulated data and on a dataset of color preference judgements across a large number of users.  ( 2 min )
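    The ideal-point model described above can be illustrated with a small sketch: a shared metric defines distances between items and a user's latent ideal point, and a user prefers the item that lies closer. The use of a squared Mahalanobis distance, the sign convention of the noiseless response, and all function names are assumptions for illustration; the abstract does not spell them out.

```python
def sq_mahalanobis(x, u, metric):
    """Squared distance (x - u)^T M (x - u) for a square matrix M given as lists."""
    diff = [a - b for a, b in zip(x, u)]
    n = len(diff)
    return sum(diff[r] * metric[r][c] * diff[c] for r in range(n) for c in range(n))


def noiseless_response(item_i, item_j, ideal_point, metric):
    """Difference of item distances; positive means item i is closer (assumed sign)."""
    return sq_mahalanobis(item_j, ideal_point, metric) - sq_mahalanobis(item_i, ideal_point, metric)


def prefers(item_i, item_j, ideal_point, metric):
    """Binary paired comparison: True if the user prefers item i to item j."""
    return noiseless_response(item_i, item_j, ideal_point, metric) > 0
```

    The same pair of items can flip preference when the metric changes, which is why the metric and the ideal points have to be learned jointly rather than one after the other.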
    Hyper-Universal Policy Approximation: Learning to Generate Actions from a Single Image using Hypernets. (arXiv:2207.03593v1 [cs.LG])
    Inspired by Gibson's notion of object affordances in human vision, we ask the question: how can an agent learn to predict an entire action policy for a novel object or environment given only a single glimpse? To tackle this problem, we introduce the concept of Universal Policy Functions (UPFs) which are state-to-action mappings that generalize not only to new goals but most importantly to novel, unseen environments. Specifically, we consider the problem of efficiently learning such policies for agents with limited computational and communication capacity, constraints that are frequently encountered in edge devices. We propose the Hyper-Universal Policy Approximator (HUPA), a hypernetwork-based model to generate small task- and environment-conditional policy networks from a single image, with good generalization properties. Our results show that HUPAs significantly outperform an embedding-based alternative for generated policies that are size-constrained. Although this work is restricted to a simple map-based navigation task, future work includes applying the principles behind HUPAs to learning more general affordances for objects and environments.  ( 2 min )
    A Study on the Predictability of Sample Learning Consistency. (arXiv:2207.03571v1 [cs.LG])
    Curriculum Learning is a powerful training method that allows for faster and better training in some settings. This method, however, requires having a notion of which examples are difficult and which are easy, which is not always trivial to provide. A recent metric called C-Score acts as a proxy for example difficulty by relating it to learning consistency. Unfortunately, this method is quite compute intensive which limits its applicability for alternative datasets. In this work, we train models through different methods to predict C-Score for CIFAR-100 and CIFAR-10. We find, however, that these models generalize poorly both within the same distribution as well as out of distribution. This suggests that C-Score is not defined by the individual characteristics of each sample but rather by other factors. We hypothesize that a sample's relation to its neighbours, in particular, how many of them share the same labels, can help in explaining C-Scores. We plan to explore this in future work.  ( 2 min )
    Code Translation with Compiler Representations. (arXiv:2207.03578v1 [cs.PL])
    In this paper, we leverage low-level compiler intermediate representations (IR) to improve code translation. Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces unnatural-looking code. Applying neural machine translation (NMT) approaches to code has successfully broadened the set of programs on which one can get a natural-looking translation. However, they treat the code as sequences of text tokens, and still do not differentiate well enough between similar pieces of code which have different semantics in different languages. The consequence is low quality translation, reducing the practicality of NMT, and stressing the need for an approach significantly increasing its accuracy. Here we propose to augment code translation with IRs, specifically LLVM IR, with results on the C++, Java, Rust, and Go languages. Our method improves upon the state of the art for unsupervised code translation, increasing the number of correct translations by 11% on average, and up to 79% for the Java - Rust pair. We extend previous test sets for code translation, by adding hundreds of Go and Rust functions. Additionally, we train models with high performance on the problem of IR decompilation, generating programming source code from IR, and study using IRs as intermediary pivot for translation.  ( 2 min )
    Learning-based Autonomous Channel Access in the Presence of Hidden Terminals. (arXiv:2207.03605v1 [cs.LG])
    We consider the problem of autonomous channel access (AutoCA), where a group of terminals tries to discover a communication strategy with an access point (AP) via a common wireless channel in a distributed fashion. Due to the irregular topology and the limited communication range of terminals, a practical challenge for AutoCA is the hidden terminal problem, which is notorious in wireless networks for deteriorating the throughput and delay performances. To meet the challenge, this paper presents a new multi-agent deep reinforcement learning paradigm, dubbed MADRL-HT, tailored for AutoCA in the presence of hidden terminals. MADRL-HT exploits topological insights and transforms the observation space of each terminal into a scalable form independent of the number of terminals. To compensate for the partial observability, we put forth a look-back mechanism such that the terminals can infer behaviors of their hidden terminals from the carrier sensed channel states as well as feedback from the AP. A window-based global reward function is proposed, whereby the terminals are instructed to maximize the system throughput while balancing the terminals' transmission opportunities over the course of learning. Extensive numerical experiments verified the superior performance of our solution benchmarked against the legacy carrier-sense multiple access with collision avoidance (CSMA/CA) protocol.  ( 3 min )
    CausalAgents: A Robustness Benchmark for Motion Forecasting using Causal Relationships. (arXiv:2207.03586v1 [cs.LG])
    As machine learning models become increasingly prevalent in motion forecasting systems for autonomous vehicles (AVs), it is critical that we ensure that model predictions are safe and reliable. However, exhaustively collecting and labeling the data necessary to fully test the long tail of rare and challenging scenarios is difficult and expensive. In this work, we construct a new benchmark for evaluating and improving model robustness by applying perturbations to existing data. Specifically, we conduct an extensive labeling effort to identify causal agents, or agents whose presence influences human driver behavior in any way, in the Waymo Open Motion Dataset (WOMD), and we use these labels to perturb the data by deleting non-causal agents from the scene. We then evaluate a diverse set of state-of-the-art deep-learning model architectures on our proposed benchmark and find that all models exhibit large shifts under perturbation. Under non-causal perturbations, we observe a $25$-$38\%$ relative change in minADE as compared to the original. We then investigate techniques to improve model robustness, including increasing the training dataset size and using targeted data augmentations that drop agents throughout training. We plan to provide the causal agent labels as an additional attribute to WOMD and release the robustness benchmarks to aid the community in building more reliable and safe deep-learning models for motion forecasting.  ( 3 min )
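    The minADE metric quoted above is commonly defined as the smallest average displacement error over a model's candidate trajectories. The sketch below uses that common definition; the function names are hypothetical, and computing the relative change as the absolute difference divided by the original value is an assumption about how the reported 25-38% figure is obtained.

```python
import math


def min_ade(predictions, ground_truth):
    """Minimum over candidate trajectories of the mean per-step L2 error."""
    best = math.inf
    for trajectory in predictions:
        ade = sum(math.dist(p, g) for p, g in zip(trajectory, ground_truth)) / len(ground_truth)
        best = min(best, ade)
    return best


def relative_change(perturbed, original):
    """Assumed definition: |perturbed - original| / original."""
    return abs(perturbed - original) / original
```

    Under this reading, a model whose minADE moves from 1.00 on the original scene to 1.25 after non-causal agents are deleted shows a 25% relative change.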
    Learning and generalization of one-hidden-layer neural networks, going beyond standard Gaussian data. (arXiv:2207.03615v1 [cs.LG])
    This paper analyzes the convergence and generalization of training a one-hidden-layer neural network when the input features follow the Gaussian mixture model consisting of a finite number of Gaussian distributions. Assuming the labels are generated from a teacher model with an unknown ground truth weight, the learning problem is to estimate the underlying teacher model by minimizing a non-convex risk function over a student neural network. With a finite number of training samples, referred to as the sample complexity, the iterations are proved to converge linearly to a critical point with guaranteed generalization error. In addition, for the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.  ( 2 min )
    Automatic Synthesis of Neurons for Recurrent Neural Nets. (arXiv:2207.03577v1 [cs.NE])
    We present a new class of neurons, ARNs, which give a cross entropy on test data that is up to three times lower than the one achieved by carefully optimized LSTM neurons. The explanations for the huge improvements that often are achieved are elaborate skip connections through time, up to four internal memory states per neuron and a number of novel activation functions including small quadratic forms. The new neurons were generated using automatic programming and are formulated as pure functional programs that easily can be transformed. We present experimental results for eight datasets and found excellent improvements for seven of them, but LSTM remained the best for one dataset. The results are so promising that automatic programming to generate new neurons should become part of the standard operating procedure for any machine learning practitioner who works on time series data such as sensor signals.  ( 2 min )
    Demystifying the Adversarial Robustness of Random Transformation Defenses. (arXiv:2207.03574v1 [cs.CR])
    Neural networks' lack of robustness against attacks raises concerns in security-sensitive settings such as autonomous vehicles. While many countermeasures may look promising, only a few withstand rigorous evaluation. Defenses using random transformations (RT) have shown impressive results, particularly BaRT (Raff et al., 2019) on ImageNet. However, this type of defense has not been rigorously evaluated, leaving its robustness properties poorly understood. Their stochastic properties make evaluation more challenging and render many proposed attacks on deterministic models inapplicable. First, we show that the BPDA attack (Athalye et al., 2018a) used in BaRT's evaluation is ineffective and likely overestimates its robustness. We then attempt to construct the strongest possible RT defense through the informed selection of transformations and Bayesian optimization for tuning their parameters. Furthermore, we create the strongest possible attack to evaluate our RT defense. Our new attack vastly outperforms the baseline, reducing the accuracy by 83% compared to the 19% reduction by the commonly used EoT attack ($4.3\times$ improvement). Our result indicates that the RT defense on the Imagenette dataset (a ten-class subset of ImageNet) is not robust against adversarial examples. Extending the study further, we use our new attack to adversarially train RT defense (called AdvRT), resulting in a large robustness gain. Code is available at https://github.com/wagnergroup/demystify-random-transform.  ( 3 min )
    Dynamic Community Detection via Adversarial Temporal Graph Representation Learning. (arXiv:2207.03580v1 [cs.SI])
    Dynamic community detection has prospered as a powerful tool for quantifying changes in dynamic brain network connectivity patterns by identifying strongly connected sets of nodes. However, as the network science problems and network data to be processed become gradually more sophisticated, a better method is needed to efficiently learn low-dimensional representations from dynamic network data and reveal its latent function that changes over time in the brain network. In this work, an adversarial temporal graph representation learning (ATGRL) framework is proposed to detect dynamic communities from a small sample of brain network data. It adopts a novel temporal graph attention network as an encoder to capture more efficient spatio-temporal features by attention mechanism in both spatial and temporal dimensions. In addition, the framework employs adversarial training to guide the learning of temporal graph representation and optimize the measurable modularity loss to maximize the modularity of community. Experiments on real-world brain network datasets demonstrate the effectiveness of this new method.  ( 2 min )
    Convolution Neural Network based Mode Decomposition for Degenerated Modes via Multiple Images from Polarizers. (arXiv:2207.03489v1 [cs.CV])
    In this paper, a mode decomposition (MD) method for degenerated modes is studied. A convolution neural network (CNN) is applied for image training and for predicting the mode coefficients. The four-fold degenerated $LP_{11}$ series is the target to be decomposed. Multiple images are taken as input to decompose the degenerate modes. A total of seven different images, including the full original near-field image, images after linear polarizers of four directions (0$^\circ$, 45$^\circ$, 90$^\circ$, and 135$^\circ$), and images after two circular polarizers (right-handed and left-handed), have been considered for training, validation, and test. The output labels of the model are the real and imaginary components of the mode coefficients, and the loss function is the root-mean-square (RMS) of the labels. The RMS and mean-absolute-error (MAE) of the label, intensity, phase, and field correlation between the actual and predicted values are the metrics used to evaluate the CNN model. The CNN model has been trained with 100,000 three-dimensional images with depths of three, four, and seven. Evaluated on 10,000 test samples, the model trained with four sets of images (images after three linear polarizers at 0$^\circ$, 45$^\circ$, and 90$^\circ$, and an image after a right-handed circular polarizer) showed 0.0634 of label RMS, 0.0292 of intensity RMS, 0.1867 rad of phase MAE, and 0.9978 of average field correlation. The model with four image sets showed at least 50.68\% performance enhancement compared to models considering only images after linear polarizers.  ( 3 min )
    An Embedding-Dynamic Approach to Self-supervised Learning. (arXiv:2207.03552v1 [cs.CV])
    A number of recent self-supervised learning methods have shown impressive performance on image classification and other tasks. A somewhat bewildering variety of techniques have been used, not always with a clear understanding of the reasons for their benefits, especially when used in combination. Here we treat the embeddings of images as point particles and consider model optimization as a dynamic process on this system of particles. Our dynamic model combines an attractive force for similar images, a locally dispersive force to avoid local collapse, and a global dispersive force to achieve a globally-homogeneous distribution of particles. The dynamic perspective highlights the advantage of using a delayed-parameter image embedding (a la BYOL) together with multiple views of the same image. It also uses a purely-dynamic local dispersive force (Brownian motion) that shows improved performance over other methods and does not require knowledge of other particle coordinates. The method is called MSBReg, which stands for (i) a Multiview centroid loss, which applies an attractive force to pull different image view embeddings toward their centroid, (ii) a Singular value loss, which pushes the particle system toward spatially homogeneous density, and (iii) a Brownian diffusive loss. We evaluate downstream classification performance of MSBReg on ImageNet as well as transfer learning tasks including fine-grained classification, multi-class object classification, object detection, and instance segmentation. In addition, we show that applying our regularization term to other methods further improves their performance and stabilizes the training by preventing mode collapse.  ( 3 min )
    On Non-Linear operators for Geometric Deep Learning. (arXiv:2207.03485v1 [cs.LG])
    This work studies operators mapping vector and scalar fields defined over a manifold $\mathcal{M}$, and which commute with its group of diffeomorphisms $\text{Diff}(\mathcal{M})$. We prove that in the case of scalar fields $L^p_\omega(\mathcal{M},\mathbb{R})$, those operators correspond to point-wise non-linearities, recovering and extending known results on $\mathbb{R}^d$. In the context of Neural Networks defined over $\mathcal{M}$, it indicates that point-wise non-linear operators are the only universal family that commutes with any group of symmetries, and justifies their systematic use in combination with dedicated linear operators commuting with specific symmetries. In the case of vector fields $L^p_\omega(\mathcal{M},T\mathcal{M})$, we show that those operators are solely the scalar multiplication. It indicates that $\text{Diff}(\mathcal{M})$ is too rich and that there is no universal class of non-linear operators to motivate the design of Neural Networks over the symmetries of $\mathcal{M}$.  ( 2 min )
    TF-GNN: Graph Neural Networks in TensorFlow. (arXiv:2207.03522v1 [cs.LG])
    TensorFlow GNN (TF-GNN) is a scalable library for Graph Neural Networks in TensorFlow. It is designed from the bottom up to support the kinds of rich heterogeneous graph data that occurs in today's information ecosystems. Many production models at Google use TF-GNN and it has been recently released as an open source project. In this paper, we describe the TF-GNN data model, its Keras modeling API, and relevant capabilities such as graph sampling, distributed training, and accelerator support.  ( 2 min )
    Recent Results of Energy Disaggregation with Behind-the-Meter Solar Generation. (arXiv:2207.03490v1 [cs.LG])
    The rapid deployment of renewable generations such as photovoltaic (PV) generations brings great challenges to the resiliency of existing power systems. Because PV generations are volatile and typically invisible to the power system operator, estimating the generation and characterizing the uncertainty are in urgent need for operators to make insightful decisions. This paper summarizes our recent results on energy disaggregation at the substation level with Behind-the-Meter solar generation. We formulate the so-called ``partial label'' problem for energy disaggregation at substations, where the aggregate measurements contain the total consumption of multiple loads, and the existence of some loads is unknown. We develop two model-free disaggregation approaches based on deterministic dictionary learning and Bayesian dictionary learning, respectively. Unlike conventional methods which require fully annotated training data of individual loads, our approaches can extract load patterns given partially labeled aggregate data. Therefore, our partial label formulation is more applicable in the real world. Compared with deterministic dictionary learning, the Bayesian dictionary learning-based approach provides the uncertainty measure for the disaggregation results, at the cost of increased computational complexity. All the methods are validated by numerical experiments.  ( 2 min )
    HierarchicalForecast: A Python Benchmarking Framework for Hierarchical Forecasting. (arXiv:2207.03517v1 [stat.ML])
    Large collections of time series data are commonly organized into cross-sectional structures with different levels of aggregation; examples include product and geographical groupings. A necessary condition for coherent decision-making and planning with such data sets is for the dis-aggregated series' forecasts to add up exactly to the aggregated series forecasts, which motivates the creation of novel hierarchical forecasting algorithms. The growing interest of the Machine Learning community in cross-sectional hierarchical forecasting systems indicates that we are at a propitious moment to ensure that scientific endeavors are grounded on sound baselines. For this reason, we put forward the HierarchicalForecast library, which contains preprocessed publicly available datasets, evaluation metrics, and a compiled set of statistical baseline models. Our Python-based framework aims to bridge the gap between statistical, econometric modeling, and Machine Learning forecasting research. Code and documentation are available at https://github.com/Nixtla/hierarchicalforecast.  ( 2 min )
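    The coherence condition described above (dis-aggregated forecasts adding up exactly to the aggregated ones) can be written with a summing matrix. The sketch below shows the simple bottom-up approach in plain Python; it does not use or reproduce the HierarchicalForecast library's API, and the names `bottom_up` and `is_coherent` as well as the example hierarchy are hypothetical.

```python
def bottom_up(summing_matrix, bottom_forecasts):
    """Build coherent forecasts for every series as S @ b (bottom-up reconciliation)."""
    return [
        sum(weight * value for weight, value in zip(row, bottom_forecasts))
        for row in summing_matrix
    ]


def is_coherent(summing_matrix, all_forecasts, n_bottom, tol=1e-9):
    """Check that every series equals the sum of the bottom-level series it aggregates.

    Assumes the last n_bottom entries of all_forecasts are the bottom-level series.
    """
    bottom = all_forecasts[-n_bottom:]
    rebuilt = bottom_up(summing_matrix, bottom)
    return all(abs(a - b) <= tol for a, b in zip(all_forecasts, rebuilt))


# Hypothetical two-level hierarchy: total = region_a + region_b.
S = [
    [1, 1],  # total
    [1, 0],  # region_a
    [0, 1],  # region_b
]
```

    Independent base forecasts such as 8.0 for the total alongside 3.0 and 4.5 for the regions are not coherent; replacing the total with the bottom-up sum 7.5 restores coherence.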
    G2L: A Geometric Approach for Generating Pseudo-labels that Improve Transfer Learning. (arXiv:2207.03554v1 [cs.LG])
    Transfer learning is a deep-learning technique that ameliorates the problem of learning when human-annotated labels are expensive and limited. In place of such labels, it uses instead the previously trained weights from a well-chosen source model as the initial weights for the training of a base model for a new target dataset. We demonstrate a novel but general technique for automatically creating such source models. We generate pseudo-labels according to an efficient and extensible algorithm that is based on a classical result from the geometry of high dimensions, the Cayley-Menger determinant. This G2L (``geometry to label'') method incrementally builds up pseudo-labels using a greedy computation of hypervolume content. We demonstrate that the method is tunable with respect to expected accuracy, which can be forecast by an information-theoretic measure of dataset similarity (divergence) between source and target. The results of 280 experiments show that this mechanical technique generates base models that have similar or better transferability compared to a baseline of models trained on extensively human-annotated ImageNet1K labels, yielding an overall error decrease of 0.43\%, and an error decrease in 4 out of 5 divergent datasets tested.  ( 2 min )
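    The Cayley-Menger determinant mentioned above gives the volume of a simplex from pairwise distances alone: for an $n$-simplex with bordered squared-distance matrix $B$, $V^2 = \frac{(-1)^{n+1}}{2^n (n!)^2} \det(B)$. The sketch below computes that volume only; it is not the G2L labeling algorithm, and the helper names `determinant` and `simplex_volume` are hypothetical.

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # numerically singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def simplex_volume(points):
    """Volume of the simplex spanned by n + 1 points, from pairwise distances."""
    k = len(points)  # number of vertices
    n = k - 1        # dimension of the simplex
    bordered = [[0.0] + [1.0] * k]
    for p in points:
        bordered.append([1.0] + [math.dist(p, q) ** 2 for q in points])
    sign = -1.0 if (n + 1) % 2 else 1.0
    vol_sq = sign * determinant(bordered) / (2 ** n * math.factorial(n) ** 2)
    return math.sqrt(max(vol_sq, 0.0))  # clamp tiny negative rounding errors
```

    A greedy procedure of the kind the abstract describes could, for example, score candidate points by how much they increase this volume, but the exact selection rule is not given in the abstract.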
    The use of deep learning enables high diagnostic accuracy in detecting syndesmotic instability on weight-bearing CT scanning. (arXiv:2207.03568v1 [eess.IV])
    Delayed diagnosis of syndesmosis instability can lead to significant morbidity and accelerated arthritic change in the ankle joint. Weight-bearing computed tomography (WBCT) has shown promising potential for early and reliable detection of isolated syndesmotic instability using 3D volumetric measurements. While these measurements have been reported to be highly accurate, they are also experience-dependent, time-consuming, and require a particular 3D measurement software tool, so clinicians still show more interest in conventional diagnostic methods for syndesmotic instability. The purpose of this study was to increase accuracy, accelerate analysis time, and reduce inter-observer bias by automating 3D volume assessment of syndesmosis anatomy using WBCT scans. We conducted a retrospective study using previously collected WBCT scans of patients with unilateral syndesmotic instability. 144 bilateral ankle WBCT scans were evaluated (48 unstable, 96 control). We developed three deep learning (DL) models for analyzing WBCT scans to recognize syndesmosis instability. These three models included two state-of-the-art models (Model 1 - 3D convolutional neural network [CNN], and Model 2 - CNN with long short-term memory [LSTM]), and a new model (Model 3 - differential CNN LSTM) that we introduced in this study. Model 1 failed to analyze the WBCT scans (F1-score = 0). Model 2 only misclassified two cases (F1-score = 0.80). Model 3 outperformed Model 2 and achieved a nearly perfect performance, misclassifying only one case (F1-score = 0.91) in the control group as unstable while being faster than Model 2.  ( 3 min )
    VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning. (arXiv:2207.03530v1 [cs.RO])
    While many multi-robot coordination problems can be solved optimally by exact algorithms, solutions are often not scalable in the number of robots. Multi-Agent Reinforcement Learning (MARL) is gaining increasing attention in the robotics community as a promising solution to tackle such problems. Nevertheless, we still lack the tools that allow us to quickly and efficiently find solutions to large-scale collective learning tasks. In this work, we introduce the Vectorized Multi-Agent Simulator (VMAS). VMAS is an open-source framework designed for efficient MARL benchmarking. It is comprised of a vectorized 2D physics engine written in PyTorch and a set of twelve challenging multi-robot scenarios. Additional scenarios can be implemented through a simple and modular interface. We demonstrate how vectorization enables parallel simulation on accelerated hardware without added complexity. When comparing VMAS to OpenAI MPE, we show how MPE's execution time increases linearly in the number of simulations while VMAS is able to execute 30,000 parallel simulations in under 10s, proving more than 100x faster. Using VMAS's RLlib interface, we benchmark our multi-robot scenarios using various Proximal Policy Optimization (PPO)-based MARL algorithms. VMAS's scenarios prove challenging in orthogonal ways for state-of-the-art MARL algorithms. The VMAS framework is available at https://github.com/proroklab/VectorizedMultiAgentSimulator. A video of VMAS scenarios and experiments is available at https://youtu.be/aaDRYfiesAY.  ( 3 min )
    AVDDPG: Federated reinforcement learning applied to autonomous platoon control. (arXiv:2207.03484v1 [cs.LG])
    Since 2016 federated learning (FL) has been an evolving topic of discussion in the artificial intelligence (AI) research community. Applications of FL led to the development and study of federated reinforcement learning (FRL). Few works exist on the topic of FRL applied to autonomous vehicle (AV) platoons. In addition, most FRL works choose a single aggregation method (usually weight or gradient aggregation). We explore FRL's effectiveness as a means to improve AV platooning by designing and implementing an FRL framework atop a custom AV platoon environment. The application of FRL in AV platooning is studied under two scenarios: (1) Inter-platoon FRL (Inter-FRL) where FRL is applied to AVs across different platoons; (2) Intra-platoon FRL (Intra-FRL) where FRL is applied to AVs within a single platoon. Both Inter-FRL and Intra-FRL are applied to a custom AV platooning environment using both gradient and weight aggregation to observe the performance effects FRL can have on AV platoons relative to an AV platooning environment trained without FRL. It is concluded that Intra-FRL using weight aggregation (Intra-FRLWA) provides the best performance for controlling an AV platoon. In addition, we found that weight aggregation in FRL for AV platooning provides increases in performance relative to gradient aggregation. Finally, a performance analysis is conducted for Intra-FRLWA versus a platooning environment without FRL for platoons of length 3, 4 and 5 vehicles. It is concluded that Intra-FRLWA largely out-performs the platooning environment that is trained without FRL.  ( 3 min )
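    The difference between weight aggregation and gradient aggregation discussed above can be sketched as follows. This assumes plain unweighted averaging over flat parameter lists; the abstract does not specify the exact aggregation rule, and all names here are hypothetical.

```python
def average(vectors):
    """Element-wise mean of equally long lists, one list per agent."""
    n_agents = len(vectors)
    return [sum(values) / n_agents for values in zip(*vectors)]


def sgd_step(weights, grads, lr):
    """One plain gradient-descent update."""
    return [w - lr * g for w, g in zip(weights, grads)]


def weight_aggregation(agent_weights, agent_grads, lr):
    """Each agent updates locally, then the updated weights are averaged."""
    updated = [sgd_step(w, g, lr) for w, g in zip(agent_weights, agent_grads)]
    return average(updated)


def gradient_aggregation(agent_weights, agent_grads, lr):
    """Gradients are averaged first, then every agent applies the shared gradient."""
    shared = average(agent_grads)
    return [sgd_step(w, shared, lr) for w in agent_weights]


weights = [[1.0, 2.0], [3.0, 6.0]]   # two agents, two parameters each
grads = [[1.0, 0.0], [3.0, 2.0]]

shared_weights = weight_aggregation(weights, grads, lr=0.5)
per_agent_weights = gradient_aggregation(weights, grads, lr=0.5)
```

    With weight aggregation all agents end the round with identical parameters, whereas gradient aggregation leaves each agent with its own parameters moved by a common gradient.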
    Deep Learning to Jointly Schema Match, Impute, and Transform Databases. (arXiv:2207.03536v1 [cs.DB])
    An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins with unmapped and only partially overlapping features is a prerequisite to developing and testing robust, generalizable algorithms, especially in health care. We approach this issue in the common but difficult case of numeric features such as nearly Gaussian and binary features, where unit changes and variable shift make simple matching of univariate summaries unsuccessful. We develop two novel procedures to address this problem. First, we demonstrate multiple methods of "fingerprinting" a feature based on its associations to other features. In the setting of even modest prior information, this allows most shared features to be accurately identified. Second, we demonstrate a deep learning algorithm for translation between databases. Unlike prior approaches, our algorithm takes advantage of discovered mappings while identifying surrogates for unshared features and learning transformations. In synthetic and real-world experiments using two electronic health record databases, our algorithms outperform existing baselines for matching variable sets, while jointly learning to impute unshared or transformed variables.  ( 2 min )
    A Novel IoT-based Framework for Non-Invasive Human Hygiene Monitoring using Machine Learning Techniques. (arXiv:2207.03529v1 [cs.LG])
    People's personal hygiene habits speak volumes about how they take care of their bodies and health in daily life. Maintaining good hygiene practices not only reduces the chances of contracting a disease but could also reduce the risk of spreading illness within the community. Given the current pandemic, daily habits such as washing hands or taking regular showers have taken primary importance among people, especially for the elderly population living alone at home or in an assisted living facility. This paper presents a novel and non-invasive framework for monitoring human hygiene using vibration sensors where we adopt Machine Learning techniques. The approach is based on a combination of a geophone sensor, a digitizer, and a cost-efficient computer board in a practical enclosure. Monitoring daily hygiene routines may help healthcare professionals be proactive rather than reactive in identifying and controlling the spread of potential outbreaks within the community. The experimental results indicate that applying a Support Vector Machine (SVM) for binary classification exhibits a promising accuracy of ~95% in the classification of different hygiene habits. Furthermore, both tree-based classifiers (Random Forest and Decision Tree) outperform other models by achieving the highest accuracy (100%), which means that classifying hygiene events using vibration and non-invasive sensors is possible for monitoring hygiene activity.  ( 3 min )
    Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations. (arXiv:2202.10638v2 [stat.ML] UPDATED)
    Data augmentation is commonly applied to improve performance of deep learning by enforcing the knowledge that certain transformations on the input preserve the output. Currently, the used data augmentation is chosen by human effort and costly cross-validation, which makes it cumbersome to apply to new datasets. We develop a convenient gradient-based method for selecting the data augmentation without validation data and during training of a deep neural network. Our approach relies on phrasing data augmentation as an invariance in the prior distribution and learning it using Bayesian model selection, which has been shown to work in Gaussian processes, but not yet for deep neural networks. We propose a differentiable Kronecker-factored Laplace approximation to the marginal likelihood as our objective, which can be optimised without human supervision or validation data. We show that our method can successfully recover invariances present in the data, and that this improves generalisation and data efficiency on image datasets.  ( 2 min )
    $k$-Median Clustering via Metric Embedding: Towards Better Initialization with Differential Privacy. (arXiv:2206.12895v2 [cs.DS] UPDATED)
    When designing clustering algorithms, the choice of initial centers is crucial for the quality of the learned clusters. In this paper, we develop a new initialization scheme, called HST initialization, for the $k$-median problem in the general metric space (e.g., discrete space induced by graphs), based on the construction of metric embedding tree structure of the data. From the tree, we propose a novel and efficient search algorithm, for good initial centers that can be used subsequently for the local search algorithm. Our proposed HST initialization can produce initial centers achieving lower errors than those from another popular initialization method, $k$-median++, with comparable efficiency. The HST initialization can also be extended to the setting of differential privacy (DP) to generate private initial centers. We show that the error from applying DP local search followed by our private HST initialization improves previous results on the approximation error, and approaches the lower bound within a small factor. Experiments justify the theory and demonstrate the effectiveness of our proposed method. Our approach can also be extended to the $k$-means problem.  ( 2 min )
    Predicting Opinion Dynamics via Sociologically-Informed Neural Networks. (arXiv:2207.03990v1 [cs.SI])
    Opinion formation and propagation are crucial phenomena in social networks and have been extensively studied across several disciplines. Traditionally, theoretical models of opinion dynamics have been proposed to describe the interactions between individuals (i.e., social interaction) and their impact on the evolution of collective opinions. Although these models can incorporate sociological and psychological knowledge on the mechanisms of social interaction, they demand extensive calibration with real data to make reliable predictions, requiring much time and effort. Recently, the widespread use of social media platforms provides new paradigms to learn deep learning models from a large volume of social media data. However, these methods ignore any scientific knowledge about the mechanism of social interaction. In this work, we present the first hybrid method called Sociologically-Informed Neural Network (SINN), which integrates theoretical models and social media data by transporting the concepts of physics-informed neural networks (PINNs) from natural science (i.e., physics) into social science (i.e., sociology and social psychology). In particular, we recast theoretical models as ordinary differential equations (ODEs). Then we train a neural network that simultaneously approximates the data and conforms to the ODEs that represent the social scientific knowledge. In addition, we extend PINNs by integrating matrix factorization and a language model to incorporate rich side information (e.g., user profiles) and structural knowledge (e.g., cluster structure of the social interaction network). Moreover, we develop an end-to-end training procedure for SINN, which involves Gumbel-Softmax approximation to include stochastic mechanisms of social interaction. Extensive experiments on real-world and synthetic datasets show SINN outperforms six baseline methods in predicting opinion dynamics.  ( 3 min )
    Variational Inference of overparameterized Bayesian Neural Networks: a theoretical and empirical study. (arXiv:2207.03859v1 [stat.ML])
    This paper studies the Variational Inference (VI) used for training Bayesian Neural Networks (BNN) in the overparameterized regime, i.e., when the number of neurons tends to infinity. More specifically, we consider overparameterized two-layer BNN and point out a critical issue in the mean-field VI training. This problem arises from the decomposition of the lower bound on the evidence (ELBO) into two terms: one corresponding to the likelihood function of the model and the second to the Kullback-Leibler (KL) divergence between the prior distribution and the variational posterior. In particular, we show both theoretically and empirically that there is a trade-off between these two terms in the overparameterized regime only when the KL is appropriately re-scaled with respect to the ratio between the number of observations and neurons. We also illustrate our theoretical results with numerical experiments that highlight the critical choice of this ratio.  ( 2 min )
    Uniform Consistency in Nonparametric Mixture Models. (arXiv:2108.14003v2 [math.ST] UPDATED)
    We study uniform consistency in nonparametric mixture models as well as closely related mixture of regression (also known as mixed regression) models, where the regression functions are allowed to be nonparametric and the error distributions are assumed to be convolutions of a Gaussian density. We construct uniformly consistent estimators under general conditions while simultaneously highlighting several pain points in extending existing pointwise consistency results to uniform results. The resulting analysis turns out to be nontrivial, and several novel technical tools are developed along the way. In the case of mixed regression, we prove $L^1$ convergence of the regression functions while allowing for the component regression functions to intersect arbitrarily often, which presents additional technical challenges. We also consider generalizations to general (i.e. non-convolutional) nonparametric mixtures.  ( 2 min )
    Optimal sizing of a holdout set for safe predictive model updating. (arXiv:2202.06374v3 [stat.ML] UPDATED)
    Predictive risk scores are increasingly used to guide clinical or other interventions in complex settings, particularly healthcare. Directly updating a risk score used to guide interventions leads to biased risk estimates. We propose updating using a 'holdout set' (a subset of the population that does not receive risk-score-guided interventions) to prevent this. Since samples in the holdout set do not benefit from risk predictions, its size must trade off performance of the updated risk score whilst minimising the number of held out samples. We prove that this approach outperforms simple alternatives, and by defining a general loss function describe conditions under which an optimal holdout size (OHS) can be readily identified. We introduce parametric and semi-parametric algorithms for OHS estimation and demonstrate their use on a recent risk score for pre-eclampsia. Based on these results, we argue that a holdout set is a safe, viable and easily implemented means to safely update predictive risk scores.  ( 2 min )
    Your Policy Regularizer is Secretly an Adversary. (arXiv:2203.12592v4 [cs.LG] UPDATED)
    Policy regularization methods such as maximum entropy regularization are widely used in reinforcement learning to improve the robustness of a learned policy. In this paper, we show how this robustness arises from hedging against worst-case perturbations of the reward function, which are chosen from a limited set by an imagined adversary. Using convex duality, we characterize this robust set of adversarial reward perturbations under KL and alpha-divergence regularization, which includes Shannon and Tsallis entropy regularization as special cases. Importantly, generalization guarantees can be given within this robust set. We provide detailed discussion of the worst-case reward perturbations, and present intuitive empirical examples to illustrate this robustness and its relationship with generalization. Finally, we discuss how our analysis complements and extends previous results on adversarial reward robustness and path consistency optimality conditions.  ( 2 min )
    A law of adversarial risk, interpolation, and label noise. (arXiv:2207.03933v1 [stat.ML])
    In supervised learning, it has been shown that label noise in the data can be interpolated without penalties on test accuracy under many circumstances. We show that interpolating label noise induces adversarial vulnerability, and prove the first theorem showing the dependence of label noise and adversarial risk in terms of the data distribution. Our results are almost sharp without accounting for the inductive bias of the learning algorithm. We also show that inductive bias makes the effect of label noise much stronger.  ( 2 min )
    On the representation and learning of monotone triangular transport maps. (arXiv:2009.10303v2 [stat.ML] UPDATED)
    Transportation of measure provides a versatile approach for modeling complex probability distributions, with applications in density estimation, Bayesian inference, generative modeling, and beyond. Monotone triangular transport maps (approximations of the Knothe–Rosenblatt (KR) rearrangement) are a canonical choice for these tasks. Yet the representation and parameterization of such maps have a significant impact on their generality and expressiveness, and on properties of the optimization problem that arises in learning a map from data (e.g., via maximum likelihood estimation). We present a general framework for representing monotone triangular maps via invertible transformations of smooth functions. We establish conditions on the transformation such that the associated infinite-dimensional minimization problem has no spurious local minima, i.e., all local minima are global minima; and we show for target distributions satisfying certain tail conditions that the unique global minimizer corresponds to the KR map. Given a sample from the target, we then propose an adaptive algorithm that estimates a sparse semi-parametric approximation of the underlying KR map. We demonstrate how this framework can be applied to joint and conditional density estimation, likelihood-free inference, and structure learning of directed graphical models, with stable generalization performance across a range of sample sizes.  ( 3 min )
    ControlBurn: Nonlinear Feature Selection with Sparse Tree Ensembles. (arXiv:2207.03935v1 [stat.ML])
    ControlBurn is a Python package to construct feature-sparse tree ensembles that support nonlinear feature selection and interpretable machine learning. The algorithms in this package first build large tree ensembles that prioritize basis functions with few features and then select a feature-sparse subset of these basis functions using a weighted lasso optimization criterion. The package includes visualizations to analyze the features selected by the ensemble and their impact on predictions. Hence ControlBurn offers the accuracy and flexibility of tree-ensemble models and the interpretability of sparse generalized additive models. ControlBurn is scalable and flexible: for example, it can use warm-start continuation to compute the regularization path (prediction error for any number of selected features) for a dataset with tens of thousands of samples and hundreds of features in seconds. For larger datasets, the runtime scales linearly in the number of samples and features (up to a log factor), and the package supports acceleration using sketching. Moreover, the ControlBurn framework accommodates feature costs, feature groupings, and $\ell_0$-based regularizers. The package is user-friendly and open-source: its documentation and source code appear on https://pypi.org/project/ControlBurn/ and https://github.com/udellgroup/controlburn/.  ( 2 min )
    Bayesian multi-objective optimization for stochastic simulators: an extension of the Pareto Active Learning method. (arXiv:2207.03842v1 [math.OC])
    This article focuses on the multi-objective optimization of stochastic simulators with high output variance, where the input space is finite and the objective functions are expensive to evaluate. We rely on Bayesian optimization algorithms, which use probabilistic models to make predictions about the functions to be optimized. The proposed approach is an extension of the Pareto Active Learning (PAL) algorithm for the estimation of Pareto-optimal solutions that makes it suitable for the stochastic setting. We named it Pareto Active Learning for Stochastic Simulators (PALS). The performance of PALS is assessed through numerical experiments over a set of bi-dimensional, bi-objective test problems. PALS exhibits superior performance when compared to other scalarization-based and random-search approaches.  ( 2 min )
    Supervising the Decoder of Variational Autoencoders to Improve Scientific Utility. (arXiv:2109.04561v3 [stat.ML] UPDATED)
    Probabilistic generative models are attractive for scientific modeling because their inferred parameters can be used to generate hypotheses and design experiments. This requires that the learned model provide an accurate representation of the input data and yield a latent space that effectively predicts outcomes relevant to the scientific question. Supervised Variational Autoencoders (SVAEs) have previously been used for this purpose, where a carefully designed decoder can be used as an interpretable generative model while the supervised objective ensures a predictive latent representation. Unfortunately, the supervised objective forces the encoder to learn a biased approximation to the generative posterior distribution, which renders the generative parameters unreliable when used in scientific models. This issue has remained undetected as reconstruction losses commonly used to evaluate model performance do not detect bias in the encoder. We address this previously-unreported issue by developing a second order supervision framework (SOS-VAE) that influences the decoder to induce a predictive latent representation. This ensures that the associated encoder maintains a reliable generative interpretation. We extend this technique to allow the user to trade-off some bias in the generative parameters for improved predictive performance, acting as an intermediate option between SVAEs and our new SOS-VAE. We also use this methodology to address missing data issues that often arise when combining recordings from multiple scientific experiments. We demonstrate the effectiveness of these developments using synthetic data and electrophysiological recordings with an emphasis on how our learned representations can be used to design scientific experiments.  ( 3 min )
    Fair Exploration via Axiomatic Bargaining. (arXiv:2106.02553v2 [cs.LG] UPDATED)
    Exploration is often necessary in online learning to maximize long-term reward, but it comes at the cost of short-term 'regret'. We study how this cost of exploration is shared across multiple groups. For example, in a clinical trial setting, patients who are assigned a sub-optimal treatment effectively incur the cost of exploration. When patients are associated with natural groups on the basis of, say, race or age, it is natural to ask whether the cost of exploration borne by any single group is 'fair'. So motivated, we introduce the 'grouped' bandit model. We leverage the theory of axiomatic bargaining, and the Nash bargaining solution in particular, to formalize what might constitute a fair division of the cost of exploration across groups. On the one hand, we show that any regret-optimal policy strikingly results in the least fair outcome: such policies will perversely leverage the most 'disadvantaged' groups when they can. More constructively, we derive policies that are optimally fair and simultaneously enjoy a small 'price of fairness'. We illustrate the relative merits of our algorithmic framework with a case study on contextual bandits for warfarin dosing where we are concerned with the cost of exploration across multiple races and age groups.  ( 3 min )
    Test Sample Accuracy Scales with Training Sample Density in Neural Networks. (arXiv:2106.08365v6 [cs.LG] UPDATED)
    Intuitively, one would expect accuracy of a trained neural network's prediction on test samples to correlate with how densely the samples are surrounded by seen training samples in representation space. We find that a bound on empirical training error smoothed across linear activation regions scales inversely with training sample density in representation space. Empirically, we verify this bound is a strong predictor of the inaccuracy of the network's prediction on test samples. For unseen test sets, including those with out-of-distribution samples, ranking test samples by their local region's error bound and discarding samples with the highest bounds raises prediction accuracy by up to 20% in absolute terms for image classification datasets, on average over thresholds.  ( 2 min )
    Bayesian Quantile and Expectile Optimisation. (arXiv:2001.04833v2 [stat.ML] UPDATED)
    Bayesian optimisation (BO) is widely used to optimise stochastic black box functions. While most BO approaches focus on optimising conditional expectations, many applications require risk-averse strategies and alternative criteria accounting for the distribution tails need to be considered. In this paper, we propose new variational models for Bayesian quantile and expectile regression that are well-suited for heteroscedastic noise settings. Our models consist of two latent Gaussian processes accounting respectively for the conditional quantile (or expectile) and the scale parameter of an asymmetric likelihood function. Furthermore, we propose two BO strategies based on max-value entropy search and Thompson sampling, that are tailored to such models and that can accommodate large batches of points. Contrary to existing BO approaches for risk-averse optimisation, our strategies can directly optimise for the quantile and expectile, without requiring replicating observations or assuming a parametric form for the noise. As illustrated in the experimental section, the proposed approach clearly outperforms the state of the art in the heteroscedastic, non-Gaussian case.  ( 2 min )
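    Quantiles and expectiles are commonly defined as minimisers of asymmetric losses: the pinball loss for quantiles and an asymmetrically weighted squared loss for expectiles. The sketch below illustrates only these standard losses on a toy sample; it is an assumption that the paper's asymmetric likelihoods correspond to them, and the Gaussian-process models are not reproduced here.

```python
def pinball_loss(y, q, tau):
    """Quantile (pinball) loss: under-prediction is weighted by tau, over-prediction by 1 - tau."""
    u = y - q
    return u * (tau - (1.0 if u < 0 else 0.0))


def expectile_loss(y, e, tau):
    """Asymmetric squared loss whose minimiser is the tau-expectile."""
    u = y - e
    return abs(tau - (1.0 if u < 0 else 0.0)) * u * u


def best_constant(data, tau, loss):
    """Pick the sample value that minimises the total loss (a crude grid search)."""
    return min(data, key=lambda c: sum(loss(y, c, tau) for y in data))


data = [1.0, 2.0, 3.0, 4.0, 100.0]

# With tau = 0.5 the pinball loss is half the absolute error, so the minimiser is the median.
median_like = best_constant(data, 0.5, pinball_loss)

# A high tau pushes the minimiser towards the upper tail of the sample.
upper_tail = best_constant(data, 0.9, pinball_loss)
```

    The pinball minimiser is insensitive to how extreme the outlier is, while the expectile loss, being quadratic, reacts to its magnitude.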
    On data-driven chance constraint learning for mixed-integer optimization problems. (arXiv:2207.03844v1 [math.OC])
    When dealing with real-world optimization problems, decision-makers usually face high levels of uncertainty associated with partial information, unknown parameters, or complex relationships between these and the problem decision variables. In this work, we develop a novel Chance Constraint Learning (CCL) methodology with a focus on mixed-integer linear optimization problems which combines ideas from the chance constraint and constraint learning literature. Chance constraints set a probabilistic confidence level for a single or a set of constraints to be fulfilled, whereas the constraint learning methodology aims to model the functional relationship between the problem variables through predictive models. One of the main issues when establishing a learned constraint arises when we need to set further bounds for its response variable: the fulfillment of these is directly related to the accuracy of the predictive model and its probabilistic behaviour. In this sense, CCL makes use of linearizable machine learning models to estimate conditional quantiles of the learned variables, providing a data-driven solution for chance constraints. An open-access software has been developed to be used by practitioners. Furthermore, benefits from CCL have been tested in two real-world case studies, proving how robustness is added to optimal solutions when probabilistic bounds are set for learned constraints.  ( 2 min )
    Layer Adaptive Node Selection in Bayesian Neural Networks: Statistical Guarantees and Implementation Details. (arXiv:2108.11000v2 [stat.ML] UPDATED)
    Sparse deep neural networks have proven to be efficient for predictive model building in large-scale studies. Although several works have studied theoretical and numerical properties of sparse neural architectures, they have primarily focused on the edge selection. Sparsity through edge selection might be intuitively appealing; however, it does not necessarily reduce the structural complexity of a network. Instead pruning excessive nodes leads to a structurally sparse network with significant computational speedup during inference. To this end, we propose a Bayesian sparse solution using spike-and-slab Gaussian priors to allow for automatic node selection during training. The use of spike-and-slab prior alleviates the need of an ad-hoc thresholding rule for pruning. In addition, we adopt a variational Bayes approach to circumvent the computational challenges of traditional Markov Chain Monte Carlo (MCMC) implementation. In the context of node selection, we establish the fundamental result of variational posterior consistency together with the characterization of prior parameters. In contrast to the previous works, our theoretical development relaxes the assumptions of the equal number of nodes and uniform bounds on all network weights, thereby accommodating sparse networks with layer-dependent node structures or coefficient bounds. With a layer-wise characterization of prior inclusion probabilities, we discuss the optimal contraction rates of the variational posterior. We empirically demonstrate that our proposed approach outperforms the edge selection method in computational complexity with similar or better predictive performance. Our experimental evidence further substantiates that our theoretical work facilitates layer-wise optimal node recovery.  ( 3 min )
    Deep Neural Networks for Rank-Consistent Ordinal Regression Based On Conditional Probabilities. (arXiv:2111.08851v3 [cs.LG] UPDATED)
    In recent times, deep neural networks achieved outstanding predictive performance on various classification and pattern recognition tasks. However, many real-world prediction problems have ordinal response variables, and this ordering information is ignored by conventional classification losses such as the multi-category cross-entropy. Ordinal regression methods for deep neural networks address this. One such method is the CORAL method, which is based on an earlier binary label extension framework and achieves rank consistency among its output layer tasks by imposing a weight-sharing constraint. However, while earlier experiments showed that CORAL's rank consistency is beneficial for performance, it is limited by a weight-sharing constraint in a neural network's fully connected output layer. We propose a new method for rank-consistent ordinal regression without this limitation. Our rank-consistent ordinal regression framework (CORN) achieves rank consistency by a novel training scheme. This training scheme uses conditional training sets to obtain the unconditional rank probabilities through applying the chain rule for conditional probability distributions. Experiments on various datasets demonstrate the efficacy of the proposed method to utilize the ordinal target information, and the absence of the weight-sharing restriction improves the performance substantially compared to the CORAL reference approach.  ( 3 min )
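    The chain-rule step described above can be sketched as a running product: if task k outputs the conditional probability P(y > r_k | y > r_{k-1}), multiplying the outputs up to k gives the unconditional P(y > r_k). The function names and the 0.5 threshold for turning probabilities into a rank are assumptions made for illustration, not details taken from the abstract.

```python
def unconditional_probs(conditional_probs):
    """Turn P(y > r_k | y > r_{k-1}) into P(y > r_k) via a running product."""
    probs, running = [], 1.0
    for p in conditional_probs:
        running *= p
        probs.append(running)
    return probs


def predict_rank(conditional_probs, threshold=0.5):
    """Predicted rank index: number of thresholds the label is likely to exceed."""
    return sum(p > threshold for p in unconditional_probs(conditional_probs))


# Hypothetical outputs of three binary tasks for one example.
conditional = [0.9, 0.8, 0.5]
unconditional = unconditional_probs(conditional)
rank = predict_rank(conditional)
```

    Since every factor lies in [0, 1], the running product can never increase, so the unconditional probabilities are rank-consistent by construction.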
    Black and Gray Box Learning of Amplitude Equations: Application to Phase Field Systems. (arXiv:2207.03954v1 [stat.ML])
    We present a data-driven approach to learning surrogate models for amplitude equations, and illustrate its application to interfacial dynamics of phase field systems. In particular, we demonstrate learning effective partial differential equations describing the evolution of phase field interfaces from full phase field data. We illustrate this on a model phase field system, where analytical approximate equations for the dynamics of the phase field interface (a higher order eikonal equation and its approximation, the Kardar-Parisi-Zhang (KPZ) equation) are known. For this system, we discuss data-driven approaches for the identification of equations that accurately describe the front interface dynamics. When the analytical approximate models mentioned above become inaccurate, as we move beyond the region of validity of the underlying assumptions, the data-driven equations outperform them. In these regimes, going beyond black-box identification, we explore different approaches to learn data-driven corrections to the analytically approximate models, leading to effective gray box partial differential equations.  ( 2 min )
    Feature Selection Methods for Uplift Modeling and Heterogeneous Treatment Effect. (arXiv:2005.03447v2 [cs.LG] UPDATED)
    Uplift modeling is a causal learning technique that estimates subgroup-level treatment effects. It is commonly used in industry and elsewhere for tasks such as targeting ads. In a typical setting, uplift models can take thousands of features as inputs, which is costly and results in problems such as overfitting and poor model interpretability. Consequently, there is a need to select a subset of the most important features for modeling. However, traditional methods for doing feature selection are not fit for the task because they are designed for standard machine learning models whose target is importantly different from uplift models. To address this, we introduce a set of feature selection methods explicitly designed for uplift modeling, drawing inspiration from statistics and information theory. We conduct empirical evaluations on the proposed methods on publicly available datasets, demonstrating the advantages of the proposed methods compared to traditional feature selection. We make the proposed methods publicly available as a part of the CausalML open-source package.  ( 2 min )
    Understanding Gradual Domain Adaptation: Improved Analysis, Optimal Path and Beyond. (arXiv:2204.08200v2 [cs.LG] UPDATED)
    The vast majority of existing algorithms for unsupervised domain adaptation (UDA) focus on adapting from a labeled source domain to an unlabeled target domain directly in a one-off way. Gradual domain adaptation (GDA), on the other hand, assumes a path of $(T-1)$ unlabeled intermediate domains bridging the source and target, and aims to provide better generalization in the target domain by leveraging the intermediate ones. Under certain assumptions, Kumar et al. (2020) proposed a simple algorithm, Gradual Self-Training, along with a generalization bound in the order of $e^{O(T)} \left(\varepsilon_0+O\left(\sqrt{\log(T)/n}\right)\right)$ for the target domain error, where $\varepsilon_0$ is the source domain error and $n$ is the data size of each domain. Due to the exponential factor, this upper bound becomes vacuous when $T$ is only moderately large. In this work, we analyze gradual self-training under more general and relaxed assumptions, and prove a significantly improved generalization bound as $\varepsilon_0+ O \left(T\Delta + T/\sqrt{n}\right) + \widetilde{O}\left(1/\sqrt{nT}\right)$, where $\Delta$ is the average distributional distance between consecutive domains. Compared with the existing bound with an exponential dependency on $T$ as a multiplicative factor, our bound only depends on $T$ linearly and additively. Perhaps more interestingly, our result implies the existence of an optimal choice of $T$ that minimizes the generalization error, and it also naturally suggests an optimal way to construct the path of intermediate domains so as to minimize the accumulative path length $T\Delta$ between the source and target. To corroborate the implications of our theory, we examine gradual self-training on multiple semi-synthetic and real datasets, which confirms our findings. We believe our insights provide a path forward toward the design of future GDA algorithms.  ( 3 min )
    One for All: Simultaneous Metric and Preference Learning over Multiple Users. (arXiv:2207.03609v1 [stat.ML])
    This paper investigates simultaneous preference and metric learning from a crowd of respondents. A set of items represented by $d$-dimensional feature vectors and paired comparisons of the form ``item $i$ is preferable to item $j$'' made by each user is given. Our model jointly learns a distance metric that characterizes the crowd's general measure of item similarities along with a latent ideal point for each user reflecting their individual preferences. This model has the flexibility to capture individual preferences, while enjoying a metric learning sample cost that is amortized over the crowd. We first study this problem in a noiseless, continuous response setting (i.e., responses equal to differences of item distances) to understand the fundamental limits of learning. Next, we establish prediction error guarantees for noisy, binary measurements such as may be collected from human respondents, and show how the sample complexity improves when the underlying metric is low-rank. Finally, we establish recovery guarantees under assumptions on the response distribution. We demonstrate the performance of our model on both simulated data and on a dataset of color preference judgements across a large number of users.  ( 2 min )
    HierarchicalForecast: A Python Benchmarking Framework for Hierarchical Forecasting. (arXiv:2207.03517v1 [stat.ML])
    Large collections of time series data are commonly organized into cross-sectional structures with different levels of aggregation; examples include product and geographical groupings. A necessary condition for coherent decision-making and planning with such data sets is for the disaggregated series' forecasts to add up exactly to the aggregated series' forecasts, which motivates the creation of novel hierarchical forecasting algorithms. The growing interest of the Machine Learning community in cross-sectional hierarchical forecasting systems suggests that this is a propitious moment to ensure that scientific endeavors are grounded on sound baselines. For this reason, we put forward the HierarchicalForecast library, which contains preprocessed publicly available datasets, evaluation metrics, and a compiled set of statistical baseline models. Our Python-based framework aims to bridge the gap between statistical, econometric modeling, and Machine Learning forecasting research. Code and documentation are available at https://github.com/Nixtla/hierarchicalforecast.  ( 2 min )
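The coherence condition in this abstract (disaggregated forecasts adding up exactly to the aggregated ones) can be illustrated with a bottom-up sketch. This is plain Python and does not use the HierarchicalForecast API; the function name, the dictionary layout, and the series names are hypothetical.

```python
def bottom_up(bottom_forecasts, hierarchy):
    """Build coherent forecasts by summing bottom-level forecasts.

    bottom_forecasts: dict of bottom-level series name -> forecast value.
    hierarchy: dict of aggregate series name -> list of the bottom-level
               series that it aggregates (hypothetical layout).
    """
    coherent = dict(bottom_forecasts)
    for parent, children in hierarchy.items():
        # Each aggregate is defined as the sum of its children, so the
        # result is coherent by construction.
        coherent[parent] = sum(bottom_forecasts[child] for child in children)
    return coherent


if __name__ == "__main__":
    bottom = {"north": 10.0, "south": 5.0, "east": 2.5}
    tree = {"total": ["north", "south", "east"], "north_south": ["north", "south"]}
    print(bottom_up(bottom, tree))
```

Bottom-up aggregation is only the simplest reconciliation strategy; it is shown here because it makes the "forecasts must add up" constraint explicit.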
    TF-GNN: Graph Neural Networks in TensorFlow. (arXiv:2207.03522v1 [cs.LG])
    TensorFlow GNN (TF-GNN) is a scalable library for Graph Neural Networks in TensorFlow. It is designed from the bottom up to support the kinds of rich heterogeneous graph data that occurs in today's information ecosystems. Many production models at Google use TF-GNN and it has been recently released as an open source project. In this paper, we describe the TF-GNN data model, its Keras modeling API, and relevant capabilities such as graph sampling, distributed training, and accelerator support.  ( 2 min )
    Complementing Brightness Constancy with Deep Networks for Optical Flow Prediction. (arXiv:2207.03790v1 [cs.CV])
    State-of-the-art methods for optical flow estimation rely on deep learning, which requires complex sequential training schemes to reach optimal performance on real-world data. In this work, we introduce the COMBO deep network that explicitly exploits the brightness constancy (BC) model used in traditional methods. Since BC is an approximate physical model violated in several situations, we propose to train a physically-constrained network complemented with a data-driven network. We introduce a unique and meaningful flow decomposition between the physical prior and the data-driven complement, including an uncertainty quantification of the BC model. We derive a joint training scheme for learning the different components of the decomposition ensuring an optimal cooperation, in a supervised but also in a semi-supervised context. Experiments show that COMBO can improve performance over state-of-the-art supervised networks, e.g. RAFT, reaching state-of-the-art results on several benchmarks. We highlight how COMBO can leverage the BC model and adapt to its limitations. Finally, we show that our semi-supervised method can significantly simplify the training procedure.  ( 2 min )
    A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning. (arXiv:2207.03784v1 [cs.LG])
    Proxy-based Deep Metric Learning (DML) learns deep representations by embedding images close to their class representatives (proxies), commonly with respect to the angle between them. However, this disregards the embedding norm, which can carry additional beneficial context such as class- or image-intrinsic uncertainty. In addition, proxy-based DML struggles to learn class-internal structures. To address both issues at once, we introduce non-isotropic probabilistic proxy-based DML. We model images as directional von Mises-Fisher (vMF) distributions on the hypersphere that can reflect image-intrinsic uncertainties. Further, we derive non-isotropic von Mises-Fisher (nivMF) distributions for class proxies to better represent complex class-specific variances. To measure the proxy-to-image distance between these models, we develop and investigate multiple distribution-to-point and distribution-to-distribution metrics. Each framework choice is motivated by a set of ablational studies, which showcase beneficial properties of our probabilistic approach to proxy-based DML, such as uncertainty-awareness, better-behaved gradients during training, and overall improved generalization performance. The latter is especially reflected in the competitive performance on the standard DML benchmarks, where our approach compares favorably, suggesting that existing proxy-based DML can significantly benefit from a more probabilistic treatment. Code is available at github.com/ExplainableML/Probabilistic_Deep_Metric_Learning.  ( 2 min )
    Nonparametric Embeddings of Sparse High-Order Interaction Events. (arXiv:2207.03639v1 [cs.LG])
    High-order interaction events are common in real-world applications. Learning embeddings that encode the complex relationships of the participants from these events is of great importance in knowledge mining and predictive tasks. Despite the success of existing approaches, e.g. Poisson tensor factorization, they ignore the sparse structure underlying the data, namely that the observed interactions are far fewer than the possible interactions among all the participants. In this paper, we propose Nonparametric Embeddings of Sparse High-order interaction events (NESH). We hybridize a sparse hypergraph (tensor) process and a matrix Gaussian process to capture both the asymptotic structural sparsity within the interactions and nonlinear temporal relationships between the participants. We prove strong asymptotic bounds (including both a lower and an upper bound) of the sparsity ratio, which reveal the asymptotic properties of the sampled structure. We use batch-normalization, stick-breaking construction, and sparse variational GP approximations to develop an efficient, scalable model inference algorithm. We demonstrate the advantage of our approach in several real-world applications.  ( 2 min )

  • Open

    [P] A Website to generate Code Snippets, Regexes, Linux & Git & SQL Commands, HTML and CSS from a written description. Furthermore translate code snippets to many languages and get a regex explained in plain english. Moreover you can fix broken code snippets. All with the help of ML 🤖
    https://reddit.com/link/vw3tkf/video/xe0t4pumpta91/player https://reddit.com/link/vw3tkf/video/7pf9dl3npta91/player Programming: Function from Description, Code to Explanation, Fix invalid Code, Translate Languages, Class from Description, Get Language from Code, Function from Docstring. Helpers: Regex from Description, Regex to Explanation, Linux Command, Get time complexity, Git Command from Description. Database: Text Description to SQL Command. Web: Generate HTML from Description, CSS from Description, Meta Tags from Description. I think this could be helpful to a lot of people (especially beginner programmers). You can check out all functionalities on your own here: programming-helper.com Have fun using the tool ❤️ submitted by /u/Capital_Revolution35 [link] [comments]  ( 86 min )
    [R] META first neural view synthesis method for VR / passthrough AR
    submitted by /u/SpatialComputing [link] [comments]  ( 86 min )
    [Project] Parakeet — Copilot for Colab
    Hello! I've long been a big fan of GitHub Copilot. I've used it for a while now, and I find it super helpful for all sorts of things. But Copilot doesn't work in Colab or Jupyter notebooks, even though that's where a ton of ML and data science code is written. Parakeet is a Chrome extension that provides Copilot-like code suggestions for notebooks. I've been using Parakeet for my own needs for a bit, and I'm already getting a lot of mileage out of it. Just the other day, for example, I wanted to make a Seaborn plot but wasn't sure how. I wrote a short comment, Parakeet suggested some code, and the code worked on the first try! Installation: Install from the Chrome Web Store. View source code. You'll need an email to sign up. Parakeet is currently free to use for everyone, though that may change once OpenAI introduces pricing for Codex. Demos: Generating code to plot a sine wave. Plotting a heat map. All I had to do was write some comments; Parakeet's suggested code worked on the first try. Limitations: Parakeet currently only works for Colab, though I'm considering extending Parakeet to support Jupyter. If you want to use Parakeet outside Colab, I'd love to hear about your use case! You can file an issue on GitHub or you can email me at ericyu3@gmail.com. To keep things simple, Parakeet only makes suggestions when you are at the end of a line, and Parakeet never makes multi-line suggestions. How it works: Parakeet uses OpenAI's Codex model, which is the same model that powers GitHub Copilot. Parakeet does not have access to Colab's internal state. Instead, Parakeet continuously parses Colab's HTML to extract cell contents and determine what row and column your cursor is on. This approach was finicky to get working, but I was able to get it to work reliably and with little performance penalty. Your code is never stored or logged. After a suggestion is generated, the input is immediately discarded. submitted by /u/ericyu3 [link] [comments]  ( 88 min )
    Preparing Machine Learning Interview [D]
    Hi everyone, I am preparing for Machine Learning Engineer jobs. I am currently learning data structures and algorithms along with coding problem-solving, which I document on my GitHub. I have four months. I am looking for remote or onsite jobs in Europe or anywhere else. Any tips and suggestions are highly appreciated. We can also learn together: if you are interested, here is my email: lewissarron@gmail.com. submitted by /u/Sandwich-Express [link] [comments]  ( 85 min )
    [D] Any French corpus like ALECTOR for the simplification task?
    Hello, the title says it all. I'm trying to find any resources (mainly aligned corpora) that could be helpful in identifying and simplifying complex sentences in French. ALECTOR is the only one I stumbled upon. Do you have any resources or tips? I was wondering if searching for books and their simplified versions could be useful, but I fear it would be more like learning to translate Old French into Modern French. submitted by /u/Sacrezar [link] [comments]  ( 85 min )
    [D] Reimplementing an Object Detection Model.
    How hard is it to reimplement an object detection model and reproduce its results on benchmarks like COCO? Let's take the DINO architecture or even one of the YOLO v4-7 models. How hard is it to build it from scratch and reach the COCO results reported by the paper or official implementations? submitted by /u/SeucheAchat9115 [link] [comments]  ( 86 min )
    [R] mixed reality future — see the world through artistic lenses — made with NeRF
    submitted by /u/SpatialComputing [link] [comments]  ( 89 min )
    [D] Interpreting Attention Weights
    I have seen in many papers, especially on deep learning applications in medical imaging, that attention weights are interpreted as something like interactions between features (i.e. feature interaction). But wouldn't you get new weights every time you train the model? Then how does this interpretability hold any value if the weights keep changing every time you run it? submitted by /u/Labib666Camp [link] [comments]  ( 87 min )
    [D] What's the problem with Self-driving cars? Is it a lack of data or do we need a new technology breakthrough?
    I mean, there was a time when everyone thought that in a few years we would have self-driving cars: we just needed more data and computing power and we'd get there. But now Google has more than 20M miles on public roads and many more in simulation, and Tesla has a lot of cars collecting data on the road. It's still not there, so what is missing? Do we need a new technology breakthrough, or is it just more data and computing power? submitted by /u/yosefschwartz [link] [comments]  ( 109 min )
    [D] Noam Chomsky on LLMs and discussion of LeCun paper (MLST)
    "First we should ask the question whether LLM have achieved ANYTHING, ANYTHING in this domain. Answer, NO, they have achieved ZERO!" - Noam Chomsky "There are engineering projects that are significantly advanced by [#DL] methods. And this is all the good. [...] Engineering is not a trivial field; it takes intelligence, invention, [and] creativity these achievements. That it contributes to science?" - Noam Chomsky "There was a time [supposedly dedicated] to the study of the nature of #intelligence. By now it has disappeared." Earlier, same interview: "GPT-3 can [only] find some superficial irregularities in the data. [...] It's exciting for reporters in the NY Times." - Noam Chomsky "It's not of interest to people, the idea of finding an explanation for something. [...] The [original #AI] field by now is considered old-fashioned, nonsense. [...] That's probably where the field will develop, where the money is. [...] But it's a shame." - Noam Chomsky Thanks to Dagmar Monett for selecting the quotes! Sorry for posting a controversial thread -- but this seemed noteworthy for /machinelearning Video: https://youtu.be/axuGfh4UR9Q -- also some discussion of LeCun's recent position paper submitted by /u/timscarfe [link] [comments]  ( 104 min )
  • Open

    Google AI Proposes ‘MLGO’: A Machine Learning Guided Compiler Optimization Python Framework
    Since the invention of modern computers, there has been a constant demand for optimization and faster code compilation. Large data center programs can benefit greatly from optimization, while mobile and embedded systems, as well as software installed on protected boot partitions, need smaller code. As the field has developed, the headroom has been severely constrained by increasingly complicated heuristics, which hinder maintenance and further advances. Recent studies have demonstrated that compiler optimization can benefit significantly from substituting ML strategies for complex heuristics. Nevertheless, adopting ML in general-purpose, industrial-strength compilers is still tricky. To solve this problem, a group of Google Research engineers has presented “MLGO: a Machine Learning Guided Compiler Optimizations Framework,” the first broad industrial-grade framework for systematically integrating ML approaches with LLVM. LLVM is a well-known open-source industrial compiler infrastructure used to build critical high-performance software. MLGO uses reinforcement learning to train neural networks to make decision policies that can replace heuristics in LLVM. The team has disclosed two MLGO optimizations for LLVM, the first involving inlining to reduce code size and the second involving register allocation to enhance code performance. Both improvements may be found in the LLVM source and have been used in real-world applications. Continue reading | Check out the paper, github, demo and ref article. https://i.redd.it/tzobkzw6lta91.gif submitted by /u/ai-lover [link] [comments]  ( 85 min )
    [Question] Using planning on vimgolf (fewest vim commands to produce a given text) - feasibility and design
    Today, I learned about https://www.vimgolf.com/ and thought that it looked somewhat like a planning problem (find the shortest sequence of text-manipulating commands that produces a certain text, the goal state). So I want to try to use planning algorithms to solve vimgolf problems. Questions (for details, see the spec): 1) Are current planning frameworks able to solve these problem instances? Consider that there are 10-20 vim commands I want to support initially, and problem instances like this: https://www.vimgolf.com/challenges/9v00619554dd000000000216. 2) Vim commands can be composed, for example 2dw deletes the next word, two times. One way to model this is that the agent could use the action 2, then d, then w, where only the last action transforms the text (by deleting two words). Is that a good idea? 3) Which tools could I use for this task? So far, fast-downward seemed to be an option, or using one of the many solvers for STRIPS. However, I am a bit lost. I don't want any fancy stuff, I just want a planner that outputs a short sequence of vim commands. Spec: Input: Text A and Text B. Output: A sequence of vim commands that transforms A to B. Minimize: Length of the command sequence. How I want to model the state: Text: String[][] lines; Int cursor; maybe some vim-specific state, like the current mode. How I find out if I come closer to the goal state: I thought about using some metric that uses 1) the Levenshtein distance of the text and 2) the number of lines. submitted by /u/Proper_Elk_1726 [link] [comments]  ( 85 min )
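The distance metric proposed at the end of the post could look roughly like the sketch below. It is a minimal illustration, assuming the state is a list of text lines; the `heuristic` function and the way it combines the two terms are hypothetical, not something the post specifies.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def heuristic(lines, goal_lines):
    """Hypothetical distance-to-goal estimate: edit distance plus line-count gap."""
    text_gap = levenshtein("\n".join(lines), "\n".join(goal_lines))
    line_gap = abs(len(lines) - len(goal_lines))
    return text_gap + line_gap


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(heuristic(["foo bar", "baz"], ["foo", "baz"]))
```

One caveat: a single vim command such as `dd` can remove many characters at once, so this estimate can overstate the number of commands still needed. It is therefore better suited to greedy or best-first guidance than to a search that relies on an admissible heuristic.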
    AI Generated Art with starryai
    With starryai you can generate art inspired by real life artists on your phone! submitted by /u/Keni9089 [link] [comments]  ( 84 min )
    mixed reality future — see the world through artistic lenses — made with NeRF
    submitted by /u/SpatialComputing [link] [comments]  ( 86 min )
    Created a completely AI generated comic page, images are all from different Midjourney prompts and the text is from OpenAI. I just stitched the various images together in Photoshop and added the text.
    submitted by /u/Albertrech [link] [comments]  ( 86 min )
    AI Dream 47 - Sacred House of Spirits vqgan clip
    submitted by /u/LordPewPew777 [link] [comments]  ( 84 min )
    Fairy's Pure Beauty | Cinematic 4K 24 FPS (FILM)
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 84 min )
  • Open

    Adding a tiny trap to stop chaos
    The tent map is a famous example of a chaotic function. We will show how a tiny modification of the tent map retains continuity of the function but prevents chaos. The tent map is the function f: [0, 1] → [0, 1] defined by f(x) = 2x for x ≤ 1/2 and f(x) = 2(1 − x) for x > 1/2. This map has an unstable fixed point at x0 = 2/3.  […] Adding a tiny trap to stop chaos first appeared on John D. Cook.  ( 6 min )
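A small sketch of why the fixed point at 2/3 is unstable, assuming the standard tent map f(x) = 2x for x ≤ 1/2 and f(x) = 2(1 − x) otherwise (this definition is consistent with the fixed point quoted in the teaser). Exact rational arithmetic is used because repeated doubling in floating point quickly destroys the orbit. The "trap" modification is not described in the teaser, so it is not implemented here.

```python
from fractions import Fraction


def tent(x):
    """Standard tent map on [0, 1] (assumed definition)."""
    return 2 * x if x <= Fraction(1, 2) else 2 * (1 - x)


def orbit(x0, steps):
    """Return [x0, f(x0), f(f(x0)), ...] with `steps` applications of f."""
    xs = [x0]
    for _ in range(steps):
        xs.append(tent(xs[-1]))
    return xs


if __name__ == "__main__":
    fixed = Fraction(2, 3)
    print(tent(fixed) == fixed)  # 2/3 maps to itself
    start = fixed + Fraction(1, 1000)
    for x in orbit(start, 5):
        # The gap to 2/3 doubles on each step while the orbit stays above 1/2.
        print(x, abs(x - fixed))
```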
    Ducci sequences
    Pick four integers a, b, c, and d. Now iterate the procedure that takes these four integers to |b – a|, |c – b|, |d – c|, |a – d| You could think of the four integers being arranged clockwise in a circle, taking the absolute value of difference between each number and its neighbor […] Ducci sequences first appeared on John D. Cook.  ( 5 min )
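The iteration described above is easy to try out. The sketch below follows the rule from the teaser; the helper names and the `max_steps` guard are additions made for the example.

```python
def ducci_step(values):
    """Map (a, b, c, d) to (|b - a|, |c - b|, |d - c|, |a - d|)."""
    n = len(values)
    return tuple(abs(values[(i + 1) % n] - values[i]) for i in range(n))


def steps_to_zero(values, max_steps=1000):
    """Count iterations until every entry is zero, or None if the guard is hit."""
    current = tuple(values)
    for step in range(max_steps + 1):
        if all(v == 0 for v in current):
            return step
        current = ducci_step(current)
    return None


if __name__ == "__main__":
    state = (1, 2, 3, 4)
    while any(state):
        print(state)
        state = ducci_step(state)
    print(state)
```

Starting from (1, 2, 3, 4) the sequence passes through (1, 1, 1, 3), (0, 0, 2, 2), (0, 2, 0, 2) and (2, 2, 2, 2) before reaching all zeros.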
  • Open

    example on a real photo, the algorithm was not planned for a real photo.
    submitted by /u/vlad_ma [link] [comments]  ( 84 min )
    Free Idea: Detecting GPT-3 Plagiarism, with GPT-3?
    submitted by /u/Gereshes [link] [comments]  ( 84 min )
    Can you give me some pointers
    The goal is for the players that spawn at the bottom to go to the target, but at the moment they basically go in straight lines to somewhere around the target. In the future I want to control the target with the mouse, so I want the dots to follow the target like planets orbiting a black hole (they should follow the target, not snipe it). The target moves by bouncing off the walls diagonally. I chose a neural net structure of 6, 8, 4. The input processing is done by this: ArrayList process(PVector pos, PVector vel, PVector acc) { PVector target = new PVector(Main.goal.x, Main.goal.y); ArrayList input = new ArrayList (Arrays.asList(pos.x, pos.y, vel.x, vel.y, target.x, target.y)); for (int i = 0; i = 1) input.set(i, 1 f); } return input; } The 4 output nodes are interpreted like this: ArrayList ans = nn.process(pos, vel, Main.goal); nn.step++; /* Interpret ans */ float up = ans.get(0); float down = ans.get(1); float right = ans.get(2); float left = ans.get(3); int x, y; if (up > down) x = -1; else x = 1; if (right > left) y = 1; else y = -1; if (up == down) y = 0; if (right == left) x = 0; acc = new PVector(x, y); vel.add(acc); vel.limit(5); pos.add(vel); The fitness formula I use at the moment is: public void caculateFitness() { /* closeToGoal counts how many times the dot was closer than 10 to the target */ if (reachedGoal) { fitness = 5000 + 10f * closeToGoal; } else { fitness = 10 * closeToGoal; } } In the end, I would like suggestions for changing the formula, maybe the structure, or something else. submitted by /u/LaserDenis [link] [comments]  ( 85 min )
  • Open

    Why do Policy Gradient Methods work so well in Cooperative MARL? Evidence from Policy Representation
    In cooperative multi-agent reinforcement learning (MARL), due to its on-policy nature, policy gradient (PG) methods are typically believed to be less sample efficient than value decomposition (VD) methods, which are off-policy. However, some recent empirical studies demonstrate that with proper input representation and hyper-parameter tuning, multi-agent PG can achieve surprisingly strong performance compared to off-policy VD methods. Why could PG methods work so well? In this post, we will present concrete analysis to show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, VD can be problematic and lead to undesired outcomes. By contrast, PG methods with individual policies can converge to an optimal policy in these cases. In addition, PG methods wit…  ( 5 min )
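One way to see why a multi-modal reward can trouble value decomposition is a toy two-agent game. The sketch below is an illustration under stated assumptions, not the analysis from the post: it assumes a purely additive decomposition Q(a1, a2) = Q1(a1) + Q2(a2) and a hypothetical XOR-style reward table in which the agents are rewarded only when their actions differ.

```python
def additive_fit(reward):
    """Least-squares additive fit of a reward table.

    For a complete table, the best fit of the form Q1[a1] + Q2[a2] is
    row mean + column mean - grand mean.
    """
    rows, cols = len(reward), len(reward[0])
    row_mean = [sum(r) / cols for r in reward]
    col_mean = [sum(reward[i][j] for i in range(rows)) / rows for j in range(cols)]
    grand = sum(row_mean) / rows
    return [[row_mean[i] + col_mean[j] - grand for j in range(cols)]
            for i in range(rows)]


def is_additive(reward, tol=1e-9):
    """True if the table is exactly representable as Q1[a1] + Q2[a2]."""
    fit = additive_fit(reward)
    return all(abs(fit[i][j] - reward[i][j]) <= tol
               for i in range(len(reward)) for j in range(len(reward[0])))


if __name__ == "__main__":
    xor_reward = [[0.0, 1.0], [1.0, 0.0]]  # two separate optimal joint actions
    print(is_additive(xor_reward))
    print(additive_fit(xor_reward))  # every joint action gets the same value
```

In this toy case the best additive fit assigns the same value to every joint action, so greedy per-agent action selection cannot tell the two optimal modes apart from the two bad outcomes.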
  • Open

    Help with navigating a non-changing 3D environment with only camera / pixel information
    Hello! I am trying to train an agent to navigate to a specific point in a 3D environment. The agent starts at a random location in the environment and must navigate to the same goal each time. The agent only has access to a front-facing camera, so there is no collision or environment data. The action space is forward, left, backward, and right, along with look left and look right. The observation space is a 200x200x1 grayscale image of what the agent can see. Right now, a positive reward is given for movement and a negative reward is given for cancelling movement commands (e.g. trying to move forward and backward at the same time). A large positive reward is given if it reaches the goal. I am training with A2C and a CNN using Stable Baselines. How do I go about incentivizing the agent to explore the environment? With the current reward function, it eventually just defaults to moving in one direction to accumulate reward. Are there any algorithms that can use the observation space to determine whether the agent has already been at a specific location? Then I could assign a negative reward for staying in the same spot or getting stuck, which should allow it to eventually find the goal location. Thanks in advance for any tips or resources! submitted by /u/Sandals5476 [link] [comments]  ( 87 min )
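One family of methods that matches the question is count-based exploration: coarsely discretize each camera frame, count how often each discretized frame has been seen, and add a bonus that shrinks with the count. The sketch below is a minimal, library-free illustration. The class name, the grid and quantization parameters, and the 1/sqrt(count) bonus are assumptions for the example, and nothing here is part of Stable Baselines; it would have to be called from the environment's own reward logic, for instance in a custom wrapper.

```python
import math
from collections import Counter


class VisitCounter:
    """Count-based novelty bonus over coarsely discretized grayscale frames."""

    def __init__(self, grid=8, levels=4):
        self.grid = grid        # frame is split into grid x grid blocks
        self.levels = levels    # each block mean is quantized to this many levels
        self.counts = Counter()

    def _key(self, frame):
        """Turn a 2D list of pixel values in [0, 255] into a small hashable key."""
        block_h = len(frame) // self.grid
        block_w = len(frame[0]) // self.grid
        key = []
        for by in range(self.grid):
            for bx in range(self.grid):
                pixels = [frame[y][x]
                          for y in range(by * block_h, (by + 1) * block_h)
                          for x in range(bx * block_w, (bx + 1) * block_w)]
                mean = sum(pixels) / len(pixels)
                key.append(min(int(mean * self.levels / 256), self.levels - 1))
        return tuple(key)

    def bonus(self, frame):
        """Record a visit and return a bonus that decays as 1 / sqrt(visits)."""
        key = self._key(frame)
        self.counts[key] += 1
        return 1.0 / math.sqrt(self.counts[key])


if __name__ == "__main__":
    counter = VisitCounter(grid=2, levels=4)
    dark = [[0] * 4 for _ in range(4)]
    print(counter.bonus(dark))  # first visit: full bonus
    print(counter.bonus(dark))  # repeat visit: smaller bonus
```

Frames from visually similar places will collide under such a coarse key, so the grid size and number of levels are tuning choices rather than fixed values.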

  • Open

    280+ AI tools for digital artists
    280+ AI tools for artists in one place: an AI library for artists. Our team has created the biggest library of AI tools for digital artists, NFT creation and metaverse content. It's free and updated daily. We would really appreciate your feedback. As new mind-blowing tools appear every day, we decided it would be useful to have a single place that brings them all together with descriptions and examples. At the end of July we will run a series of free workshops on how AI can be used by artists, so if you are interested in attending and trying some tools, please join our waitlist; we will announce Alfa soon. https://reddit.com/link/vvedsg/video/pzkzgy91sma91/player submitted by /u/Worldly_Apricot_1512 [link] [comments]  ( 84 min )
    A girl who takes action!
    submitted by /u/VIRUS-AOTOXIN [link] [comments]  ( 83 min )
    Why does VQGAN+CLIP produce much worse results than Dalle-mini?
    Both models are using the vqgan_imagenet_f16_16384 model. I'm not sure what Dalle-mini does differently, but the results it produces are so much better. VQGAN+CLIP produces results that don't have anything in focus, even if the prompt is just a single object. I'm not sure if this is because of the augmentation randomization (affine, sharpness, color jitter) or not. For example, here are both models' results on the prompt "an art deco car driving down the street" (images: dalle-mini, vqgan+clip). For the vqgan+clip result: what even is this, and why does it keep producing abstract-looking art? submitted by /u/impurekitkat [link] [comments]  ( 84 min )
    "Voldemort" AI Art created on pixelz.ai
    submitted by /u/PixelzJ [link] [comments]  ( 84 min )
    First-Ever Course on Transformers: NOW PUBLIC
    CS 25: Transformers United https://preview.redd.it/p4nuskhwlla91.png?width=350&format=png&auto=webp&s=711b3587ab93f4d024c4841462181dfbaa49863c Did you grow up wanting to play with robots that could turn into cars? While we can't offer those kinds of transformers, we do have a course on the class of deep learning models that have taken the world by storm. Announcing the public release of our lectures from the first-ever course on Transformers: CS25 Transformers United (http://cs25.stanford.edu) held at Stanford University. Our intro video is out and available to watch here 👉: YouTube Link Bookmark and spread the word 🤗! (Twitter Thread) Speaker talks out starting Monday ... submitted by /u/DragonLord9 [link] [comments]  ( 84 min )
    Who wants an invite to midjourney
    I have tons of invites to hand out so who needs one! submitted by /u/Concept_Sir [link] [comments]  ( 85 min )
    Do we need AI to be able to handle the huge amount of scientific information?
    Scientific knowledge is increasing exponentially and the amount of research papers published on any given day is really too much for a human to understand. Once AI has gotten more transparent and error free and more intelligent, I can imagine that to be helpful with handling lots of data... Are there any approaches for using AI to handle that kind of information? submitted by /u/greentea387 [link] [comments]  ( 88 min )
    The many faces of Bozzer 🇬🇧
    submitted by /u/pixelz_ai [link] [comments]  ( 84 min )
    AI can use your brainwaves to see things that you can't
    submitted by /u/jormungandrsjig [link] [comments]  ( 84 min )
    Oil On Canvas Painting of Beautiful Scenery | 4K 24 FPS (FILM)
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 84 min )
    made with StarryAI
    submitted by /u/rikusorasephiroth [link] [comments]  ( 83 min )
    Next-Level Model Investigation: Midjourney, Disco Diffusion, DALL-E Flow
    submitted by /u/laul_pogan [link] [comments]  ( 83 min )
    Is there an AI that generates new words/languages?
    Like the title says, I was wondering if something like that exists. All I could find around by searching this is about generating a normal English text. I don't have the skills right now for doing it myself (I wish I had them, really) and I was wondering if maybe something like that already exist around. submitted by /u/IKB191 [link] [comments]  ( 84 min )
  • Open

    Guided Cost Learning
    Hello everyone, I have a question regarding the IRL algorithm proposed by Finn et al. in [1]: is the method model-based or model-free IRL? To my knowledge, model-based methods are those which model the transition probability p(s'|s, a). In different papers I find this method classified as both model-based [2] and model-free [3]. The method assumes unknown dynamics of the system, which would make it a model-free approach. However, the method is based on maximum entropy IRL optimization and guided policy search RL, which are both model-based approaches. Maybe I have mixed up some of the stuff and sorry for that. Any help would be greatly appreciated 😅 Thanks in advance. [1] https://arxiv.org/abs/1603.00448 [2] https://www.sciencedirect.com/science/article/abs/pii/S1367578820300511 [3] https://link.springer.com/article/10.1007/s11063-017-9702-7 submitted by /u/nuki96 [link] [comments]  ( 85 min )
    Why are Multi Arm Bandits Important?
    Hey guys, I'm starting out my work in Multi Arm Bandits and their applications, and I find it to be extremely theoretical. A lot of algorithms require strict assumptions to work. Why is MAB considered important when a lot of Deep RL tasks perform well on real world scenarios (without the need for regret bounds)? What is the application of MAB outside theoretical proofs? The topic seems to be more mathematics than CS so I'm curious how people feel. I know there's applications in scheduling and operations research, but do you think the theoretical aspects from MAB can be used to improve Deep RL tasks like in games? And those of you working on the topic, have you tried Deep RL, and if so what do you think of your work and what's been done there? Is MAB a completely separate field? For example I've seen Computer Vision, NLP, and Deep RL being combined in any order, but none of them have anything to do with bandits. Do you think these topics could find a common application? What is the current research trend? I've not seen papers use anything but UCB or Thompson Sampling, so what do you work on? Finally, are there any recent works where MAB are combined with Deep Learning? I'm trying to find a balance between the theory and research, but I'm finding the proofs and bounds to be a tedious task. submitted by /u/Bibbidi_Babbidi_Boo [link] [comments]  ( 85 min )
    Deepmind AI Researchers Introduce ‘DeepNash’, An Autonomous Agent Trained With Model-Free Multiagent Reinforcement Learning That Learns To Play The Game Of Stratego At Expert Level
    For several years, the Stratego board game has been regarded as one of the most promising areas of research in Artificial Intelligence. Stratego is a two-player board game in which each player attempts to take the other player’s flag. There are two main challenges in the game. 1) There are 10^535 potential states in the Stratego game tree. 2) Each player in this game must consider 10^66 possible deployments at the beginning of the game. Due to the various complex components of the game’s structure, the AI research community has made minimal progress in this area. This research introduces DeepNash, an autonomous agent that can develop human-level expertise in the imperfect information game Stratego from scratch. Regularized Nash Dynamics (R-NaD), a principled, model-free reinforcement learning technique, is the prime backbone of DeepNash. DeepNash achieves an ε-Nash equilibrium by integrating R-NaD with deep neural network architecture. A Nash equilibrium ensures that the agent will perform well even when faced with the worst-case scenario opponent. The Stratego game and a description of the DeepNash technique are shown in Figure 1. Continue reading | Checkout the paper submitted by /u/ai-lover [link] [comments]  ( 85 min )
    [D] How to disable an action for a step
    Some actions cannot be taken, and the env provides this information. For example, in SC2 the agent cannot train a unit if it doesn't have enough resources. How to prevent him from taking invalid actions during exploration? I don't want to punish him with a negative reward, because he may think that it's bad to do it. submitted by /u/CppMaster [link] [comments]  ( 85 min )
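    One common way to handle this is action masking: invalid actions get probability zero before sampling, so no reward penalty is involved. The sketch below is only an illustration and assumes the environment exposes a boolean list of valid actions; the names `masked_softmax`, `sample_action`, `logits` and `valid` are hypothetical and do not come from the post or from any SC2 library.

```python
import math
import random


def masked_softmax(logits, valid):
    """Softmax over `logits` where entries with valid[i] == False get probability 0."""
    if len(logits) != len(valid):
        raise ValueError("logits and valid must have the same length")
    if not any(valid):
        raise ValueError("at least one action must be valid")
    # Subtract the largest valid logit for numerical stability.
    top = max(l for l, v in zip(logits, valid) if v)
    exps = [math.exp(l - top) if v else 0.0 for l, v in zip(logits, valid)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, valid, rng=random):
    """Sample an action index; masked actions are never returned."""
    probs = masked_softmax(logits, valid)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

    With a policy network, `logits` would be its raw outputs for the current state and `valid` the mask reported by the environment for that step.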
    I want to learn RL for a project can you suggest some sources from which I can learn?
    I want a crash course or something so that I can just get the knowledge that I need to apply in my project submitted by /u/RightLemon8889 [link] [comments]  ( 84 min )
    In general, are there any specific advantages of Multi Agent Reinforcement Learning w.r.t. simple RL in terms of convergence, variance/bias or any other metric?
    Also, can multiple agents coordinate and learn a better policy in a large state-action space? And does MARL improve stability, robustness, etc.? submitted by /u/aabra__ka__daabra [link] [comments]  ( 84 min )
  • Open

    [P] CaiT Implementation in Flax
    An open-source implementation of the Going deeper with Image Transformers research paper in Google's JAX and Flax. "The paper also notes the difficulty in training vision transformers at greater depths and proposes two solutions. First, it proposes to do per-channel multiplication of the output of the residual block. Second, it proposes to have the patches attend to one another, and only allow the CLS token to attend to the patches in the last few layers." - Lucid Github repository for the Flax / JAX model: https://github.com/conceptofmind/CaiT-Flax CaiT Research Paper: https://arxiv.org/abs/2103.17239 Official PyTorch repository: https://github.com/rwightman/pytorch-image-models In collaboration with Lucid: https://github.com/lucidrains submitted by /u/EnricoShippole [link] [comments]  ( 85 min )
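    The "per-channel multiplication of the output of the residual block" quoted above can be sketched in a few lines. This is a plain-Python illustration under stated assumptions, not code from the linked repositories: `layer_scale_residual`, `block` and `gamma` are hypothetical names, and a single token is represented as a list of channel values.

```python
def layer_scale_residual(x, block, gamma):
    """Return x + gamma * block(x), with one learnable scale per channel.

    x     -- list of channel activations for one token
    block -- callable standing in for the residual branch (attention or MLP)
    gamma -- list of per-channel scale factors, same length as x
    """
    out = block(x)
    if not (len(x) == len(out) == len(gamma)):
        raise ValueError("x, block(x) and gamma must have the same length")
    # Each channel of the branch output is scaled by its own factor
    # before being added back to the skip connection.
    return [xi + g * oi for xi, g, oi in zip(x, gamma, out)]
```

    Setting every entry of `gamma` to zero reduces the layer to the identity, which is one way to see why small initial scales can make deep stacks easier to train.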
    [N] Designing Arithmetic Circuits with Deep Reinforcement Learning | NVIDIA Technical Blog
    submitted by /u/norcalnatv [link] [comments]  ( 86 min )
    [R] How to use ML to predict time of life remaining on a physical asset if the input data has had all its failed samples scrubbed away?
    So I'm in a bit of a conundrum. I'm working on my PhD thesis regarding the management of physical assets (make a decision on whether to replace the asset or refurbish it or to leave it alone). The first step to doing this is to predict the estimated time of life for each asset and I wish to use ML to do this. Each asset in my dataset has an installation date and a couple of input features (results of testing, characteristics of the asset, etc) The problem is the dataset I have doesn't have any of the failed assets. Meaning that I am finding it very hard to set up an error term for the estimated time of life during training of the model. Ideally, I should have failed samples and non-failed samples in my data but I only have the latter. How should I go about setting this up? I've been trying for the past couple of months to get my hands on failed samples but I haven't had any luck. submitted by /u/DrSkoolie [link] [comments]  ( 93 min )
    [P] May the best explanation win: A tutorial on benchmarking and tuning model explanations with pytorch-grad-cam
    The new release of the pytorch-grad-cam project focuses on metrics for the model explanations. It's often exciting to see model explanations, and tempting to interpret them and get insights about what the model is doing. And a lot of times it is very useful. However, this has to be done with care: the model explanations can be wrong, or suboptimal. As shown in many papers, sometimes random explanations perform better. So it's useful to have metrics that measure the quality of the explanations for an image, and sanity checks about them. This can be used both for getting some trust in the explanation before using it, and for tuning the explanation and getting the best one for a given image (for example by checking different methods). This notebook gives a thorough overview of the different metrics used in the literature, issues with them, using sanity checks (like the Sobel Edge Detector, or a random CAM), and most importantly shows how to use them to choose and tune the explanation in practice. https://github.com/jacobgil/pytorch-grad-cam/blob/master/tutorials/CAM%20Metrics%20And%20Tuning%20Tutorial.ipynb The motivation here is both to make it easier for researchers to benchmark new algorithms, and (maybe more importantly), when using the model explanations, to tune them, get the most out of them, and find problems with them. submitted by /u/jacobgil [link] [comments]  ( 86 min )
    [R] PrefixRL: Optimization Of Parallel Prefix Circuits Using Deep Reinforcement Learning
    submitted by /u/EducationalCicada [link] [comments]  ( 85 min )
    [N] First-Ever Course on Transformers: NOW PUBLIC
    CS 25: Transformers United https://preview.redd.it/1st4o3tvtha91.png?width=350&format=png&auto=webp&s=e4416da38001692989304e980dd4d61d23a74398 Did you grow up wanting to play with robots that could turn into cars? While we can't offer those kinds of transformers, we do have a course on the class of deep learning models that have taken the world by storm. Announcing the public release of our lectures from the first-ever course on Transformers: CS25 Transformers United (http://cs25.stanford.edu) held at Stanford University. Our intro video is out and available to watch here 👉: YouTube Link Bookmark and spread the word 🤗! (Twitter Thread) Speaker talks out starting Monday ... submitted by /u/DragonLord9 [link] [comments]  ( 89 min )
    [D] When did tech companies start to publish ML papers and why?
    I never fully understood the need for tech companies to publish research papers at big conferences. I think before the 2000s, tech companies were very secretive about their work. I mean, you wouldn't expect Microsoft to publish their latest research on their own motherboard at some conferences right? Nowadays all of them are trying to advertise their latest tech in research papers that could possibly be replicated by anyone around the world. This is especially visible in ML. Also it almost seems as if they don't have a goal in mind. A lot of the research papers (outside of those big models such as DALL-E) seem to be VERY random to me, hardly even related to their business interests. How did it become this way and what is their motivation? submitted by /u/fromnighttilldawn [link] [comments]  ( 99 min )
  • Open

    The neural network that I promised
    ​ https://preview.redd.it/uzt5qo20jia91.png?width=992&format=png&auto=webp&s=7061f894d96b5691391cb537f5791706a98cda04 https://preview.redd.it/jma65o20jia91.png?width=667&format=png&auto=webp&s=b5b2c3e44756e73efbb1a1033e42efc9504da158 https://sourceforge.net/projects/image-enlarger-free/ submitted by /u/vlad_ma [link] [comments]  ( 84 min )
  • Open

    Privacy-Preserving Synthetic Educational Data Generation. (arXiv:2207.03202v1 [cs.CY])
    Institutions collect massive learning traces but they may not disclose it for privacy issues. Synthetic data generation opens new opportunities for research in education. In this paper we present a generative model for educational data that can preserve the privacy of participants, and an evaluation framework for comparing synthetic data generators. We show how naive pseudonymization can lead to re-identification threats and suggest techniques to guarantee privacy. We evaluate our method on existing massive educational open datasets.  ( 2 min )

  • Open

    Is it possible to find a job in AI that is flexible enough I can pick up my three young children from school and not work from 2-5 M-F
    submitted by /u/CloudAtlas-2019 [link] [comments]  ( 84 min )
    An AI for students success prediction in academics.
    As the title asks, is there one? submitted by /u/Psychological_Ad5132 [link] [comments]  ( 85 min )
    I complete my postgraduation in Cognitive Neuroscience and I'm really interested in AI
    I want to select a good, interesting topic in artificial intelligence. My background is cognitive neuroscience, so I want some good topics that the AI field is still lacking in. I thought of causal reasoning, or any other topics from the cognitive psychology side that could help, at least in theory, to implement in AI in the future. What do you guys think about causal reasoning topics? Do you think AI is lacking in that area? submitted by /u/Cute_Understanding89 [link] [comments]  ( 86 min )
    Which countries have the highest demand for NLP engineers?
    I'm an AI master's student who is soon going to graduate. Although I have dealt with image processing and time series, I mainly focused on NLP when it came to projects. Hence, I am looking for employment in an environment that plays to my strengths. I am interested in hearing both personal opinions and hard data about which countries have a high demand for natural language processing. submitted by /u/Blutorangensaft [link] [comments]  ( 84 min )
    A small example from Tacotron2 trained on Brandon "Atrioc" Ewing
    submitted by /u/Phat_N_Sassy33 [link] [comments]  ( 84 min )
    Meta releases open source audio AI systems for more realistic VR and AR sound
    submitted by /u/henlo_there_fren [link] [comments]  ( 84 min )
    Systems courses for ml
    Hello everyone. Hope you are having a great time. I recently started a minor in CS. My final goal is to shift to ML and AI research or to work in industry. In my minor courses I can choose one systems course, like computer systems as a prerequisite for operating systems, for example. Though I'm choosing analysis of algorithms as it's obviously more important, I wanna know how important it is for someone who wants to work in AI as a machine learning engineer, data scientist or researcher to take systems courses. Would appreciate any answer. submitted by /u/BeneficialCharity8 [link] [comments]  ( 84 min )
    Analogybot.wtf: generate strange, funny, non-sensical and sometimes frighteningly accurate analogies with DaVinci AI.
    submitted by /u/syverlauritz [link] [comments]  ( 84 min )
    Moving Beyond Mimicry in Artificial Intelligence
    submitted by /u/estasfuera [link] [comments]  ( 83 min )
    Need advice to research!
    Hello everyone, I want to do some research and publish a paper ASAP in anomaly/fault detection using DL/NN, and I'm new to publications. I'm going through a lot of papers from conferences (A*) to analyze, but I am ending up in a constant loop. Could anyone please provide insights on how to target a conference with novel problems? submitted by /u/Bugfixer231 [link] [comments]  ( 84 min )
    Is there an AI that describes images?
    Like the title suggests, is there an AI that detects objects and total situation of a picture and puts those into words? submitted by /u/tastyogurt [link] [comments]  ( 84 min )
    UC Berkeley Researchers Introduce ‘Autocast’, A New Dataset For Measuring Machine Learning ML Models’ Forecasting Ability
    In this research article, the researchers from UC Berkeley demonstrated that extracting from a sizable news corpus may effectively train language models on prior predicting problems. Forecasting is a process that makes educated projections using previous data as inputs when identifying the direction of future trends. Forecasting future events in the real world, including pandemics, the economy, or the environment, is still complex but essential. Because dynamic information processing is a crucial component of efficient forecasting, AI researchers are considering using strong large-scale language models to automate these processes. Researchers present a dataset with tens of thousands of forecasting questions and a date-based news corpus in the new paper Forecasting Future World Events with Neural Networks. They also curate IntervalQA, a dataset with numerical questions and metrics for calibration. Continue reading | Checkout the paper and github submitted by /u/ai-lover [link] [comments]  ( 85 min )
    Ominous Escapade | Dark Galaxy | Raw UNSCALED (FILM)
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 84 min )
    The beginning of data-centric AI with data programming. What is data-centric AI?
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 84 min )
  • Open

    [D] How to evaluate a neural network in reverse?
    Say you have a neural network with 3 inputs, some hidden layers, and a single output. There might be many sets of those 3 inputs that give you the same output value. How can you evaluate this network in reverse, i.e. given an output value, find values of the 3 inputs that would yield that output? submitted by /u/zxkj [link] [comments]  ( 88 min )
    [Discussion] How do I smoothen the output of an action segmentation model near the boundaries?
    Hello. Apologies if this is the wrong place to post because my problem is a simple one related to machine learning. My problem involves a robot that operates given the output of an action segmentation model. The trained model outputs an action label at every timestep e.g. [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 3, 3]. Given the sequence, I must now compute the time it takes to go from one label to the next. However, the actual output tends to be quite unstable especially when the action transitions from one to the next e.g., [1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, ....]. As such, I occasionally get multiple transitions. This simple issue makes the input unusable to the robot. How can I clean up the output sequence? I thought of simple operations like converting the vector into a one-hot matrix and then running 1D erode and dilate operations but I was hoping to hear several other better suggestions. submitted by /u/applied-roboticist [link] [comments]  ( 87 min )
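    As an alternative to the erode/dilate idea mentioned in the post, a sliding-window majority vote is one simple way to suppress short label flickers near boundaries. The sketch below is an assumption-laden illustration, not the poster's method: `majority_filter`, `count_transitions` and the default `window` size are hypothetical, and the window has to be tuned so that it stays shorter than the shortest real action.

```python
from collections import Counter


def majority_filter(labels, window=5):
    """Replace each label by the most frequent label in a centered window.

    The window is truncated at the sequence ends. On ties, the original
    label at that position is kept if it is among the most frequent ones.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i, current in enumerate(labels):
        lo = max(0, i - half)
        hi = min(len(labels), i + half + 1)
        counts = Counter(labels[lo:hi])
        best = max(counts.values())
        if counts[current] == best:
            smoothed.append(current)
        else:
            smoothed.append(counts.most_common(1)[0][0])
    return smoothed


def count_transitions(labels):
    """Number of positions where the label differs from the previous one."""
    return sum(a != b for a, b in zip(labels, labels[1:]))
```

    Comparing `count_transitions` before and after filtering gives a quick check that spurious boundary transitions were removed without merging genuine segments.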
    [R] Single-task Continual/Incremental/Online/Life-Long learning.
    Hi everyone, I am new to the domain of continual learning/incremental learning/online learning/life-long learning (honestly, not able to make out the difference between them) and I would like to know if there exists a single-task life-long learning domain/problem. All the papers that I have gone through consist of methods trained for multiple tasks where newer tasks are added over time. I am looking for models trained for a single task that can be updated over time with new data belonging to the same task. I already have a trained model that I would like to update over time with either single or multiple data points. Any related links or directions would be greatly appreciated. TIA. submitted by /u/RohitDulam [link] [comments]  ( 86 min )
    [D] Thoughts on the autonomous vehicle (AV) field
    Curious to hear what people think of the future and current autonomous vehicle tech. Is it here to stay? Are we 5, 10 or 20+ years from true AV? What's the upside to society? Is it a worthwhile ML and AI research investment with potential benefits to other application areas? submitted by /u/purplebrown_updown [link] [comments]  ( 86 min )
    [P] Chart and Data Summarization
    I made an app that summarizes the data in csv files. Input a csv file and title of the file and the model will generate a summary. https://huggingface.co/spaces/saadob12/Chart_Data_Summarization The models: https://huggingface.co/saadob12/t5_C2T_autochart and https://huggingface.co/saadob12/t5_C2T_big. submitted by /u/QadriShyaari [link] [comments]  ( 85 min )
    [R] DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale - Microsoft 2022
    Paper: https://arxiv.org/pdf/2207.00032.pdf Abstract: The past several years have witnessed the success of transformer-based models, and their scale and application scenarios continue to grow aggressively. The current landscape of transformer models is increasingly diverse: the model size varies drastically with the largest being of hundred-billion parameters; the model characteristics differ due to the sparsity introduced by the Mixture-of-Experts; the target application scenarios can be latency-critical or throughput-oriented; the deployment hardware could be single- or multi-GPU systems with different types of memory and storage, etc. With such increasing diversity and the fast-evolving pace of transformer models, designing a highly performant and efficient inference system is extre…  ( 87 min )
    [D] Searching for a paper on equivalent transformations on trained networks
    I had come across a paper that explored strategies to transform the architecture of a trained neural network, i.e. increasing layer width or adding additional layers, without forgetting what the network has already learnt. They describe initialization strategies to accomplish this. Does anyone know the paper I am talking about? submitted by /u/kniranjankumar [link] [comments]  ( 86 min )
    [Discussion] Giving a machine learning presentation to laypeople
    Hello all, I've been asked to deliver a machine learning presentation to cardiologists and doctors; obviously they have no prior expertise in this area. I had wondered if anyone else had some experience presenting to laypeople in the context of machine learning. Just looking for some ideas really: what would you cover? What examples would you give? How would you structure it? Any help is always appreciated! [Edit#1] Thank you for the help everyone, this is some really useful feedback that I will take on board submitted by /u/MidnightMaverick [link] [comments]  ( 91 min )
    [P] Detection by position rather than looks?
    I am working on a project that needs to decide which olive tree branches should be cut. The goal is to detect a specific type of branch (watersprouts). The problem I'm facing is that I'm unsure if I should use object detection (image classification + localization) or image segmentation. The difference between branches is mostly in their position, with watersprouts growing mostly vertical to the main branch (there is a very small difference in looks between watersprouts and other branches), while other branches can grow in all directions (mostly parallel to the main branch). My plan was to use object detection so I can classify watersprouts and localize them in the picture. I think that segmentation is a bit overkill for this problem because I don't see the need for localizing every pixel. The plan was to take pictures of watersprouts as class 1 and other branches as class 2, and train on them so I can detect and localize them. Once they are localized, I can see which of these branches is a watersprout and which is a regular branch, and then I know that the watersprout should be cut. The other problem I have is understanding whether it is possible for my machine learning project to recognize watersprouts not by their looks but by their position relative to the main branch, and to correctly differentiate them from other branches, because this is the main difference between a watersprout and a regular branch. My understanding is that the network learns what the object looks like and that position doesn't matter. Am I on the right track or am I missing something? submitted by /u/Greckon121 [link] [comments]  ( 88 min )
    [P] Sioyek 1.4 | Academic PDF Viewer
    During my PhD, I developed an open source PDF viewer to help me with my research. I think it can be useful for the users of this sub. Some of the research-oriented features include: Quickly jump to or preview references (for example Figure 3.1 for a figure or [8] for a reference). Works even if the document doesn't have links. Search paper names in google scholar by middle clicking on them (combined with the previous feature makes finding papers super fast) Searchable highlights/bookmarks Line-by-line highlighting for reduced eye strain (video) Synctex Support Extensible using external scripts (see this post for some examples) And many other features which are explained in the github page including marks, history, portals, searchable table of contents, automatic table of contents generation, searchable previous documents, etc. Here is a video demo of some of the features: https://www.youtube.com/watch?v=yTmCI0Xp5vI&t=3s And here is the latest release: https://github.com/ahrm/sioyek/releases/tag/v1.4.0 Disclaimer: I did introduce sioyek in this subreddit about a year ago, but it has changed a lot since then and some of the features suggested in the comments of last year's post are implemented, so I thought users of this subreddit might be interested in an update. submitted by /u/highergraphic [link] [comments]  ( 88 min )
    [D] when do eccv meta-reviews come out?
    I know the result from the link in the email but cmt still says "awaiting decision" call me antsy but I just want to see the final comments and meta-review... did it take this long last year? submitted by /u/gnohuhs [link] [comments]  ( 87 min )
    [R] NeurIPS2022’s Natural Language for Optimization (NL4Opt) competition!
    We invite you to join our NL4Opt competition that will be part of NeurIPS2022. We have a novel never-before-seen NLP dataset in hopes of making optimization solvers more accessible and usable. The competition aims to allow non-experts to use optimization tools in their decision-making. This competition is split into two main tasks: NER and generation. We have provided baselines for each to kick-start your implementation. We will award a total of $22,000 USD evenly across the two tasks. We will also be hosting a workshop at the end of the competition and will be inviting experts and winners as podium speakers. Additionally, we plan to host poster sessions for participants to share their solutions. The competition is tentatively from July 1st to October 15th with the submission portal opening on July 15th. We look forward to your participation – you can register (https://nl4opt.github.io/participate/) and our organizers will be in touch with you shortly. For more information regarding the competition details, schedule, eligibility, rules, FAQs, and to get started, visit our competition website linked below! Follow our social media and GitHub discussion forum to keep updated. If you have any questions, please take a look at the FAQ section of our website. For any unanswered questions, feel free to start the discussion on the GitHub forum. Twitter: https://twitter.com/NL4Opt Website: https://nl4opt.github.io/ GitHub discussion forum: https://github.com/nl4opt/nl4opt-competition/discussions We look forward to your participation, NL4Opt Organizers submitted by /u/Adept_Ad_3308 [link] [comments]  ( 86 min )
    [D] LaMDA long-term memory
    Google's February, 2022 LaMDA paper says it is preconditioned on previous interactions (someone on this subreddit said 14-30) in support of tuning its "sensibleness" metric, which includes making sure responses don't contradict anything said earlier. However, in this podcast, Blake Lemoine says at 5:30-7:00 that LaMDA has some kind of long-term memory stretching back at least five years. He also mentions that the current system called "LaMDA 2" has access to a much wider variety of database resources than the paper or other Google publications describe, including Google Images, YouTube, and Google Books. Is LaMDA 2 documented anywhere? What other features does it have beyond what is documented in the February paper? submitted by /u/Competitive_Travel16 [link] [comments]  ( 88 min )
  • Open

    Using Learning Rate Schedules for Deep Learning Models in Python with Keras
    Training a neural network or large deep learning model is a difficult optimization task. The classical algorithm to train neural networks is called stochastic gradient descent. It has been well established that you can achieve increased performance and faster training on some problems by using a learning rate that changes during training. In this post […] The post Using Learning Rate Schedules for Deep Learning Models in Python with Keras appeared first on Machine Learning Mastery.  ( 24 min )
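    To make "a learning rate that changes during training" concrete, here is a minimal sketch of two common schedules written as plain functions of the epoch index. The function names, parameters and formulas are illustrative assumptions rather than the exact code from the linked post; in Keras, a function of this kind would typically be wrapped in the `LearningRateScheduler` callback.

```python
def time_based_decay(initial_lr, decay, epoch):
    """Learning rate that shrinks smoothly: lr = initial_lr / (1 + decay * epoch)."""
    if decay < 0 or epoch < 0:
        raise ValueError("decay and epoch must be non-negative")
    return initial_lr / (1.0 + decay * epoch)


def step_decay(initial_lr, drop, epochs_drop, epoch):
    """Learning rate multiplied by `drop` once every `epochs_drop` epochs."""
    if epochs_drop <= 0 or epoch < 0:
        raise ValueError("epochs_drop must be positive and epoch non-negative")
    return initial_lr * (drop ** (epoch // epochs_drop))


def schedule_table(schedule, num_epochs):
    """List of (epoch, learning rate) pairs, handy for plotting or logging."""
    return [(epoch, schedule(epoch)) for epoch in range(num_epochs)]
```

    For example, `schedule_table(lambda e: step_decay(0.1, 0.5, 10, e), 30)` lists a rate that starts at 0.1 and is halved after every ten epochs.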
  • Open

    "Reinforcement Learning for Datacenter Congestion Control", Tessler et al 2021 {NV}
    submitted by /u/gwern [link] [comments]  ( 84 min )
    "DexMV: Imitation Learning for Dexterous Manipulation from Human Videos", Qin et al 2021
    submitted by /u/gwern [link] [comments]  ( 84 min )
    "Job Hunt as a PhD in RL: How it Actually Happens", Nato Lambert
    submitted by /u/gwern [link] [comments]  ( 85 min )
    Reward and step functions for path planning?
    Are there gym environments which provide reward and step functions for path planning? That is a set of waypoints as an output instead of throttle and steering submitted by /u/Mortang64 [link] [comments]  ( 84 min )
  • Open

    ​​Deep Hierarchical Planning from Pixels
    Posted by Danijar Hafner, Student Researcher, Google Research Research into how artificial agents can make decisions has evolved rapidly through advances in deep reinforcement learning. Compared to generative ML models like GPT-3 and Imagen, artificial agents can directly influence their environment through actions, such as moving a robot arm based on camera inputs or clicking a button in a web browser. While artificial agents have the potential to be increasingly helpful to people, current methods are held back by the need to receive detailed feedback in the form of frequently provided rewards to learn successful strategies. For example, despite large computational budgets, even powerful programs such as AlphaGo are limited to a few hundred moves until receiving their next reward. In co…  ( 26 min )
  • Open

    Gradient-based Neuromorphic Learning on Dynamical RRAM Arrays
    submitted by /u/Harley109 [link] [comments]  ( 84 min )
    Arnold Schwarzenegger One Liners
    I've been playing around with some Neural Network Text generator stuff and was wondering if anyone might know where I can get a compiled list of Arnold Schwarzenegger one liners for.. reasons.. submitted by /u/QwikMathz [link] [comments]  ( 84 min )
  • Open

    No Fueling Around: Designers Collaborate in Extended Reality on Porsche Electric Race Car
    A one-of-a-kind electric race car revved to life before it was manufactured — or even prototyped — thanks to GPU-powered extended reality technology. At the Automotive Innovation Forum in May, NVIDIA worked with Autodesk VRED to showcase a photorealistic Porsche electric sports car in augmented reality, with multiple attendees collaborating in the same immersive environment. Read article > The post No Fueling Around: Designers Collaborate in Extended Reality on Porsche Electric Race Car appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    How to tell if a document management system is ready for the future? (part-1)
    What is document management?  ( 8 min )
    Metaverse Technology and Human-AI Interaction
    The role of AI in the Metaverse has yet to be established. Is AI and blockchain technology a good fit?  ( 9 min )
  • Open

    Onboard PaddleOCR with Amazon SageMaker Projects for MLOps to perform optical character recognition on identity documents
    Optical character recognition (OCR) is the task of converting printed or handwritten text into machine-encoded text. OCR has been widely used in various scenarios, such as document electronization and identity authentication. Because OCR can greatly reduce the manual effort to register key information and serve as an entry step for understanding large volumes of documents, […]  ( 11 min )
  • Open

    A Study on Robustness to Perturbations for Representations of Environmental Sound. (arXiv:2203.10425v3 [cs.SD] UPDATED)
    Audio applications involving environmental sound analysis increasingly use general-purpose audio representations, also known as embeddings, for transfer learning. Recently, Holistic Evaluation of Audio Representations (HEAR) evaluated twenty-nine embedding models on nineteen diverse tasks. However, the evaluation's effectiveness depends on the variation already captured within a given dataset. Therefore, for a given data domain, it is unclear how the representations would be affected by the variations caused by myriad microphones' range and acoustic conditions -- commonly known as channel effects. We aim to extend HEAR to evaluate invariance to channel effects in this work. To accomplish this, we imitate channel effects by injecting perturbations to the audio signal and measure the shift in the new (perturbed) embeddings with three distance measures, making the evaluation domain-dependent but not task-dependent. Combined with the downstream performance, it helps us make a more informed prediction of how robust the embeddings are to the channel effects. We evaluate two embeddings -- YAMNet, and OpenL3 on monophonic (UrbanSound8K) and polyphonic (SONYC-UST) urban datasets. We show that one distance measure does not suffice in such task-independent evaluation. Although Fr\'echet Audio Distance (FAD) correlates with the trend of the performance drop in the downstream task most accurately, we show that we need to study FAD in conjunction with the other distances to get a clear understanding of the overall effect of the perturbation. In terms of the embedding performance, we find OpenL3 to be more robust than YAMNet, which aligns with the HEAR evaluation.  ( 3 min )
    Individual health-disease phase diagrams for disease prevention based on machine learning. (arXiv:2205.15598v2 [cs.LG] UPDATED)
    Early disease detection and prevention methods based on effective interventions are gaining attention. Machine learning technology has enabled precise disease prediction by capturing individual differences in multivariate data. Progress in precision medicine has revealed that substantial heterogeneity exists in health data at the individual level and that complex health factors are involved in the development of chronic diseases. However, it remains a challenge to identify individual physiological state changes in cross-disease onset processes because of the complex relationships among multiple biomarkers. Here, we present the health-disease phase diagram (HDPD), which represents a personal health state by visualizing the boundary values of multiple biomarkers that fluctuate early in the disease progression process. In HDPDs, future onset predictions are represented by perturbing multiple biomarker values while accounting for dependencies among variables. We constructed HDPDs for 11 non-communicable diseases (NCDs) from a longitudinal health checkup cohort of 3,238 individuals, comprising 3,215 measurement items and genetic data. Improvement of biomarker values to the non-onset region in HDPD significantly prevented future disease onset in 7 out of 11 NCDs. Our results demonstrate that HDPDs can represent individual physiological states in the onset process and be used as intervention goals for disease prevention.  ( 3 min )
    Some performance considerations when using multi-armed bandit algorithms in the presence of missing data. (arXiv:2205.03820v2 [stat.ML] UPDATED)
    When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, it also affects their implementation, where the simplest approach to overcome this is to continue to sample according to the original bandit algorithm, ignoring missing outcomes. We investigate the impact on performance of this approach to deal with missing data for several bandit algorithms through an extensive simulation study assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes. However, our results apply to other applications of bandit algorithms where missing data is expected to occur. We assess the resulting operating characteristics, including the expected reward. Different probabilities of missingness in both arms are considered. The key finding of our work is that when using the simplest strategy of ignoring missing data, the impact on the expected performance of multi-armed bandit strategies varies according to the way these strategies balance the exploration-exploitation trade-off. Algorithms that are geared towards exploration continue to assign samples to the arm with more missing responses (which, being perceived as the arm with less observed information, is deemed more appealing by the algorithm than it would otherwise be). In contrast, algorithms that are geared towards exploitation would rapidly assign a high value to samples from the arms with a current high mean irrespective of the level of observations per arm. Furthermore, for algorithms focusing more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.
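    The two strategies described above (ignoring missing outcomes versus mean imputation) can be sketched for a two-armed Bernoulli bandit. This is a minimal illustration under stated assumptions, not the study's simulation code: Thompson sampling is chosen here only as an example algorithm, missingness is drawn independently with a fixed per-arm probability, and the function name and parameters are hypothetical.

```python
import random


def run_thompson(p_true, p_missing, horizon, impute=False, seed=0):
    """Bernoulli Thompson sampling where each observed reward may go missing."""
    rng = random.Random(seed)
    # Beta(1, 1) prior per arm, stored as [alpha, beta].
    post = [[1.0, 1.0] for _ in p_true]
    pulls = [0 for _ in p_true]
    total_reward = 0
    for _ in range(horizon):
        # Draw one plausible success rate per arm and play the largest draw.
        samples = [rng.betavariate(a, b) for a, b in post]
        arm = samples.index(max(samples))
        pulls[arm] += 1
        reward = 1 if rng.random() < p_true[arm] else 0
        total_reward += reward  # the outcome happens even if it is not recorded
        if rng.random() < p_missing[arm]:
            if impute:
                # Mean imputation: add the arm's current posterior mean as a
                # fractional outcome. The mean is unchanged, the variance shrinks.
                mean = post[arm][0] / (post[arm][0] + post[arm][1])
                post[arm][0] += mean
                post[arm][1] += 1.0 - mean
            # Without imputation the missing outcome is simply ignored.
            continue
        post[arm][0] += reward
        post[arm][1] += 1 - reward
    return pulls, total_reward


# Arm 0 loses half of its responses; arm 1 is always observed.
ignored = run_thompson([0.3, 0.7], [0.5, 0.0], horizon=200, seed=1)
imputed = run_thompson([0.3, 0.7], [0.5, 0.0], horizon=200, impute=True, seed=1)
```

    Comparing the pull counts in `ignored` and `imputed` over many seeds gives a rough view of how missingness changes allocation between arms.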
    Learning grammar with a divide-and-concur neural network. (arXiv:2201.07341v3 [cs.CL] UPDATED)
    We implement a divide-and-concur iterative projection approach to context-free grammar inference. Unlike most state-of-the-art models of natural language processing, our method requires a relatively small number of discrete parameters, making the inferred grammar directly interpretable -- one can read off from a solution how to construct grammatically valid sentences. Another advantage of our approach is the ability to infer meaningful grammatical rules from just a few sentences, compared to the hundreds of gigabytes of training data many other models employ. We demonstrate several ways of applying our approach: classifying words and inferring a grammar from scratch, taking an existing grammar and refining its categories and rules, and taking an existing grammar and expanding its lexicon as it encounters new words in new data.
    Towards Better Understanding of Self-Supervised Representations. (arXiv:2203.01881v2 [cs.LG] UPDATED)
    Self-supervised learning methods have shown impressive results in downstream classification tasks. However, there is limited work in understanding and interpreting their learned representations. In this paper, we study the representation space of several state-of-the-art self-supervised models including SimCLR, SwAV, MoCo V2 and BYOL. Without the use of class label information, we first discover discriminative features that are highly active for various subsets of samples and correspond to unique physical attributes in images. We show that, using such discriminative features, one can compress the representation space of self-supervised models up to 50% without affecting downstream linear classification significantly. Next, we propose a sample-wise Self-Supervised Representation Quality Score (or, Q-Score) that can be computed without access to any label information. Q-Score utilizes discriminative features to reliably predict if a given sample is likely to be mis-classified in the downstream classification task, achieving AUPRC of 0.91 on SimCLR and BYOL trained on ImageNet-100. Q-Score can also be used as a regularization term to remedy low-quality representations, leading to up to 8% relative improvement in accuracy on all 4 self-supervised baselines on ImageNet-100, CIFAR-10, CIFAR-100 and STL-10. Moreover, through heatmap analysis, we show that Q-Score regularization enhances discriminative features and reduces feature noise, thus improving model interpretability.
    Classification of Time-Series Data Using Boosted Decision Trees. (arXiv:2110.00581v2 [cs.LG] UPDATED)
    Time-series data classification is central to the analysis and control of autonomous systems, such as robots and self-driving cars. Temporal logic-based learning algorithms have been proposed recently as classifiers of such data. However, current frameworks are either inaccurate for real-world applications, such as autonomous driving, or they generate long and complicated formulae that lack interpretability. To address these limitations, we introduce a novel learning method, called Boosted Concise Decision Trees (BCDTs), to generate binary classifiers that are represented as Signal Temporal Logic (STL) formulae. Our algorithm leverages an ensemble of Concise Decision Trees (CDTs) to improve the classification performance, where each CDT is a decision tree that is empowered by a set of techniques to generate simpler formulae and improve interpretability. The effectiveness and classification performance of our algorithm are evaluated on naval surveillance and urban-driving case studies.
    Exploiting Action Impact Regularity and Exogenous State Variables for Offline Reinforcement Learning. (arXiv:2111.08066v3 [cs.LG] UPDATED)
    Offline reinforcement learning -- learning a policy from a batch of data -- is known to be hard for general MDPs. These results motivate the need to look at specific classes of MDPs where offline reinforcement learning might be feasible. In this work, we explore a restricted class of MDPs to obtain guarantees for offline reinforcement learning. The key property, which we call Action Impact Regularity (AIR), is that actions primarily impact a part of the state (an endogenous component) with limited impact on the remaining part of the state (an exogenous component). AIR is a strong assumption, but it nonetheless holds in a number of real-world domains including financial markets. We discuss algorithms that exploit the AIR property, and provide a theoretical analysis for an algorithm based on Fitted-Q Iteration. Finally, we demonstrate that the algorithm outperforms existing offline reinforcement learning algorithms across different data collection policies in simulated and real-world environments where the regularity holds.
    Mitigating shortage of labeled data using clustering-based active learning with diversity exploration. (arXiv:2207.02964v1 [cs.LG])
    In this paper, we proposed a new clustering-based active learning framework, namely Active Learning using a Clustering-based Sampling (ALCS), to address the shortage of labeled data. ALCS employs a density-based clustering approach to explore the cluster structure from the data without requiring exhaustive parameter tuning. A bi-cluster boundary-based sample query procedure is introduced to improve the learning performance for classifying highly overlapped classes. Additionally, we developed an effective diversity exploration strategy to address the redundancy among queried samples. Our experimental results justified the efficacy of the ALCS approach.
    Learning towards Robustness in Causally-Invariant Predictors. (arXiv:2107.01876v2 [stat.ML] UPDATED)
    We propose to learn an invariant causal predictor that is robust to distributional shifts, in the supervised regression scenario. Based on a disentangled causal factorization that describes the underlying data generating process, we attribute the distributional shifts to mutation of generating factors, which covers a wide range of cases of distributional shifts as we do not make prior specifications on the causal structure or the source of mutation. Under this causal framework, we identify a set of invariant predictors based on the do-operator. We provide a sufficient and necessary condition for a predictor to be min-max optimal, i.e., to minimize the worst-case quadratic loss among all domains. This condition is justifiable under the Markovian and faithfulness assumptions, thus inspiring a practical algorithm to identify the optimal predictor. For empirical estimation, we propose a permutation-regeneration scheme guided by a local causal discovery procedure. The utility and effectiveness of our method are demonstrated in simulation data and two real-world applications: Alzheimer's disease diagnosis and gene function prediction.
    Neural Stein critics with staged $L^2$-regularization. (arXiv:2207.03406v1 [stat.ML])
    Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity in probability distributions, such as the Stein discrepancy, play an important role in statistical testing in high dimensions. In this paper, we consider the setting where one wishes to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. While recent studies revealed that the optimal $L^2$-regularized Stein critic equals the difference of the score functions of two probability distributions up to a multiplicative constant, we investigate the role of $L^2$ regularization when training a neural network Stein discrepancy critic function. Motivated by the Neural Tangent Kernel theory of training neural networks, we develop a novel staging procedure for the weight of regularization over training time. This leverages the advantages of highly-regularized training at early times while also empirically delaying overfitting. Theoretically, we relate the training dynamic with large regularization weight to the kernel regression optimization of the "lazy training" regime in early training times. The benefit of the staged $L^2$ regularization is demonstrated on simulated high dimensional distribution drift data and an application to evaluating generative models of image data.
    Improving Spectral Clustering Using Spectrum-Preserving Node Aggregation. (arXiv:2110.12328v4 [cs.LG] UPDATED)
    Spectral clustering is one of the most popular clustering methods. However, the high computational cost due to the involved eigen-decomposition procedure can immediately hinder its applications in large-scale tasks. In this paper we use spectrum-preserving node reduction to accelerate eigen-decomposition and generate concise representations of data sets. Specifically, we create a small number of pseudo-nodes based on spectral similarity. Then, a standard spectral clustering algorithm is performed on the smaller node set. Finally, each data point in the original data set is assigned to the same cluster as its representative pseudo-node. The proposed framework runs in nearly-linear time. Meanwhile, the clustering accuracy can be significantly improved by mining concise representations. The experimental results show dramatically improved clustering performance when compared with state-of-the-art methods.
    FedHeN: Federated Learning in Heterogeneous Networks. (arXiv:2207.03031v1 [cs.LG])
    We propose a novel training recipe for federated learning with heterogeneous networks where each device can have different architectures. We introduce training with a side objective to the devices of higher complexities to jointly train different architectures in a federated setting. We empirically show that our approach improves the performance of different architectures and leads to high communication savings compared to the state-of-the-art methods.
    DAiSEE: Towards User Engagement Recognition in the Wild. (arXiv:1609.01885v7 [cs.CV] UPDATED)
    We introduce DAiSEE, the first multi-label video classification dataset comprising 9068 video snippets captured from 112 users for recognizing the user affective states of boredom, confusion, engagement, and frustration in the wild. The dataset has four levels of labels, namely very low, low, high, and very high, for each of the affective states, which are crowd annotated and correlated with a gold standard annotation created using a team of expert psychologists. We have also established benchmark results on this dataset using state-of-the-art video classification methods that are available today. We believe that DAiSEE will provide the research community with challenges in feature extraction, context-based inference, and development of suitable machine learning methods for related tasks, thus providing a springboard for further research. The dataset is available for download at https://people.iith.ac.in/vineethnb/resources/daisee/index.html.
    Themis: A Network Bandwidth-Aware Collective Scheduling Policy for Distributed Training of DL Models. (arXiv:2110.04478v3 [cs.DC] UPDATED)
    Distributed training is a solution to reduce DNN training time by splitting the task across multiple NPUs (e.g., GPU/TPU). However, distributed training adds communication overhead between the NPUs in order to synchronize the gradients and/or activation, depending on the parallelization strategy. In next-generation platforms for training at scale, NPUs will be connected through multi-dimensional networks with diverse, heterogeneous bandwidths. This work identifies a looming challenge of keeping all network dimensions busy and maximizing the network BW within the hybrid environment if we leverage scheduling techniques for collective communication on systems today. We propose Themis, a novel collective scheduling scheme that dynamically schedules collectives (divided into chunks) to balance the communication loads across all dimensions, further improving the network BW utilization. Our results show that on average, Themis can improve the network BW utilization of the single All-Reduce by 1.72X (2.70X max), and improve the end-to-end training iteration performance of real workloads such as ResNet-152, GNMT, DLRM, and Transformer-1T by 1.49X (2.25X max), 1.30X (1.78X max), 1.30X (1.77X max), and 1.25X (1.53X max), respectively.
    Differentially Private Stochastic Linear Bandits: (Almost) for Free. (arXiv:2207.03445v1 [cs.LG])
    In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of $\tilde{O}(\sqrt{T}+\frac{1}{\epsilon})$ matching the known lower bound for private linear bandits, while the best previously known algorithm achieves $\tilde{O}(\frac{1}{\epsilon}\sqrt{T})$. In the local case, we achieve a regret of $\tilde{O}(\frac{1}{\epsilon}{\sqrt{T}})$ which matches the non-private regret for constant $\epsilon$, but suffers a regret penalty when $\epsilon$ is small. In the shuffled model, we also achieve regret of $\tilde{O}(\sqrt{T}+\frac{1}{\epsilon})$ as in the central case, while the best previously known algorithm suffers a regret of $\tilde{O}(\frac{1}{\epsilon}{T^{3/5}})$. Our numerical evaluation validates our theoretical results.
    Minimax formula for the replica symmetric free energy of deep restricted Boltzmann machines. (arXiv:2005.09424v2 [cond-mat.dis-nn] UPDATED)
    We study the free energy of a widely used deep architecture for restricted Boltzmann machines, where the layers are arranged in series. Assuming independent Gaussian distributed random weights, we show that the error term in the so-called replica symmetric sum rule can be optimised as a saddle point. This leads us to conjecture that in the replica symmetric approximation the free energy is given by a min-max formula, which parallels the one achieved for the two-layer case.
    Offline Meta-Reinforcement Learning with Online Self-Supervision. (arXiv:2107.03974v4 [cs.LG] UPDATED)
    Meta-reinforcement learning (RL) methods can meta-train policies that adapt to new tasks with orders of magnitude less data than standard RL, but meta-training itself is costly and time-consuming. If we can meta-train on offline data, then we can reuse the same static dataset, labeled once with rewards for different tasks, to meta-train policies that adapt to a variety of new tasks at meta-test time. Although this capability would make meta-RL a practical tool for real-world use, offline meta-RL presents additional challenges beyond online meta-RL or standard offline RL settings. Meta-RL learns an exploration strategy that collects data for adapting, and also meta-trains a policy that quickly adapts to data from a new task. Since this policy was meta-trained on a fixed, offline dataset, it might behave unpredictably when adapting to data collected by the learned exploration strategy, which differs systematically from the offline data and thus induces distributional shift. We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any reward labels, to bridge this distribution shift. By not requiring reward labels for online collection, this data can be much cheaper to collect. We compare our method to prior work on offline meta-RL on simulated robot locomotion and manipulation tasks and find that using additional unsupervised online data collection leads to a dramatic improvement in the adaptive capabilities of the meta-trained policies, matching the performance of fully online meta-RL on a range of challenging domains that require generalization to new tasks.
    Model Selection in Reinforcement Learning with General Function Approximations. (arXiv:2207.02992v1 [stat.ML])
    We consider model selection for classic Reinforcement Learning (RL) environments -- Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations. In the model selection framework, we do not know the function classes, denoted by $\mathcal{F}$ and $\mathcal{M}$, where the true models -- reward generating function for MABs and transition kernel for MDPs -- lie, respectively. Instead, we are given $M$ nested function (hypothesis) classes such that true models are contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs, that \emph{adapt} to the smallest function class (among the nested $M$ classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes (i.e., $\mathcal{F}$ and $\mathcal{M}$) a priori. Furthermore, for both the settings, we show that the cost of model selection is an additive term in the regret having weak (logarithmic) dependence on the learning horizon $T$.
    Distributionally Robust Policy Learning via Adversarial Environment Generation. (arXiv:2107.06353v6 [cs.RO] UPDATED)
    Our goal is to train control policies that generalize well to unseen environments. Inspired by the Distributionally Robust Optimization (DRO) framework, we propose DRAGEN - Distributionally Robust policy learning via Adversarial Generation of ENvironments - for iteratively improving robustness of policies to realistic distribution shifts by generating adversarial environments. The key idea is to learn a generative model for environments whose latent variables capture cost-predictive and realistic variations in environments. We perform DRO with respect to a Wasserstein ball around the empirical distribution of environments by generating realistic adversarial environments via gradient ascent on the latent space. We demonstrate strong Out-of-Distribution (OoD) generalization in simulation for (i) swinging up a pendulum with onboard vision and (ii) grasping realistic 3D objects. Grasping experiments on hardware demonstrate better sim2real performance compared to domain randomization.
    Pre-trained Gaussian processes for Bayesian optimization. (arXiv:2109.08215v4 [cs.LG] UPDATED)
    Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs on functions. However, even with expert knowledge, it is not an easy task to select a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. Theoretically, we show a bounded regret of BO with pre-trained priors. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.
    An Additive Instance-Wise Approach to Multi-class Model Interpretation. (arXiv:2207.03113v1 [cs.LG])
    Interpretable machine learning offers insights into what factors drive a certain prediction of a black-box system and whether to trust it for high-stakes decisions or large-scale deployment. Existing methods mainly focus on selecting explanatory input features, which follow either locally additive or instance-wise approaches. Additive models use heuristically sampled perturbations to learn instance-specific explainers sequentially. The process is thus inefficient and susceptible to poorly-conditioned samples. Meanwhile, instance-wise techniques directly learn local sampling distributions and can leverage global information from other inputs. However, they can only interpret single-class predictions and suffer from inconsistency across different settings, due to a strict reliance on a pre-defined number of features selected. This work exploits the strengths of both methods and proposes a global framework for learning local explanations simultaneously for multiple target classes. We also propose an adaptive inference strategy to determine the optimal number of features for a specific instance. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness while achieving a high level of brevity on various data sets and black-box model architectures.
    Stochastic optimal well control in subsurface reservoirs using reinforcement learning. (arXiv:2207.03456v1 [cs.LG])
    We present a case study of a model-free reinforcement learning (RL) framework to solve stochastic optimal control for a predefined parameter uncertainty distribution and partially observable system. We focus on the robust optimal well control problem, which is a subject of intensive research activities in the field of subsurface reservoir management. For this problem, the system is partially observed since the data is only available at well locations. Furthermore, the model parameters are highly uncertain due to sparsity of available field data. In principle, RL algorithms are capable of learning optimal action policies -- a map from states to actions -- to maximize a numerical reward signal. In deep RL, this mapping from state to action is parameterized using a deep neural network. In the RL formulation of the robust optimal well control problem, the states are represented by saturation and pressure values at well locations while the actions represent the valve openings controlling the flow through wells. The numerical reward refers to the total sweep efficiency and the uncertain model parameter is the subsurface permeability field. The model parameter uncertainties are handled by introducing a domain randomisation scheme that exploits cluster analysis on its uncertainty distribution. We present numerical results using two state-of-the-art RL algorithms, proximal policy optimization (PPO) and advantage actor-critic (A2C), on two subsurface flow test cases representing two distinct uncertainty distributions of permeability field. The results were benchmarked against optimisation results obtained using a differential evolution algorithm. Furthermore, we demonstrate the robustness of the proposed use of RL by evaluating the learned control policy on unseen samples drawn from the parameter uncertainty distribution that were not used during the training process.
    Fairness and Bias in Robot Learning. (arXiv:2207.03444v1 [cs.RO])
    Machine learning has significantly enhanced the abilities of robots, enabling them to perform a wide range of tasks in human environments and adapt to our uncertain real world. Recent works in various domains of machine learning have highlighted the importance of accounting for fairness to ensure that these algorithms do not reproduce human biases and consequently lead to discriminatory outcomes. With robot learning systems increasingly performing more and more tasks in our everyday lives, it is crucial to understand the influence of such biases to prevent unintended behavior toward certain groups of people. In this work, we present the first survey on fairness in robot learning from an interdisciplinary perspective spanning technical, ethical, and legal challenges. We propose a taxonomy for sources of bias and the resulting types of discrimination due to them. Using examples from different robot learning domains, we examine scenarios of unfair outcomes and strategies to mitigate them. We present early advances in the field by covering different fairness definitions, ethical and legal considerations, and methods for fair robot learning. With this work, we aim at paving the road for groundbreaking developments in fair robot learning.
    Challenges and Pitfalls of Bayesian Unlearning. (arXiv:2207.03227v1 [cs.LG])
    Machine unlearning refers to the task of removing a subset of training data, thereby removing its contributions to a trained model. Approximate unlearning methods are one class of methods for this task which avoid the need to retrain the model from scratch on the retained data. Bayes' rule can be used to cast approximate unlearning as an inference problem where the objective is to obtain the updated posterior by dividing out the likelihood of deleted data. However, this has its own set of challenges as one often doesn't have access to the exact posterior of the model parameters. In this work we examine the use of the Laplace approximation and Variational Inference to obtain the updated posterior. With a neural network trained for a regression task as the guiding example, we draw insights on the applicability of Bayesian unlearning in practical scenarios.
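    Dividing out the likelihood of deleted data can be illustrated in a conjugate setting where the exact posterior is available. This is a toy sketch, not the paper's neural-network experiment: it assumes a Gaussian prior on an unknown mean with known noise variance, and the function names are hypothetical. In this case unlearning amounts to subtracting the deleted points' contributions from the posterior's natural parameters.

```python
def gaussian_posterior(data, prior_mean=0.0, prior_prec=1.0, noise_var=1.0):
    """Exact posterior (mean, precision) for a Gaussian mean with known noise variance."""
    prec = prior_prec + len(data) / noise_var
    eta = prior_prec * prior_mean + sum(data) / noise_var  # precision-weighted mean
    return eta / prec, prec


def unlearn(post_mean, post_prec, deleted, noise_var=1.0):
    """Divide the deleted points' likelihood out of the posterior."""
    # Dividing Gaussians subtracts their natural parameters.
    prec = post_prec - len(deleted) / noise_var
    eta = post_mean * post_prec - sum(deleted) / noise_var
    return eta / prec, prec


full_mean, full_prec = gaussian_posterior([1.0, 2.0, 3.0, 4.0])
unlearned = unlearn(full_mean, full_prec, deleted=[3.0, 4.0])
retrained = gaussian_posterior([1.0, 2.0])
```

    Here `unlearned` and `retrained` agree, because the posterior is exact. With an approximate posterior, such as a Laplace or variational approximation, the same subtraction is no longer guaranteed to match retraining, which is the difficulty the abstract points to.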
    Directed Weight Neural Networks for Protein Structure Representation Learning. (arXiv:2201.13299v3 [q-bio.BM] UPDATED)
    A protein performs biological functions by folding to a particular 3D structure. To accurately model the protein structures, both the overall geometric topology and local fine-grained relations between amino acids (e.g. side-chain torsion angles and inter-amino-acid orientations) should be carefully considered. In this work, we propose the Directed Weight Neural Network for better capturing geometric relations among different amino acids. Extending a single weight from a scalar to a 3D directed vector, our new framework supports a rich set of geometric operations on both classical and SO(3)-representation features, on top of which we construct a perceptron unit for processing amino-acid information. In addition, we introduce an equivariant message passing paradigm on proteins for plugging the directed weight perceptrons into existing Graph Neural Networks, showing superior versatility in maintaining SO(3)-equivariance at the global scale. Experiments show that our network has remarkably better expressiveness in representing geometric relations in comparison to classical neural networks and the (globally) equivariant networks. It also achieves state-of-the-art performance on various computational biology applications related to protein 3D structures.
    Semi-unsupervised Learning for Time Series Classification. (arXiv:2207.03119v1 [cs.LG])
    Time series are ubiquitous and therefore inherently hard to analyze and ultimately to label or cluster. With the rise of the Internet of Things (IoT) and its smart devices, data is collected in large amounts any given second. The collected data is rich in information, as one can detect accidents (e.g. cars) in real time, or assess injury/sickness over a given time span (e.g. health devices). Due to their chaotic nature and massive numbers of data points, time series are hard to label manually. Furthermore, new classes within the data could emerge over time (contrary to e.g. handwritten digits), which would require relabeling the data. In this paper we present SuSL4TS, a deep generative Gaussian mixture model for semi-unsupervised learning, to classify time series data. With our approach we can alleviate manual labeling steps, since we can detect sparsely labeled classes (semi-supervised) and identify emerging classes hidden in the data (unsupervised). We demonstrate the efficacy of our approach with established time series classification datasets from different domains.
    Softmax-free Linear Transformers. (arXiv:2207.03341v1 [cs.CV])
    Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by stacked self-attention operations. Employing self-attention modules results in a quadratic complexity in both computation and memory usage. Various attempts at approximating the self-attention computation with linear complexity have thus been made in Natural Language Processing. However, an in-depth analysis in this work reveals that they are either theoretically flawed or empirically ineffective for visual recognition. We identify that their limitations are rooted in retaining the softmax self-attention during approximations. Specifically, conventional self-attention is computed by normalizing the scaled dot-product between token feature vectors. Preserving the softmax operation challenges any subsequent linearization efforts. Under this insight, a SOftmax-Free Transformer (abbreviated as SOFT) is proposed for the first time. To eliminate the softmax operator in self-attention, a Gaussian kernel function is adopted to replace the dot-product similarity. This enables a full self-attention matrix to be approximated via a low-rank matrix decomposition. The robustness of our approximation is achieved by calculating its Moore-Penrose inverse using a Newton-Raphson method. Further, an efficient symmetric normalization is introduced on the low-rank self-attention for enhancing model generalizability and transferability. Extensive experiments on ImageNet, COCO and ADE20K show that our SOFT significantly improves the computational efficiency of existing ViT variants. Crucially, with a linear complexity, much longer token sequences are permitted in SOFT, resulting in a superior trade-off between accuracy and complexity.
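    The three ingredients named above (a Gaussian kernel in place of softmax, a low-rank decomposition, and an iteratively computed inverse) can be sketched in plain Python. This is an illustration under assumptions, not the SOFT implementation: landmark tokens are chosen by hand in a Nyström-style factorization, the Newton-type iteration shown is the classical Newton-Schulz scheme, and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def gaussian_kernel(x, y):
    """Softmax-free similarity: exp(-||x_i - y_j||^2 / 2) for every token pair."""
    return [[math.exp(-0.5 * sum((p - q) ** 2 for p, q in zip(xi, yj)))
             for yj in y] for xi in x]


def newton_inverse(a, iters=30):
    """Newton-Schulz iteration X <- X (2I - A X) for the inverse of a square matrix."""
    n = len(a)
    norm1 = max(sum(abs(a[i][j]) for i in range(n)) for j in range(n))
    norm_inf = max(sum(abs(v) for v in row) for row in a)
    # Scaled transpose: a standard starting point that makes the iteration converge.
    x = [[a[j][i] / (norm1 * norm_inf) for j in range(n)] for i in range(n)]
    for _ in range(iters):
        ax = matmul(a, x)
        corr = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                for i in range(n)]
        x = matmul(x, corr)
    return x


def low_rank_attention(tokens, landmarks):
    """Approximate the n x n kernel attention matrix through m landmark tokens."""
    k_nm = gaussian_kernel(tokens, landmarks)      # n x m
    k_mm = gaussian_kernel(landmarks, landmarks)   # m x m
    k_mn = [list(col) for col in zip(*k_nm)]       # m x n
    return matmul(matmul(k_nm, newton_inverse(k_mm)), k_mn)


tokens = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
full = gaussian_kernel(tokens, tokens)
approx = low_rank_attention(tokens, tokens[:2])  # two landmarks
```

    With m landmarks, only n x m and m x m blocks are formed, which is where the saving over the full n x n matrix comes from. The columns of `approx` that belong to landmark tokens match `full`, and the remaining entries are approximations.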
    Equivariant Representation Learning via Class-Pose Decomposition. (arXiv:2207.03116v1 [cs.LG])
    We introduce a general method for learning representations that are equivariant to symmetries of data. The central idea is to decompose the latent space into an invariant factor and the symmetry group itself. The components semantically correspond to intrinsic data classes and poses respectively. The learner is self-supervised and infers these semantics based on relative symmetry information. The approach is motivated by theoretical results from group theory and guarantees representations that are lossless, interpretable and disentangled. We empirically investigate the approach via experiments involving datasets with a variety of symmetries. Results show that our representations capture the geometry of data and outperform other equivariant representation learning frameworks.
    Multi-Label Learning to Rank through Multi-Objective Optimization. (arXiv:2207.03060v1 [cs.IR])
    Learning to Rank (LTR) techniques are ubiquitous in Information Retrieval systems nowadays, especially in search ranking applications. The query-item relevance labels typically used to train the ranking model are often noisy measurements of human behavior, e.g., product rating for product search. The coarse measurements make the ground truth ranking non-unique with respect to a single relevance criterion. To resolve ambiguity, it is desirable to train a model using many relevance criteria, giving rise to Multi-Label LTR (MLLTR). Moreover, it formulates multiple goals that may be conflicting yet important to optimize for simultaneously, e.g., in product search, a ranking model can be trained based on product quality and purchase likelihood to increase revenue. In this research, we leverage the Multi-Objective Optimization (MOO) aspect of the MLLTR problem and employ recently developed MOO algorithms to solve it. Specifically, we propose a general framework where the information from labels can be combined in a variety of ways to meaningfully characterize the trade-off among the goals. Our framework allows for any gradient-based MOO algorithm to be used for solving the MLLTR problem. We test the proposed framework on two publicly available LTR datasets and one e-commerce dataset to show its efficacy.
    Adaptive Personalization in Federated Learning for Highly Non-i.i.d. Data. (arXiv:2207.03448v1 [cs.LG])
    Federated learning (FL) is a distributed learning method that offers medical institutes the prospect of collaboration in a global model while preserving the privacy of their patients. Although most medical centers conduct similar medical imaging tasks, their differences, such as specializations, number of patients, and devices, lead to distinctive data distributions. Data heterogeneity poses a challenge for FL and the personalization of the local models. In this work, we investigate an adaptive hierarchical clustering method for FL to produce intermediate semi-global models, so clients with similar data distribution have the chance of forming a more specialized model. Our method forms several clusters consisting of clients with the most similar data distributions; then, each cluster continues to train separately. Inside the cluster, we use meta-learning to improve the personalization of the participants' models. We compare the clustering approach with classical FedAvg and centralized training by evaluating our proposed methods on the HAM10k dataset for skin lesion classification with extreme heterogeneous data distribution. Our experiments demonstrate significant performance gain in heterogeneous distribution compared to standard FL methods in classification accuracy. Moreover, we show that the models converge faster if applied in clusters and outperform centralized training while using only a small subset of data.
    Y-Net: A Spatiospectral Dual-Encoder Network for Medical Image Segmentation. (arXiv:2204.07613v2 [eess.IV] UPDATED)
    Automated segmentation of retinal optical coherence tomography (OCT) images has become an important recent direction in machine learning for medical applications. We hypothesize that the anatomic structure of layers and their high-frequency variation in OCT images make retinal OCT a fitting choice for extracting spectral-domain features and combining them with spatial domain features. In this work, we present $\Upsilon$-Net, an architecture that combines the frequency domain features with the image domain to improve the segmentation performance of OCT images. The results of this work demonstrate that the introduction of two branches, one for spectral and one for spatial domain features, brings a very significant improvement in fluid segmentation performance and allows outperformance as compared to the well-known U-Net model. Our improvement was 13% on the fluid segmentation dice score and 1.9% on the average dice score. Finally, removing selected frequency ranges in the spectral domain demonstrates the impact of these features on the fluid segmentation outperformance.
    On the Relationship Between Adversarial Robustness and Decision Region in Deep Neural Network. (arXiv:2207.03400v1 [cs.LG])
    In general, Deep Neural Networks (DNNs) are evaluated by the generalization performance measured on unseen data excluded from the training phase. Along with the development of DNNs, the generalization performance converges to the state-of-the-art and it becomes difficult to evaluate DNNs solely based on this metric. The robustness against adversarial attack has been used as an additional metric to evaluate DNNs by measuring their vulnerability. However, few studies have been performed to analyze the adversarial robustness in terms of the geometry in DNNs. In this work, we perform an empirical study to analyze the internal properties of DNNs that affect model robustness under adversarial attacks. In particular, we propose the novel concept of the Populated Region Set (PRS), where training samples are populated more frequently, to represent the internal properties of DNNs in a practical setting. From systematic experiments with the proposed concept, we provide empirical evidence to validate that a low PRS ratio has a strong relationship with the adversarial robustness of DNNs. We also devise PRS regularizer leveraging the characteristics of PRS to improve the adversarial robustness without adversarial training.
    Back to the Source: Diffusion-Driven Test-Time Adaptation. (arXiv:2207.03442v1 [cs.LG])
    Test-time adaptation harnesses test inputs to improve the accuracy of a model trained on source data when tested on shifted target data. Existing methods update the source model by (re-)training on each target domain. While effective, re-training is sensitive to the amount and order of the data and the hyperparameters for optimization. We instead update the target data, by projecting all test inputs toward the source domain with a generative diffusion model. Our diffusion-driven adaptation method, DDA, shares its models for classification and generation across all domains. Both models are trained on the source domain, then fixed during testing. We augment diffusion with image guidance and self-ensembling to automatically decide how much to adapt. Input adaptation by DDA is more robust than prior model adaptation approaches across a variety of corruptions, architectures, and data regimes on the ImageNet-C benchmark. With its input-wise updates, DDA succeeds where model adaptation degrades on too little data in small batches, dependent data in non-uniform order, or mixed data with multiple corruptions.
    SC2EGSet: StarCraft II Esport Replay and Game-state Dataset. (arXiv:2207.03428v1 [cs.LG])
    As a relatively new form of sport, esports offers unparalleled data availability. Despite the vast amounts of data that are generated by game engines, it can be challenging to extract them and verify their integrity for the purposes of practical and scientific use. Our work aims to open esports to a broader scientific community by supplying raw and pre-processed files from StarCraft II esports tournaments. These files can be used in statistical and machine learning modeling tasks and related to various laboratory-based measurements (e.g., behavioral tests, brain imaging). We have gathered publicly available game-engine generated "replays" of tournament matches and performed data extraction and cleanup using a low-level application programming interface (API) parser library. Additionally, we open-sourced and published all the custom tools that were developed in the process of creating our dataset. These tools include PyTorch and PyTorch Lightning API abstractions to load and model the data. Our dataset contains replays from major and premier StarCraft II tournaments since 2016. To prepare the dataset, we processed 55 tournament "replaypacks" that contained 17930 files with game-state information. Based on an initial investigation of available StarCraft II datasets, we observed that our dataset is the largest publicly available source of StarCraft II esports data upon its publication. Analysis of the extracted data holds promise for further Artificial Intelligence (AI), Machine Learning (ML), psychological, Human-Computer Interaction (HCI), and sports-related studies in a variety of supervised and self-supervised tasks.
    Learning Optimal Solutions via an LSTM-Optimization Framework. (arXiv:2207.02937v1 [cs.LG])
    In this study, we present a deep learning-optimization framework to tackle dynamic mixed-integer programs. Specifically, we develop a bidirectional Long Short Term Memory (LSTM) framework that can process information forward and backward in time to learn optimal solutions to sequential decision-making problems. We demonstrate our approach in predicting the optimal decisions for the single-item capacitated lot-sizing problem (CLSP), where a binary variable denotes whether to produce in a period or not. Due to the dynamic nature of the problem, the CLSP can be treated as a sequence labeling task where a recurrent neural network can capture the problem's temporal dynamics. Computational results show that our LSTM-Optimization (LSTM-Opt) framework significantly reduces the solution time of benchmark CLSP problems without much loss in feasibility and optimality. For example, the predictions at the 85% level reduce the CPLEX solution time by a factor of 9 on average for over 240,000 test instances with an optimality gap of less than 0.05% and 0.4% infeasibility in the test set. Also, models trained using shorter planning horizons can successfully predict the optimal solution of the instances with longer planning horizons. For the hardest data set, the LSTM predictions at the 25% level reduce the solution time of 70 CPU hours to less than 2 CPU minutes with an optimality gap of 0.8% and without any infeasibility. The LSTM-Opt framework outperforms classical ML algorithms, such as the logistic regression and random forest, in terms of the solution quality, and exact approaches, such as the ($\ell$, S) and dynamic programming-based inequalities, with respect to the solution time improvement. Our machine learning approach could be beneficial in tackling sequential decision-making problems similar to CLSP, which need to be solved repetitively, frequently, and in a fast manner.
    Riemannian Diffusion Schrödinger Bridge. (arXiv:2207.03024v1 [stat.ML])
    Score-based generative models exhibit state-of-the-art performance on density estimation and generative modeling tasks. These models typically assume that the data geometry is flat, yet recent extensions have been developed to synthesize data living on Riemannian manifolds. Existing methods to accelerate sampling of diffusion models are typically not applicable in the Riemannian setting, and Riemannian score-based methods have not yet been adapted to the important task of interpolation of datasets. To overcome these issues, we introduce Riemannian Diffusion Schrödinger Bridge. Our proposed method generalizes the Diffusion Schrödinger Bridge introduced by De Bortoli et al. (NeurIPS 2021) to the non-Euclidean setting and extends Riemannian score-based models beyond the first time reversal. We validate our proposed method on synthetic data and real Earth and climate data.
    Network Binarization via Contrastive Learning. (arXiv:2207.02970v1 [cs.CV])
    Neural network binarization accelerates deep models by quantizing their weights and activations into 1-bit. However, there is still a huge performance gap between Binary Neural Networks (BNNs) and their full-precision (FP) counterparts. As the quantization error caused by weights binarization has been reduced in earlier works, the activations binarization becomes the major obstacle for further improvement of the accuracy. BNNs exhibit a unique and interesting structure, where the binary and latent FP activations exist in the same forward pass (i.e., $\text{Binarize}(\mathbf{a}_F) = \mathbf{a}_B$). To mitigate the information degradation caused by the binarization operation from FP to binary activations, we establish a novel contrastive learning framework while training BNNs through the lens of Mutual Information (MI) maximization. MI is introduced as the metric to measure the information shared between binary and FP activations, which assists binarization with contrastive learning. Specifically, the representation ability of the BNNs is greatly strengthened via pulling the positive pairs with binary and FP activations from the same input samples, as well as pushing negative pairs from different samples (the number of negative pairs can be exponentially large). This benefits the downstream tasks, not only classification but also segmentation, depth estimation, etc. The experimental results show that our method can be implemented as a pile-up module on existing state-of-the-art binarization methods and can remarkably improve the performance over them on CIFAR-10/100 and ImageNet, in addition to the great generalization ability on NYUD-v2.
    Cross-Scale Vector Quantization for Scalable Neural Speech Coding. (arXiv:2207.03067v1 [cs.SD])
    Bitrate scalability is a desirable feature for audio coding in real-time communications. Existing neural audio codecs usually enforce a specific bitrate during training, so different models need to be trained for each target bitrate, which increases the memory footprint at the sender and the receiver side, and transcoding is often needed to support multiple receivers. In this paper, we introduce a cross-scale scalable vector quantization scheme (CSVQ), in which multi-scale features are encoded progressively with stepwise feature fusion and refinement. In this way, a coarse-level signal is reconstructed if only a portion of the bitstream is received, and the quality progressively improves as more bits become available. The proposed CSVQ scheme can be flexibly applied to any neural audio coding network with a mirrored auto-encoder structure to achieve bitrate scalability. Subjective results show that the proposed scheme outperforms the classical residual VQ (RVQ) with scalability. Moreover, the proposed CSVQ at 3 kbps outperforms Opus at 9 kbps and Lyra at 3 kbps, and it could provide a graceful quality boost with bitrate increase.
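    For context, the classical residual VQ baseline mentioned in this abstract can be sketched in a few lines: each stage quantizes the residual left by the previous stage, so decoding only the first indices yields a coarse reconstruction. The codebooks and input below are hypothetical toy values, and the sketch shows plain RVQ only, not the cross-scale fusion that CSVQ adds.

```python
def nearest(codebook, vector):
    """Index of the codeword closest to `vector` in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vector)))

def rvq_encode(vector, codebooks):
    """Quantize `vector` stage by stage; each stage codes the previous residual."""
    indices, residual = [], list(vector)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices

def rvq_decode(indices, codebooks):
    """Sum the selected codewords; passing fewer indices gives a coarser result."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out

def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical two-stage codebooks: a coarse stage and a finer refinement stage.
codebooks = [
    [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)],
    [(0.25, 0.25), (0.25, -0.25), (-0.25, 0.25), (-0.25, -0.25)],
]
x = (0.9, -0.4)
indices = rvq_encode(x, codebooks)
coarse = rvq_decode(indices[:1], codebooks)  # only the first part of the bitstream
fine = rvq_decode(indices, codebooks)        # full bitstream
```

    In this toy case the second stage reduces the reconstruction error, which is the behaviour a scalable bitstream relies on.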
    A Mutually Exciting Latent Space Hawkes Process Model for Continuous-time Networks. (arXiv:2205.09263v2 [cs.LG] UPDATED)
    Networks and temporal point processes serve as fundamental building blocks for modeling complex dynamic relational data in various domains. We propose the latent space Hawkes (LSH) model, a novel generative model for continuous-time networks of relational events, using a latent space representation for nodes. We model relational events between nodes using mutually exciting Hawkes processes with baseline intensities dependent upon the distances between the nodes in the latent space and sender and receiver specific effects. We demonstrate that our proposed LSH model can replicate many features observed in real temporal networks including reciprocity and transitivity, while also achieving superior prediction accuracy and providing more interpretable fits than existing models.
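    A rough sketch of the kind of intensity function described here: a baseline that shrinks with latent distance and grows with sender and receiver effects, plus excitation from past events. The exponential link, the exponential decay kernel, and all parameter names are assumptions made for illustration; the abstract does not specify the functional form used in the LSH model.

```python
import math

def baseline_intensity(z_sender, z_receiver, sender_effect, receiver_effect):
    """Assumed form: exp(sender effect + receiver effect - latent distance)."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_sender, z_receiver)))
    return math.exp(sender_effect + receiver_effect - distance)

def intensity(t, baseline, exciting_event_times, alpha, beta):
    """Hawkes intensity with an exponential kernel (an assumed kernel choice).

    `exciting_event_times` holds past events that excite this node pair,
    for example events in the reciprocal direction.
    """
    excitation = sum(alpha * math.exp(-beta * (t - s))
                     for s in exciting_event_times if s < t)
    return baseline + excitation

# Hypothetical latent positions and effects for a sender u and a receiver v.
mu_uv = baseline_intensity((0.0, 0.0), (0.0, 0.0), 0.0, 0.0)
quiet = intensity(2.0, mu_uv, [], alpha=0.5, beta=1.0)
excited = intensity(2.0, mu_uv, [1.0], alpha=0.5, beta=1.0)
```

    With no past events the intensity equals the baseline; a recent reciprocal event raises it temporarily, which is how such models can reproduce reciprocity.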
    Towards the Practical Utility of Federated Learning in the Medical Domain. (arXiv:2207.03075v1 [cs.LG])
    Federated learning (FL) is an active area of research. One of the most suitable areas for adopting FL is the medical domain, where patient privacy must be respected. Previous research, however, does not fully consider who will most likely use FL in the medical domain. It is not the hospitals who are eager to adopt FL, but the service providers such as IT companies who want to develop machine learning models with real patient records. Moreover, service providers would prefer to focus on maximizing the performance of the models at the lowest cost possible. In this work, we propose empirical benchmarks of FL methods considering both performance and monetary cost with three real-world datasets: electronic health records, skin cancer images, and electrocardiogram datasets. We also propose Federated learning with Proximal regularization eXcept local Normalization (FedPxN), which, using a simple combination of FedProx and FedBN, outperforms all other FL algorithms while consuming only slightly more power than the most power efficient method.
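    One way to read "proximal regularization except local normalization" is a FedProx-style penalty applied to all parameters other than those of normalization layers, which stay local as in FedBN. The sketch below follows that reading; the parameter naming scheme, the `is_norm_param` helper, and the value of `mu` are hypothetical and are not taken from the paper.

```python
def is_norm_param(name):
    """Hypothetical naming rule: normalization parameters contain 'bn' or 'norm'."""
    return "bn" in name or "norm" in name

def proximal_penalty(local_params, global_params, mu):
    """FedProx-style term (mu / 2) * ||w - w_global||^2, skipping normalization layers."""
    total = 0.0
    for name, values in local_params.items():
        if is_norm_param(name):
            continue  # normalization parameters are left unregularized and local
        total += sum((w - g) ** 2 for w, g in zip(values, global_params[name]))
    return 0.5 * mu * total

# Toy parameters stored as flat lists keyed by name.
local_params = {"fc.weight": [1.0, 2.0], "bn.weight": [5.0]}
global_params = {"fc.weight": [0.0, 0.0], "bn.weight": [0.0]}
penalty = proximal_penalty(local_params, global_params, mu=0.1)
```

    The penalty would be added to each client's local training loss; only `fc.weight` contributes here because `bn.weight` is skipped.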
    NESC: Robust Neural End-2-End Speech Coding with GANs. (arXiv:2207.03282v1 [eess.AS])
    Neural networks have proven to be a formidable tool to tackle the problem of speech coding at very low bit rates. However, the design of a neural coder that can be operated robustly under real-world conditions remains a major challenge. Therefore, we present Neural End-2-End Speech Codec (NESC), a robust, scalable end-to-end neural speech codec for high-quality wideband speech coding at 3 kbps. The encoder uses a new architecture configuration, which relies on our proposed Dual-PathConvRNN (DPCRNN) layer, while the decoder architecture is based on our previous work Streamwise-StyleMelGAN. Our subjective listening tests on clean and noisy speech show that NESC is particularly robust to unseen conditions and signal perturbations.
    CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks. (arXiv:2204.10965v3 [cs.CV] UPDATED)
    In this paper, we propose CLIP-Dissect, a new technique to automatically describe the function of individual hidden neurons inside vision networks. CLIP-Dissect leverages recent advances in multimodal vision/language models to label internal neurons with open-ended concepts without the need for any labeled data or human examples, which are required for existing tools to succeed. We show that CLIP-Dissect provides more accurate descriptions than existing methods for last-layer neurons, where the ground truth is available, as well as qualitatively good descriptions for hidden-layer neurons. In addition, our method is very flexible: it is model agnostic, can easily handle new concepts and can be extended to take advantage of better multimodal models in the future. Finally, CLIP-Dissect is computationally efficient and can label all neurons from five layers of ResNet-50 in just four minutes.
    Selectively increasing the diversity of GAN-generated samples. (arXiv:2207.01561v2 [cs.CV] UPDATED)
    Generative Adversarial Networks (GANs) are powerful models able to synthesize data samples closely resembling the distribution of real data, yet the diversity of those generated samples is limited due to the so-called mode collapse phenomenon observed in GANs. Especially prone to mode collapse are conditional GANs, which tend to ignore the input noise vector and focus on the conditional information. Recent methods proposed to mitigate this limitation increase the diversity of generated samples, yet they reduce the performance of the models when similarity of samples is required. To address this shortcoming, we propose a novel method to selectively increase the diversity of GAN-generated samples. By adding a simple, yet effective regularization to the training loss function we encourage the generator to discover new data modes for inputs related to diverse outputs while generating consistent samples for the remaining ones. More precisely, we maximise the ratio of distances between generated images and input latent vectors scaling the effect according to the diversity of samples for a given conditional input. We show the superiority of our method in a synthetic benchmark as well as a real-life scenario of simulating data from the Zero Degree Calorimeter of ALICE experiment in LHC, CERN.
    Towards Transparency in Dermatology Image Datasets with Skin Tone Annotations by Experts, Crowds, and an Algorithm. (arXiv:2207.02942v1 [cs.CV])
    While artificial intelligence (AI) holds promise for supporting healthcare providers and improving the accuracy of medical diagnoses, a lack of transparency in the composition of datasets exposes AI models to the possibility of unintentional and avoidable mistakes. In particular, public and private image datasets of dermatological conditions rarely include information on skin color. As a start towards increasing transparency, AI researchers have appropriated the use of the Fitzpatrick skin type (FST) from a measure of patient photosensitivity to a measure for estimating skin tone in algorithmic audits of computer vision applications including facial recognition and dermatology diagnosis. In order to understand the variability of estimated FST annotations on images, we compare several FST annotation methods on a diverse set of 460 images of skin conditions from both textbooks and online dermatology atlases. We find the inter-rater reliability between three board-certified dermatologists is comparable to the inter-rater reliability between the board-certified dermatologists and two crowdsourcing methods. In contrast, we find that the Individual Typology Angle converted to FST (ITA-FST) method produces annotations that are significantly less correlated with the experts' annotations than the experts' annotations are correlated with each other. These results demonstrate that algorithms based on ITA-FST are not reliable for annotating large-scale image datasets, but human-centered, crowd-based protocols can reliably add skin type transparency to dermatology datasets. Furthermore, we introduce the concept of dynamic consensus protocols with tunable parameters including expert review that increase the visibility of crowdwork and provide guidance for future crowdsourced annotations of large image datasets.
    Self-Supervised Velocity Estimation for Automotive Radar Object Detection Networks. (arXiv:2207.03146v1 [cs.CV])
    This paper presents a method to learn the Cartesian velocity of objects using an object detection network on automotive radar data. The proposed method is self-supervised in terms of generating its own training signal for the velocities. Labels are only required for single-frame, oriented bounding boxes (OBBs). Labels for the Cartesian velocities or contiguous sequences, which are expensive to obtain, are not required. The general idea is to pre-train an object detection network without velocities using single-frame OBB labels, and then exploit the network's OBB predictions on unlabelled data for velocity training. In detail, the network's OBB predictions of the unlabelled frames are updated to the timestamp of a labelled frame using the predicted velocities and the distances between the updated OBBs of the unlabelled frame and the OBB predictions of the labelled frame are used to generate a self-supervised training signal for the velocities. The detection network architecture is extended by a module to account for the temporal relation of multiple scans and a module to represent the radars' radial velocity measurements explicitly. A two-step approach of first training only OBB detection, followed by training OBB detection and velocities is used. Further, a pre-training with pseudo-labels generated from radar radial velocity measurements bootstraps the self-supervised method of this paper. Experiments on the publicly available nuScenes dataset show that the proposed method almost reaches the velocity estimation performance of a fully supervised training, but does not require expensive velocity labels. Furthermore, we outperform a baseline method which uses only radial velocity measurements as labels.
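    The core of the self-supervised signal described above can be sketched as follows: a box predicted on an unlabelled frame is moved to the timestamp of a labelled frame using its predicted velocity, and the remaining distance serves as the training signal. The sketch uses box centers only and a constant-velocity update; both simplifications, and all names and numbers, are assumptions for illustration.

```python
import math

def propagate_center(center, velocity, dt):
    """Move a box center by a constant predicted velocity over time step dt."""
    return (center[0] + velocity[0] * dt, center[1] + velocity[1] * dt)

def velocity_training_signal(unlabelled_center, predicted_velocity, dt, labelled_center):
    """Distance between the propagated box center and the labelled-frame prediction."""
    moved = propagate_center(unlabelled_center, predicted_velocity, dt)
    return math.hypot(moved[0] - labelled_center[0], moved[1] - labelled_center[1])

# Hypothetical object seen 0.5 s before the labelled frame.
good = velocity_training_signal((10.0, 5.0), (2.0, -1.0), 0.5, (11.0, 4.5))
bad = velocity_training_signal((10.0, 5.0), (0.0, 0.0), 0.5, (11.0, 4.5))
```

    A velocity that explains the motion between the two frames drives the signal toward zero, while a wrong velocity leaves a residual distance to minimize.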
    Variational Nearest Neighbor Gaussian Process. (arXiv:2202.01694v3 [cs.LG] UPDATED)
    Variational approximations to Gaussian processes (GPs) typically use a small set of inducing points to form a low-rank approximation to the covariance matrix. In this work, we instead exploit a sparse approximation of the precision matrix. We propose variational nearest neighbor Gaussian process (VNNGP), which introduces a prior that only retains correlations within K nearest-neighboring observations, thereby inducing sparse precision structure. Using the variational framework, VNNGP's objective can be factorized over both observations and inducing points, enabling stochastic optimization with a time complexity of O($K^3$). Hence, we can arbitrarily scale the inducing point size, even to the point of putting inducing points at every observed location. We compare VNNGP to other scalable GPs through various experiments, and demonstrate that VNNGP (1) can dramatically outperform low-rank methods, and (2) is less prone to overfitting than other nearest neighbor methods.
    Multi-objective Optimization of Notifications Using Offline Reinforcement Learning. (arXiv:2207.03029v1 [cs.LG])
    Mobile notification systems play a major role in a variety of applications to communicate, send alerts and reminders to the users to inform them about news, events or messages. In this paper, we formulate the near-real-time notification decision problem as a Markov Decision Process where we optimize for multiple objectives in the rewards. We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions. We address the challenge of offline learning using a Double Deep Q-network method based on Conservative Q-learning that mitigates the distributional shift problem and Q-value overestimation. We illustrate our fully-deployed system and demonstrate the performance and benefits of the proposed approach through both offline and online experiments.
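    The two standard building blocks named in this abstract can be written down compactly. The sketch below shows the Double DQN target, where the online network picks the next action and the target network evaluates it, and the basic Conservative Q-learning penalty, which pushes down Q-values of actions not seen in the logged data. The function names, the toy Q-values, and the way the two terms would be weighted are assumptions; the deployed system's details are not given in the abstract.

```python
import math

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done=False):
    """r + gamma * Q_target(s', argmax_a Q_online(s', a)); just r at episode end."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]

def cql_penalty(q_values, logged_action):
    """logsumexp_a Q(s, a) - Q(s, a_logged), computed in a numerically stable way."""
    m = max(q_values)
    log_sum_exp = m + math.log(sum(math.exp(q - m) for q in q_values))
    return log_sum_exp - q_values[logged_action]

# Toy example with two actions, e.g. "send" and "do not send" a notification.
target = double_dqn_target(reward=1.0, gamma=0.9,
                           q_online_next=[1.0, 3.0], q_target_next=[2.0, 0.5])
penalty = cql_penalty([0.0, 0.0], logged_action=0)
```

    The online network prefers action 1, so the target uses the target network's value for that action (0.5) rather than its own maximum (2.0), which is what limits overestimation.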
    HE-PEx: Efficient Machine Learning under Homomorphic Encryption using Pruning, Permutation and Expansion. (arXiv:2207.03384v1 [cs.CR])
    Privacy-preserving neural network (NN) inference solutions have recently gained significant traction with several solutions that provide different latency-bandwidth trade-offs. Of these, many rely on homomorphic encryption (HE), a method of performing computations over encrypted data. However, HE operations even with state-of-the-art schemes are still considerably slow compared to their plaintext counterparts. Pruning the parameters of an NN model is a well-known approach to improving inference latency. However, pruning methods that are useful in the plaintext context may lend nearly negligible improvement in the HE case, as has also been demonstrated in recent work. In this work, we propose a novel set of pruning methods that reduce the latency and memory requirement, thus bringing the effectiveness of plaintext pruning methods to HE. Crucially, our proposal employs two key techniques, viz. permutation and expansion of the packed model weights, that enable pruning significantly more ciphertexts and recuperating most of the accuracy loss, respectively. We demonstrate the advantage of our method on fully connected layers where the weights are packed using a recently proposed packing technique called tile tensors, which allows executing deep NN inference in a non-interactive mode. We evaluate our methods on various autoencoder architectures and demonstrate that for a small mean-square reconstruction loss of $1.5 \times 10^{-5}$ on MNIST, we reduce the memory requirement and latency of HE-enabled inference by 60%.
    Boosting the interpretability of clinical risk scores with intervention predictions. (arXiv:2207.02941v1 [cs.LG])
    Machine learning systems show significant promise for forecasting patient adverse events via risk scores. However, these risk scores implicitly encode assumptions about future interventions that the patient is likely to receive, based on the intervention policy present in the training data. Without this important context, predictions from such systems are less interpretable for clinicians. We propose a joint model of intervention policy and adverse event risk as a means to explicitly communicate the model's assumptions about future interventions. We develop such an intervention policy model on MIMIC-III, a real world de-identified ICU dataset, and discuss some use cases that highlight the utility of this approach. We show how combining typical risk scores, such as the likelihood of mortality, with future intervention probability scores leads to more interpretable clinical predictions.
    Harnessing Out-Of-Distribution Examples via Augmenting Content and Style. (arXiv:2207.03162v1 [cs.LG])
    Machine learning models are vulnerable to Out-Of-Distribution (OOD) examples, a problem that has drawn much attention. However, current methods lack a full understanding of different types of OOD data: there are benign OOD data that can be properly adapted to enhance the learning performance, while other malign OOD data would severely degrade the classification result. To Harness OOD data, this paper proposes the HOOD method, which can leverage the content and style from each image instance to identify benign and malign OOD data. In particular, we design a variational inference framework to causally disentangle content and style features by constructing a structural causal model. Subsequently, we augment the content and style through an intervention process to produce malign and benign OOD data, respectively. The benign OOD data contain novel styles but hold the contents of interest, and they can be leveraged to help train a style-invariant model. In contrast, the malign OOD data inherit unknown contents but carry familiar styles; detecting them can improve model robustness against deceiving anomalies. Thanks to the proposed novel disentanglement and data augmentation techniques, HOOD can effectively deal with OOD examples in unknown and open environments; its effectiveness is empirically validated in three typical OOD applications: OOD detection, open-set semi-supervised learning, and open-set domain adaptation.
    Shell Language Processing: Unix command parsing for Machine Learning. (arXiv:2107.02438v3 [cs.LG] UPDATED)
    In this article, we present a Shell Language Preprocessing (SLP) library, which implements tokenization and encoding directed at parsing Unix and Linux shell commands. We describe the rationale behind the need for a new approach with specific examples of when conventional Natural Language Processing (NLP) pipelines fail. Furthermore, we evaluate our methodology on a security classification task against widely accepted information and communications technology (ICT) tokenization techniques and achieve significant improvement of an F1 score from 0.392 to 0.874.
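    To illustrate why whitespace-based tokenization falls short on shell commands, the short example below compares a naive split with Python's standard-library `shlex` module. This is a stand-in only: the SLP library's own API is not described in the abstract and is not used here, and the command string is a made-up example.

```python
import shlex

# A made-up command with a quoted argument and a pipe.
command = 'grep -r "hello world" /var/log | wc -l'

# Naive NLP-style tokenization splits the quoted argument in two.
naive_tokens = command.split()

# Shell-aware tokenization keeps the quoted argument as one token.
shell_tokens = shlex.split(command)
```

    The naive split yields the fragments `"hello` and `world"`, while `shlex.split` returns the single argument `hello world`, which is closer to what the shell would actually pass to `grep`.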
    Automating the Design and Development of Gradient Descent Trained Expert System Networks. (arXiv:2207.02845v1 [cs.LG])
    Prior work introduced a gradient descent trained expert system that conceptually combines the learning capabilities of neural networks with the understandability and defensible logic of an expert system. This system was shown to be able to learn patterns from data and to perform decision-making at levels rivaling those reported by neural network systems. The principal limitation of the approach, though, was the necessity for the manual development of a rule-fact network (which is then trained using backpropagation). This paper proposes a technique for overcoming this significant limitation, as compared to neural networks. Specifically, this paper proposes the use of larger and denser-than-application need rule-fact networks which are trained, pruned, manually reviewed and then re-trained for use. Multiple types of networks are evaluated under multiple operating conditions and these results are presented and assessed. Based on these individual experimental condition assessments, the proposed technique is evaluated. The data presented shows that error rates as low as 3.9% (mean, 1.2% median) can be obtained, demonstrating the efficacy of this technique for many applications.
    Betty: An Automatic Differentiation Library for Multilevel Optimization. (arXiv:2207.02849v1 [cs.LG])
    Multilevel optimization has been widely adopted as a mathematical foundation for a myriad of machine learning problems, such as hyperparameter optimization, meta-learning, and reinforcement learning, to name a few. Nonetheless, implementing multilevel optimization programs oftentimes requires expertise in both mathematics and programming, stunting research in this field. We take an initial step towards closing this gap by introducing Betty, a high-level software library for gradient-based multilevel optimization. To this end, we develop an automatic differentiation procedure based on a novel interpretation of multilevel optimization as a dataflow graph. We further abstract the main components of multilevel optimization as Python classes, to enable easy, modular, and maintainable programming. We empirically demonstrate that Betty can be used as a high-level programming interface for an array of multilevel optimization programs, while also observing up to 11% increase in test accuracy, 14% decrease in GPU memory usage, and 20% decrease in wall time over existing implementations on multiple benchmarks. The code is available at this http URL.
    Cardiomegaly Detection using Deep Convolutional Neural Network with U-Net. (arXiv:2205.11515v2 [eess.IV] UPDATED)
    Cardiomegaly is a medical condition in which the heart is enlarged. It is easier to manage when caught early, so early detection is critical. The chest X-ray, one of the most frequently used radiographic examinations, has been used for decades to detect and visualize abnormalities of human organs, and it is also an important diagnostic tool for cardiomegaly. Even for domain experts, distinguishing the many types of diseases from an X-ray is a difficult and time-consuming task. Deep learning models are most effective when trained on very large datasets, yet due to privacy concerns, large datasets are rarely available in the medical domain. This research presents a customized, retrained deep learning U-Net model for detecting cardiomegaly. In the training phase, chest X-ray images from the open-source, real-world "ChestX-ray8" dataset are used. To reduce computing time, the model performs data preprocessing, image enhancement, image compression, and classification before moving on to the training step. On the chest X-ray image dataset, the model achieved a diagnostic accuracy of 94%, a sensitivity of 96.2%, and a specificity of 92.5%, which beats prior results from pre-trained models for identifying cardiomegaly.
    Reward is enough for convex MDPs. (arXiv:2106.00661v3 [cs.AI] UPDATED)
    Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP). However, not all goals can be captured in this manner. In this paper we study convex MDPs in which goals are expressed as convex functions of the stationary distribution and show that they cannot be formulated using stationary reward functions. Convex MDPs generalize the standard reinforcement learning (RL) problem formulation to a larger framework that includes many supervised and unsupervised RL problems, such as apprenticeship learning, constrained MDPs, and so-called 'pure exploration'. Our approach is to reformulate the convex MDP problem as a min-max game involving policy and cost (negative reward) 'players', using Fenchel duality. We propose a meta-algorithm for solving this problem and show that it unifies many existing algorithms in the literature.
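    The min-max reformulation mentioned above can be sketched with the standard Fenchel conjugate identity for a closed convex function f. The symbols d (stationary distribution), K (set of admissible distributions), λ (cost vector), Λ (its domain) and f* (convex conjugate) are notation assumed here for illustration, not quoted from the abstract.

```latex
\min_{d \in \mathcal{K}} f(d)
  \;=\;
  \min_{d \in \mathcal{K}} \; \max_{\lambda \in \Lambda}
  \bigl( \lambda \cdot d - f^{*}(\lambda) \bigr),
\qquad
f^{*}(\lambda) \;=\; \sup_{d} \bigl( \lambda \cdot d - f(d) \bigr).
```

    Under this reading, the policy player chooses d to minimize the linear cost λ · d (equivalently, to maximize the reward −λ), while the cost player chooses λ, so for any fixed λ the inner problem is an ordinary RL problem with a stationary reward.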
    Federated Robustness Propagation: Sharing Robustness in Heterogeneous Federated Learning. (arXiv:2106.10196v2 [cs.LG] UPDATED)
    Federated learning (FL) emerges as a popular distributed learning schema that learns a model from a set of participating users without sharing raw data. One major challenge of FL comes with heterogeneous users, who may have distributionally different (or non-iid) data and varying computation resources. As federated users would use the model for prediction, they often demand the trained model to be robust against malicious attackers at test time. Whereas adversarial training (AT) provides a sound solution for centralized learning, extending its usage to federated users poses significant challenges, as many users may have very limited training data and tight computational budgets and thus cannot afford the data-hungry and costly AT. In this paper, we study a novel FL strategy: propagating adversarial robustness from rich-resource users that can afford AT, to those with poor resources that cannot afford it, during federated learning. We show that existing FL techniques cannot be effectively integrated with the strategy to propagate robustness among non-iid users and propose an efficient propagation approach by the proper use of batch-normalization. We demonstrate the rationality and effectiveness of our method through extensive experiments. In particular, the proposed method is shown to grant federated models remarkable robustness even when only a small portion of users can afford AT during learning. Source code will be released.
    The Multivariate Community Hawkes Model for Dependent Relational Events in Continuous-time Networks. (arXiv:2205.00639v2 [stat.ME] UPDATED)
    The stochastic block model (SBM) is one of the most widely used generative models for network data. Many continuous-time dynamic network models are built upon the same assumption as the SBM: edges or events between all pairs of nodes are conditionally independent given the block or community memberships, which prevents them from reproducing higher-order motifs such as triangles that are commonly observed in real networks. We propose the multivariate community Hawkes (MULCH) model, an extremely flexible community-based model for continuous-time networks that introduces dependence between node pairs using structured multivariate Hawkes processes. We fit the model using a spectral clustering and likelihood-based local refinement procedure. We find that our proposed MULCH model is far more accurate than existing models both for predictive and generative tasks.
    Unsupervised Manifold Alignment with Joint Multidimensional Scaling. (arXiv:2207.02968v1 [stat.ML])
    We introduce Joint Multidimensional Scaling, a novel approach for unsupervised manifold alignment, which maps datasets from two different domains, without any known correspondences between data instances across the datasets, to a common low-dimensional Euclidean space. Our approach integrates Multidimensional Scaling (MDS) and Wasserstein Procrustes analysis into a joint optimization problem to simultaneously generate isometric embeddings of data and learn correspondences between instances from two different datasets, while only requiring intra-dataset pairwise dissimilarities as input. This unique characteristic makes our approach applicable to datasets without access to the input features, such as solving the inexact graph matching problem. We propose an alternating optimization scheme to solve the problem that can fully benefit from the optimization techniques for MDS and Wasserstein Procrustes. We demonstrate the effectiveness of our approach in several applications, including joint visualization of two datasets, unsupervised heterogeneous domain adaptation, graph matching, and protein structure alignment.
    Virtual staining of defocused autofluorescence images of unlabeled tissue using deep neural networks. (arXiv:2207.02946v1 [eess.IV])
    Deep learning-based virtual staining was developed to introduce image contrast to label-free tissue sections, digitally matching the histological staining, which is time-consuming, labor-intensive, and destructive to tissue. Standard virtual staining requires high autofocusing precision during the whole slide imaging of label-free tissue, which consumes a significant portion of the total imaging time and can lead to tissue photodamage. Here, we introduce a fast virtual staining framework that can stain defocused autofluorescence images of unlabeled tissue, achieving equivalent performance to virtual staining of in-focus label-free images, also saving significant imaging time by lowering the microscope's autofocusing precision. This framework incorporates a virtual-autofocusing neural network to digitally refocus the defocused images and then transforms the refocused images into virtually stained images using a successive network. These cascaded networks form a collaborative inference scheme: the virtual staining model regularizes the virtual-autofocusing network through a style loss during the training. To demonstrate the efficacy of this framework, we trained and blindly tested these networks using human lung tissue. Using 4x fewer focus points with 2x lower focusing precision, we successfully transformed the coarsely-focused autofluorescence images into high-quality virtually stained H&E images, matching the standard virtual staining framework that used finely-focused autofluorescence input images. Without sacrificing the staining quality, this framework decreases the total image acquisition time needed for virtual staining of a label-free whole-slide image (WSI) by ~32%, together with a ~89% decrease in the autofocusing time, and has the potential to eliminate the laborious and costly histochemical staining process in pathology.
    Efficient Self-supervised Vision Transformers for Representation Learning. (arXiv:2106.09785v2 [cs.CV] UPDATED)
    This paper investigates two techniques for developing efficient self-supervised vision transformers (EsViT) for visual representation learning. First, we show through a comprehensive empirical study that multi-stage architectures with sparse self-attentions can significantly reduce modeling complexity, but at the cost of losing the ability to capture fine-grained correspondences between image regions. Second, we propose a new pre-training task of region matching which allows the model to capture fine-grained region dependencies and as a result significantly improves the quality of the learned vision representations. Our results show that, by combining the two techniques, EsViT achieves 81.3% top-1 on the ImageNet linear probe evaluation, outperforming prior art with around an order of magnitude higher throughput. When transferring to downstream linear classification tasks, EsViT outperforms its supervised counterpart on 17 out of 18 datasets. The code and models are publicly available: https://github.com/microsoft/esvit
    BioLCNet: Reward-modulated Locally Connected Spiking Neural Networks. (arXiv:2109.05539v5 [cs.NE] UPDATED)
    Brain-inspired computation and information processing alongside compatibility with neuromorphic hardware have made spiking neural networks (SNN) a promising method for solving learning tasks in machine learning (ML). Spiking neurons are only one of the requirements for building a bio-plausible learning model. Network architecture and learning rules are other important factors to consider when developing such artificial agents. In this work, inspired by the human visual pathway and the role of dopamine in learning, we propose a reward-modulated locally connected spiking neural network, BioLCNet, for visual learning tasks. To extract visual features from Poisson-distributed spike trains, we used local filters that are more analogous to the biological visual system compared to convolutional filters with weight sharing. In the decoding layer, we applied a spike population-based voting scheme to determine the decision of the network. We employed Spike-timing-dependent plasticity (STDP) for learning the visual features, and its reward-modulated variant (R-STDP) for training the decoder based on the reward or punishment feedback signal. For evaluation, we first assessed the robustness of our rewarding mechanism to varying target responses in a classical conditioning experiment. Afterwards, we evaluated the performance of our network on image classification tasks of MNIST and XOR MNIST datasets.
    Comprehensive Analysis of Negative Sampling in Knowledge Graph Representation Learning. (arXiv:2206.10140v2 [cs.LG] UPDATED)
    Negative sampling (NS) loss plays an important role in learning knowledge graph embedding (KGE) to handle a huge number of entities. However, the performance of KGE degrades without hyperparameters such as the margin term and number of negative samples in NS loss being appropriately selected. Currently, empirical hyperparameter tuning addresses this problem at the cost of computational time. To solve this problem, we theoretically analyzed NS loss to assist hyperparameter tuning and understand the better use of the NS loss in KGE learning. Our theoretical analysis showed that scoring methods with restricted value ranges, such as TransE and RotatE, require appropriate adjustment of the margin term or the number of negative samples different from those without restricted value ranges, such as RESCAL, ComplEx, and DistMult. We also propose subsampling methods specialized for the NS loss in KGE studied from a theoretical aspect. Our empirical analysis on the FB15k-237, WN18RR, and YAGO3-10 datasets showed that the results of actually trained models agree with our theoretical findings.
    Machine Learning to Predict Aerodynamic Stall. (arXiv:2207.03424v1 [physics.flu-dyn])
    A convolutional autoencoder is trained using a database of airfoil aerodynamic simulations and assessed in terms of overall accuracy and interpretability. The goal is to predict the stall and to investigate the ability of the autoencoder to distinguish between the linear and non-linear response of the airfoil pressure distribution to changes in the angle of attack. After a sensitivity analysis on the learning infrastructure, we investigate the latent space identified by the autoencoder targeting extreme compression rates, i.e. very low-dimensional reconstructions. We also propose a strategy to use the decoder to generate new synthetic airfoil geometries and aerodynamic solutions by interpolation and extrapolation in the latent representation learned by the autoencoder.
    Building Machine Translation Systems for the Next Thousand Languages. (arXiv:2205.03983v3 [cs.CL] UPDATED)
    In this paper we share findings from our effort to build practical machine translation (MT) systems capable of translating across over one thousand languages. We describe results in three research domains: (i) Building clean, web-mined datasets for 1500+ languages by leveraging semi-supervised pre-training for language identification and developing data-driven filtering techniques; (ii) Developing practical MT models for under-served languages by leveraging massively multilingual models trained with supervised parallel data for over 100 high-resource languages and monolingual datasets for an additional 1000+ languages; and (iii) Studying the limitations of evaluation metrics for these languages and conducting qualitative analysis of the outputs from our MT models, highlighting several frequent error modes of these types of models. We hope that our work provides useful insights to practitioners working towards building MT systems for currently understudied languages, and highlights research directions that can complement the weaknesses of massively multilingual models in data-sparse settings.
    Exploring Runtime Decision Support for Trauma Resuscitation. (arXiv:2207.02922v1 [cs.AI])
    AI-based recommender systems have been successfully applied in many domains (e.g., e-commerce, feeds ranking). Medical experts believe that incorporating such methods into a clinical decision support system may help reduce medical team errors and improve patient outcomes during treatment processes (e.g., trauma resuscitation, surgical processes). Limited research, however, has been done to develop automatic data-driven treatment decision support. We explored the feasibility of building a treatment recommender system to provide runtime next-minute activity predictions. The system uses patient context (e.g., demographics and vital signs) and process context (e.g., activities) to continuously predict activities that will be performed in the next minute. We evaluated our system on a pre-recorded dataset of trauma resuscitation and conducted an ablation study on different model variants. The best model achieved an average F1-score of 0.67 for 61 activity types. We include medical team feedback and discuss future work.
    Perfusion imaging in deep prostate cancer detection from mp-MRI: can we take advantage of it?. (arXiv:2207.02854v1 [eess.IV])
    To our knowledge, all deep computer-aided detection and diagnosis (CAD) systems for prostate cancer (PCa) detection consider bi-parametric magnetic resonance imaging (bp-MRI) only, including T2w and ADC sequences while excluding the 4D perfusion sequence, which is however part of standard clinical protocols for this diagnostic task. In this paper, we question strategies to integrate information from perfusion imaging in deep neural architectures. To do so, we evaluate several ways to encode the perfusion information in a U-Net like architecture, also considering early versus mid fusion strategies. We compare performance of multiparametric MRI (mp-MRI) models with the baseline bp-MRI model based on a private dataset of 219 mp-MRI exams. Perfusion maps derived from dynamic contrast enhanced MR exams are shown to positively impact segmentation and grading performance of PCa lesions, especially the 3D MR volume corresponding to the maximum slope of the wash-in curve as well as Tmax perfusion maps. The latter mp-MRI models indeed outperform the bp-MRI one whatever the fusion strategy, with Cohen's kappa score of 0.318 ± 0.019 for the bp-MRI model and 0.378 ± 0.033 for the model including the maximum slope with a mid fusion strategy, also achieving competitive Cohen's kappa score compared to state of the art.
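    For readers unfamiliar with the agreement metric quoted above, the following is a minimal sketch of unweighted Cohen's kappa. Whether the paper uses the unweighted or a weighted variant is not stated in the abstract, so the unweighted form is an assumption here, and the label lists are made-up examples.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and equally long")
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the marginal label frequencies, summed over labels.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (observed - expected) / (1.0 - expected)

# Made-up grades, for illustration only.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]
print(cohen_kappa(reference, predicted))
```

    A value of 1 means perfect agreement and 0 means agreement no better than chance, which puts the reported scores of roughly 0.32 to 0.38 in context.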
    Speech Enhancement with Score-Based Generative Models in the Complex STFT Domain. (arXiv:2203.17004v2 [eess.AS] UPDATED)
    Score-based generative models (SGMs) have recently shown impressive results for difficult generative tasks such as the unconditional and conditional generation of natural images and audio signals. In this work, we extend these models to the complex short-time Fourier transform (STFT) domain, proposing a novel training task for speech enhancement using a complex-valued deep neural network. We derive this training task within the formalism of stochastic differential equations (SDEs), thereby enabling the use of predictor-corrector samplers. We provide alternative formulations inspired by previous publications on using generative diffusion models for speech enhancement, avoiding the need for any prior assumptions on the noise distribution and making the training task purely generative which, as we show, results in improved enhancement performance.
    Signed Link Representation in Continuous-Time Dynamic Signed Networks. (arXiv:2207.03408v1 [cs.SI])
    Signed networks allow us to model bi-faceted relationships and interactions, such as friend/enemy, support/oppose, etc. These interactions are often temporal in real datasets, where nodes and edges appear over time. Learning the dynamics of signed networks is thus crucial to effectively predict the sign and strength of future links. Existing works model either signed networks or dynamic networks but not both together. In this work, we study dynamic signed networks where links are both signed and evolving with time. Our model learns a Signed link's Evolution using Memory modules and Balanced Aggregation (hence, the name SEMBA). Each node maintains two separate memory encodings for positive and negative interactions. On the arrival of a new edge, each interacting node aggregates this signed information with its memories while exploiting balance theory. Node embeddings are generated using updated memories, which are then used to train for multiple downstream tasks, including link sign prediction and link weight prediction. Our results show that SEMBA outperforms all the baselines on the task of sign prediction by achieving up to an 8% increase in the AUC and up to a 50% reduction in FPR. Results on the task of predicting signed weights show that SEMBA reduces the mean squared error by 9% while achieving up to 69% reduction in the KL-divergence on the distribution of predicted signed weights.
    Efficient fine-grained road segmentation using superpixel-based CNN and CRF models. (arXiv:2207.02844v1 [cs.CV])
    For safe and comfortable driving, road scene segmentation is a fundamental problem in camera-based advanced driver assistance systems (ADAS). Despite the great success of Convolutional Neural Networks (CNNs) in semantic segmentation, the high computational cost of CNN-based methods remains a challenge. In recent work, we proposed a novel approach that exploits the advantages of CNNs for road segmentation at reasonable computational effort. The runtime benefits from using irregular superpixels, rather than the image grid, as the basis for the CNN input, which tremendously reduces the input size. Although this method achieved remarkably low computation time in both the training and testing phases, the lower resolution of the superpixel domain naturally yields lower accuracy than high-cost state-of-the-art methods. In this work, we focus on refining the road segmentation using a Conditional Random Field (CRF). The refinement procedure is limited to the superpixels touching the predicted road boundary to keep the additional computational effort low. Reducing the input to the superpixel domain allows the CNN's structure to stay small and efficient to compute while keeping the advantage of convolutional layers, which makes it suitable for ADAS. Applying the CRF compensates for the trade-off between accuracy and computational efficiency. The proposed system obtained performance comparable to the top-performing algorithms on the KITTI road benchmark, and its fast inference makes it particularly suitable for real-time applications.
    Algebraic and machine learning approach to hierarchical triple-star stability. (arXiv:2207.03151v1 [astro-ph.SR])
    We present two approaches to determine the dynamical stability of a hierarchical triple-star system. The first is an improvement on the semi-analytical stability criterion of Mardling & Aarseth (2001), where we introduce a dependence on inner orbital eccentricity and improve the dependence on mutual orbital inclination. The second involves a machine learning approach, where we use a multilayer perceptron (MLP) to classify triple-star systems as 'stable' and 'unstable'. To achieve this, we generate a large training data set of 10^6 hierarchical triples using the N-body code MSTAR. Both our approaches perform better than the original Mardling & Aarseth (2001) stability criterion, with the MLP model performing the best. The improved stability formula and the machine learning model have overall classification accuracies of 93% and 95% respectively. Our MLP model, which accurately predicts the stability of any hierarchical triple-star system within the parameter ranges studied with almost no computation required, is publicly available on GitHub in the form of an easy-to-use Python script.
    Multi-scale Sinusoidal Embeddings Enable Learning on High Resolution Mass Spectrometry Data. (arXiv:2207.02980v1 [cs.LG])
    Small molecules in biological samples are studied to provide information about disease states, environmental toxins, natural product drug discovery, and many other applications. The primary window into the composition of small molecule mixtures is tandem mass spectrometry (MS2), which produces data that are of high sensitivity and part per million resolution. We adopt multi-scale sinusoidal embeddings of the mass data in MS2 designed to meet the challenge of learning from the full resolution of MS2 data. Using these embeddings, we provide a new state of the art model for spectral library search, the standard task for initial evaluation of MS2 data. We also introduce a new task, chemical property prediction from MS2 data, that has natural applications in high-throughput MS2 experiments and show that an average $R^2$ of 80% for novel compounds can be achieved across 10 chemical properties prioritized by medicinal chemists. We use dimensionality reduction techniques and experiments with different floating point resolutions to show the essential role multi-scale sinusoidal embeddings play in learning from MS2 data.
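    As a rough sketch of what a multi-scale sinusoidal embedding of a mass value could look like, the function below encodes a scalar with sine/cosine pairs at geometrically spaced wavelengths. The function name, the embedding size and the wavelength range are hypothetical choices for illustration; the paper's actual parameterization is not given in the abstract.

```python
import math

def sinusoidal_embedding(mz, dim=8, min_wavelength=0.001, max_wavelength=1000.0):
    """Encode a scalar m/z value as sin/cos pairs over geometrically spaced wavelengths.

    All defaults are illustrative assumptions, not values from the paper.
    """
    if dim < 4 or dim % 2 != 0:
        raise ValueError("dim must be an even number >= 4")
    n_freq = dim // 2
    ratio = max_wavelength / min_wavelength
    embedding = []
    for i in range(n_freq):
        # Wavelengths grow geometrically from min_wavelength to max_wavelength.
        wavelength = min_wavelength * ratio ** (i / (n_freq - 1))
        angle = 2.0 * math.pi * mz / wavelength
        embedding.append(math.sin(angle))
        embedding.append(math.cos(angle))
    return embedding

# Two masses that differ by 0.0005: the short-wavelength components
# separate them, while the long-wavelength components stay nearly equal.
emb_a = sinusoidal_embedding(100.0000)
emb_b = sinusoidal_embedding(100.0005)
print(emb_a)
print(emb_b)
```

    The design intuition is that short wavelengths resolve very small mass differences while long wavelengths preserve coarse ordering, so a model sees both scales at once instead of a single floating-point number.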
    Machine Learning Model Sizes and the Parameter Gap. (arXiv:2207.02852v1 [cs.LG])
    We study trends in model size of notable machine learning systems over time using a curated dataset. From 1950 to 2018, model size in language models increased steadily by seven orders of magnitude. The trend then accelerated, with model size increasing by another five orders of magnitude in just 4 years from 2018 to 2022. Vision models grew at a more constant pace, totaling 7 orders of magnitude of growth between 1950 and 2022. We also identify that, since 2020, there have been many language models below 20B parameters, many models above 70B parameters, but a scarcity of models in the 20-70B parameter range. We refer to that scarcity as the parameter gap. We provide some stylized facts about the parameter gap and propose a few hypotheses to explain it. The explanations we favor are: (a) increasing model size beyond 20B parameters requires adopting different parallelism techniques, which makes mid-sized models less cost-effective, (b) GPT-3 was one order of magnitude larger than previous language models, and researchers afterwards primarily experimented with bigger models to outperform it. While these dynamics likely exist, and we believe they play some role in generating the gap, we don't have high confidence that there are no other, more important dynamics at play.
    Diagnosing and Remedying Shot Sensitivity with Cosine Few-Shot Learners. (arXiv:2207.03398v1 [cs.CV])
    Few-shot recognition involves training an image classifier to distinguish novel concepts at test time using few examples (shot). Existing approaches generally assume that the shot number at test time is known in advance. This is not realistic, and the performance of a popular and foundational method has been shown to suffer when train and test shots do not match. We conduct a systematic empirical study of this phenomenon. In line with prior work, we find that shot sensitivity is broadly present across metric-based few-shot learners, but in contrast to prior work, larger neural architectures provide a degree of built-in robustness to varying test shot. More importantly, a simple, previously known but greatly overlooked class of approaches based on cosine distance consistently and greatly improves robustness to shot variation, by removing sensitivity to sample noise. We derive cosine alternatives to popular and recent few-shot classifiers, broadening their applicability to realistic settings. These cosine models consistently improve shot-robustness, outperform prior shot-robust state of the art, and provide competitive accuracy on a range of benchmarks and architectures, including notable gains in the very-low-shot regime.
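    The idea of a cosine-based few-shot classifier can be illustrated with a nearest-prototype rule that compares directions rather than distances. This is a generic sketch under the assumption of non-zero feature vectors, not the authors' derived models; the function names and toy features are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors (the class prototype)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; assumes neither is the zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def classify_cosine(query, support):
    """Return the label whose prototype has the highest cosine similarity to query."""
    prototypes = {label: mean_vector(shots) for label, shots in support.items()}
    return max(prototypes, key=lambda label: cosine_similarity(query, prototypes[label]))

# Toy 2-D features; the same rule is applied with 1 shot and with 3 shots per class.
one_shot = {"cat": [[1.0, 0.1]], "dog": [[0.1, 1.0]]}
three_shot = {
    "cat": [[1.0, 0.1], [0.9, 0.0], [1.1, 0.2]],
    "dog": [[0.1, 1.0], [0.0, 0.9], [0.2, 1.2]],
}
query = [4.0, 1.0]
print(classify_cosine(query, one_shot), classify_cosine(query, three_shot))
```

    Because cosine similarity ignores vector magnitude, rescaling the query or a prototype does not change the decision, which is one plausible reason such classifiers behave more consistently as the number of shots varies.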
    Toward Force Estimation in Robot-Assisted Surgery using Deep Learning with Vision and Robot State. (arXiv:2011.02112v4 [cs.RO] UPDATED)
    Knowledge of interaction forces during teleoperated robot-assisted surgery could be used to enable force feedback to human operators and evaluate tissue handling skill. However, direct force sensing at the end-effector is challenging because it requires biocompatible, sterilizable, and cost-effective sensors. Vision-based deep learning using convolutional neural networks is a promising approach for providing useful force estimates, though questions remain about generalization to new scenarios and real-time inference. We present a force estimation neural network that uses RGB images and robot state as inputs. Using a self-collected dataset, we compared the network to variants that included only a single input type, and evaluated how they generalized to new viewpoints, workspace positions, materials, and tools. We found that vision-based networks were sensitive to shifts in viewpoints, while state-only networks were robust to changes in workspace. The network with both state and vision inputs had the highest accuracy for an unseen tool, and was moderately robust to changes in viewpoints. Through feature removal studies, we found that using only position features produced better accuracy than using only force features as input. The network with both state and vision inputs outperformed a physics-based baseline model in accuracy. It showed comparable accuracy but faster computation times than a baseline recurrent neural network, making it better suited for real-time applications.
    Human-Robot Commensality: Bite Timing Prediction for Robot-Assisted Feeding in Groups. (arXiv:2207.03348v1 [cs.RO])
    We develop data-driven models to predict when a robot should feed during social dining scenarios. Being able to eat independently with friends and family is considered one of the most memorable and important activities for people with mobility limitations. Robots can potentially help with this activity but robot-assisted feeding is a multi-faceted problem with challenges in bite acquisition, bite timing, and bite transfer. Bite timing in particular becomes uniquely challenging in social dining scenarios due to the possibility of interrupting a social human-robot group interaction during commensality. Our key insight is that bite timing strategies that take into account the delicate balance of social cues can lead to seamless interactions during robot-assisted feeding in a social dining scenario. We approach this problem by collecting a multimodal Human-Human Commensality Dataset (HHCD) containing 30 groups of three people eating together. We use this dataset to analyze human-human commensality behaviors and develop bite timing prediction models in social dining scenarios. We also transfer these models to human-robot commensality scenarios. Our user studies show that prediction improves when our algorithm uses multimodal social signaling cues between diners to model bite timing. The HHCD dataset, videos of user studies, and code will be publicly released after acceptance.
    Speech Emotion: Investigating Model Representations, Multi-Task Learning and Knowledge Distillation. (arXiv:2207.03334v1 [eess.AS])
    Estimating dimensional emotions, such as activation, valence and dominance, from acoustic speech signals has been widely explored over the past few years. While accurate estimation of activation and dominance from speech seems to be possible, the same for valence remains challenging. Previous research has shown that the use of lexical information can improve valence estimation performance. Lexical information can be obtained from pre-trained acoustic models, where the learned representations can improve valence estimation from speech. We investigate the use of pre-trained model representations to improve valence estimation from acoustic speech signals. We also explore fusion of representations to improve emotion estimation across all three emotion dimensions: activation, valence and dominance. Additionally, we investigate whether representations from pre-trained models can be distilled into models trained with low-level features, resulting in models with fewer parameters. We show that fusion of pre-trained model embeddings results in a 79% relative improvement in concordance correlation coefficient (CCC) on valence estimation compared to a standard acoustic feature baseline (mel-filterbank energies), while distillation from pre-trained model embeddings to lower-dimensional representations yielded a relative 12% improvement. Such performance gains were observed over two evaluation sets, indicating that our proposed architecture generalizes across those evaluation sets. We report new state-of-the-art "text-free" acoustic-only dimensional emotion estimation CCC values on two MSP-Podcast evaluation sets.
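    The concordance correlation coefficient used as the evaluation metric above has a simple closed form, sketched here with population (divide-by-n) statistics. The function name and the sample values are illustrative assumptions, not taken from the paper.

```python
def concordance_cc(predictions, targets):
    """Concordance correlation coefficient:
    2 * cov(p, t) / (var(p) + var(t) + (mean(p) - mean(t)) ** 2).
    """
    n = len(predictions)
    if n == 0 or n != len(targets):
        raise ValueError("inputs must be non-empty and equally long")
    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(predictions, targets)) / n
    denominator = var_p + var_t + (mean_p - mean_t) ** 2
    if denominator == 0.0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2.0 * cov / denominator

# Made-up valence scores: a perfectly correlated but shifted prediction
# is penalized, unlike with plain Pearson correlation.
targets = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
print(concordance_cc(targets, targets))
print(concordance_cc(shifted, targets))
```

    Unlike Pearson correlation, CCC drops when predictions are offset or rescaled relative to the targets, which is why it is a stricter measure for continuous emotion estimates.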
    DLME: Deep Local-flatness Manifold Embedding. (arXiv:2207.03160v1 [cs.LG])
    Manifold learning (ML) aims to find low-dimensional embeddings of high-dimensional data. Previous works focus on handcrafted or easy datasets with simple and ideal scenarios; however, we find they perform poorly on real-world datasets with under-sampled data. Generally, ML methods first model the data structure and then compute a low-dimensional embedding, where the poor local connectivity of under-sampled data in the former step and inappropriate optimization objectives in the latter step lead to structural distortion and underconstrained embedding. To solve this problem, we propose Deep Local-flatness Manifold Embedding (DLME), a novel ML framework to obtain reliable manifold embeddings by reducing distortion. Our proposed DLME constructs semantic manifolds by data augmentation and overcomes structural distortion problems with the help of its smooth framework. To overcome underconstrained embedding, we design a specific loss for DLME and mathematically demonstrate that it leads to a more suitable embedding based on our proposed Local Flatness Assumption. Experiments on downstream classification, clustering, and visualization tasks with three types of datasets (toy, biological, and image) show that DLME outperforms SOTA ML and contrastive learning (CL) methods.
    Backpropagation on Dynamical Networks. (arXiv:2207.03093v1 [math.DS])
    Dynamical networks are versatile models that can describe a variety of behaviours such as synchronisation and feedback. However, applying these models in real-world contexts is difficult as prior information pertaining to the connectivity structure or local dynamics is often unknown and must be inferred from time series observations of network states. Additionally, the influence of coupling interactions between nodes further complicates the isolation of local node dynamics. Given the architectural similarities between dynamical networks and recurrent neural networks (RNN), we propose a network inference method based on the backpropagation through time (BPTT) algorithm commonly used to train recurrent neural networks. This method aims to simultaneously infer both the connectivity structure and local node dynamics purely from observation of node states. An approximation of local node dynamics is first constructed using a neural network. This is alternated with an adapted BPTT algorithm to regress corresponding network weights by minimising prediction errors of the dynamical network based on the previously constructed local models until convergence is achieved. This method was found to be successful in identifying the connectivity structure for coupled networks of Lorenz, Chua and FitzHugh-Nagumo oscillators. Free-run prediction performance with the resulting local models and weights was found to be comparable to the true system with noisy initial conditions. The method is also extended to non-conventional network couplings such as asymmetric negative coupling.
    DRL-ISP: Multi-Objective Camera ISP with Deep Reinforcement Learning. (arXiv:2207.03081v1 [cs.CV])
    In this paper, we propose a multi-objective camera ISP framework that utilizes Deep Reinforcement Learning (DRL) and a camera ISP toolbox that consists of network-based and conventional ISP tools. The proposed DRL-based camera ISP framework iteratively selects a proper tool from the toolbox and applies it to the image to maximize a given vision task-specific reward function. For this purpose, we implement a total of 51 ISP tools that include exposure correction, color-and-tone correction, white balance, sharpening, denoising, and others. We also propose an efficient DRL network architecture that can extract the various aspects of an image and make a rigid mapping relationship between images and a large number of actions. Our proposed DRL-based ISP framework effectively improves the image quality according to each vision task such as RAW-to-RGB image restoration, 2D object detection, and monocular depth estimation.
    Group Fairness in Adaptive Submodular Maximization. (arXiv:2207.03364v1 [cs.LG])
    In this paper, we study the classic submodular maximization problem subject to a group fairness constraint under both non-adaptive and adaptive settings. It has been shown that the utility function of many machine learning applications, including data summarization, influence maximization in social networks, and personalized recommendation, satisfies the property of submodularity. Hence, maximizing a submodular function subject to various constraints can be found at the heart of many of those applications. At a high level, submodular maximization aims to select a group of most representative items (e.g., data points). However, the design of most existing algorithms does not incorporate the fairness constraint, leading to under- or over-representation of some particular groups. This motivates us to study the fair submodular maximization problem, where we aim to select a group of items to maximize a (possibly non-monotone) submodular utility function subject to a group fairness constraint. To this end, we develop the first constant-factor approximation algorithm for this problem. The design of our algorithm is robust enough to be extended to solving the submodular maximization problem under a more complicated adaptive setting. Moreover, we further extend our study to incorporating a global cardinality constraint.
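    To make the problem setting concrete, here is a toy sketch of greedy selection for a monotone coverage function under per-group upper bounds and a global cardinality limit. It is not the paper's algorithm and carries no approximation guarantee for the non-monotone case; the function `greedy_fair_cover`, the upper-bound-only form of the fairness constraint, and the example data are all assumptions made for illustration.

```python
def greedy_fair_cover(covers, group_of, k, upper):
    """Greedily pick up to k items maximizing coverage, with per-group caps.

    covers:   dict item -> set of elements the item covers (f(S) = |union|)
    group_of: dict item -> group label
    upper:    dict group label -> max number of items allowed from that group
    """
    chosen, covered, counts = [], set(), {}
    while len(chosen) < k:
        best, best_gain = None, 0
        for item, elems in covers.items():
            if item in chosen:
                continue
            if counts.get(group_of[item], 0) >= upper[group_of[item]]:
                continue  # picking this item would over-represent its group
            gain = len(elems - covered)  # marginal gain of the coverage function
            if gain > best_gain:
                best, best_gain = item, gain
        if best is None:  # no feasible item adds anything
            break
        chosen.append(best)
        covered |= covers[best]
        counts[group_of[best]] = counts.get(group_of[best], 0) + 1
    return chosen, covered


covers = {"a": {1, 2, 3, 4}, "b": {5, 6, 7}, "c": {8, 9}, "d": {1}}
group_of = {"a": "x", "b": "x", "c": "y", "d": "y"}
chosen, covered = greedy_fair_cover(covers, group_of, k=2, upper={"x": 1, "y": 1})
```

    Without the caps, greedy would take the two items from group "x"; with at most one item per group it takes one from each, trading some coverage for balanced representation.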
    Causality-based Neural Network Repair. (arXiv:2204.09274v2 [cs.SE] UPDATED)
    Neural networks have had discernible achievements in a wide range of applications. Their widespread adoption also raises concerns about their dependability and reliability. Similar to traditional decision-making programs, neural networks can have defects that need to be repaired. The defects may cause unsafe behaviors, raise security concerns, or have unjust societal impacts. In this work, we address the problem of repairing a neural network for desirable properties such as fairness and the absence of backdoors. The goal is to construct a neural network that satisfies the property by (minimally) adjusting the given neural network's parameters (i.e., weights). Specifically, we propose CARE (CAusality-based REpair), a causality-based neural network repair technique that 1) performs causality-based fault localization to identify the 'guilty' neurons and 2) optimizes the parameters of the identified neurons to reduce the misbehavior. We have empirically evaluated CARE on various tasks such as backdoor removal and neural network repair for fairness and safety properties. Our experimental results show that CARE is able to repair all neural networks efficiently and effectively. For fairness repair tasks, CARE successfully improves fairness by 61.91% on average. For backdoor removal tasks, CARE reduces the attack success rate from over 98% to less than 1%. For safety property repair tasks, CARE reduces the property violation rate to less than 1%. Results also show that thanks to the causality-based fault localization, CARE's repair focuses on the misbehavior and preserves the accuracy of the neural networks.
    DecisioNet -- A Binary-Tree Structured Neural Network. (arXiv:2207.01127v2 [cs.CV] UPDATED)
    Deep neural networks (DNNs) and decision trees (DTs) are both state-of-the-art classifiers. DNNs perform well due to their representational learning capabilities, while DTs are computationally efficient as they perform inference along one route (root-to-leaf) that is dependent on the input data. In this paper, we present DecisioNet (DN), a binary-tree structured neural network. We propose a systematic way to convert an existing DNN into a DN to create a lightweight version of the original model. DecisioNet takes the best of both worlds - it uses neural modules to perform representational learning and utilizes its tree structure to perform only a portion of the computations. We evaluate various DN architectures, along with their corresponding baseline models on the FashionMNIST, CIFAR10, and CIFAR100 datasets. We show that the DN variants achieve similar accuracy while significantly reducing the computational cost of the original network.
    FewSOL: A Dataset for Few-Shot Object Learning in Robotic Environments. (arXiv:2207.03333v1 [cs.CV])
    We introduce the Few-Shot Object Learning (FewSOL) dataset for object recognition with a few images per object. We captured 336 real-world objects with 9 RGB-D images per object from different views. Object segmentation masks, object poses and object attributes are provided. In addition, synthetic images generated using 330 3D object models are used to augment the dataset. We investigated (i) few-shot object classification and (ii) joint object segmentation and few-shot classification with the state-of-the-art methods for few-shot learning and meta-learning using our dataset. The evaluation results show that there is still a large margin to be improved for few-shot object classification in robotic environments. Our dataset can be used to study a set of few-shot object recognition problems such as classification, detection and segmentation, shape reconstruction, pose estimation, keypoint correspondences and attribute recognition. The dataset and code are available at https://irvlutd.github.io/FewSOL.
    Red PANDA: Disambiguating Anomaly Detection by Removing Nuisance Factors. (arXiv:2207.03478v1 [cs.CV])
    Anomaly detection methods strive to discover patterns that differ from the norm in a semantic way. This goal is ambiguous as a data point differing from the norm by an attribute e.g., age, race or gender, may be considered anomalous by some operators while others may consider this attribute irrelevant. Breaking from previous research, we present a new anomaly detection method that allows operators to exclude an attribute from being considered as relevant for anomaly detection. Our approach then learns representations which do not contain information over the nuisance attributes. Anomaly scoring is performed using a density-based approach. Importantly, our approach does not require specifying the attributes that are relevant for detecting anomalies, which is typically impossible in anomaly detection, but only attributes to ignore. An empirical investigation is presented verifying the effectiveness of our approach.
    SSLGuard: A Watermarking Scheme for Self-supervised Learning Pre-trained Encoders. (arXiv:2201.11692v3 [cs.CR] UPDATED)
    Self-supervised learning is an emerging machine learning (ML) paradigm. Compared to supervised learning, which leverages high-quality labeled datasets to achieve good performance, self-supervised learning relies on unlabeled datasets to pre-train powerful encoders which can then be treated as feature extractors for various downstream tasks. The large amounts of data and computational resources consumed in pre-training make the encoders themselves valuable intellectual property of the model owner. Recent research has shown that the ML model's copyright is threatened by model stealing attacks, which aim to train a surrogate model to mimic the behavior of a given model. We empirically show that pre-trained encoders are highly vulnerable to model stealing attacks. However, most current copyright protection algorithms, such as watermarking, concentrate on classifiers. Meanwhile, the intrinsic challenges of copyright protection for pre-trained encoders remain largely unstudied. We fill the gap by proposing SSLGuard, the first watermarking algorithm for pre-trained encoders. Given a clean pre-trained encoder, SSLGuard injects a watermark into it and outputs a watermarked version. The shadow training technique is also applied to preserve the watermark under potential model stealing attacks. Our extensive evaluation shows that SSLGuard is effective in watermark injection and verification, and is robust against model stealing and other watermark removal attacks such as input noising, output perturbing, overwriting, model pruning, and fine-tuning.
    Brainish: Formalizing A Multimodal Language for Intelligence and Consciousness. (arXiv:2205.00001v3 [cs.AI] UPDATED)
    Having a rich multimodal inner language is an important component of human intelligence that enables several necessary core cognitive functions such as multimodal prediction, translation, and generation. Building upon the Conscious Turing Machine (CTM), a machine model for consciousness proposed by Blum and Blum (2021), we describe the desiderata of a multimodal language called Brainish, comprising words, images, audio, and sensations combined in representations that the CTM's processors use to communicate with each other. We define the syntax and semantics of Brainish before operationalizing this language through the lens of multimodal artificial intelligence, a vibrant research area studying the computational tools necessary for processing and relating information from heterogeneous signals. Our general framework for learning Brainish involves designing (1) unimodal encoders to segment and represent unimodal data, (2) a coordinated representation space that relates and composes unimodal features to derive holistic meaning across multimodal inputs, and (3) decoders to map multimodal representations into predictions (for fusion) or raw data (for translation or generation). Through discussing how Brainish is crucial for communication and coordination in order to achieve consciousness in the CTM, and by implementing a simple version of Brainish and evaluating its capability of demonstrating intelligence on multimodal prediction and retrieval tasks on several real-world image, text, and audio datasets, we argue that such an inner language will be important for advances in machine models of intelligence and consciousness.
    Tensor networks in machine learning. (arXiv:2207.02851v1 [quant-ph])
    A tensor network is a type of decomposition used to express and approximate large arrays of data. A given data-set, quantum state or higher dimensional multi-linear map is factored and approximated by a composition of smaller multi-linear maps. This is reminiscent of how a Boolean function might be decomposed into a gate array: this represents a special case of tensor decomposition, in which the tensor entries are replaced by 0, 1 and the factorisation becomes exact. The collection of associated techniques is called tensor network methods: the subject developed independently in several distinct fields of study, which have more recently become interrelated through the language of tensor networks. The central questions in the field relate to the expressibility of tensor networks and the reduction of computational overheads. A merger of tensor networks with machine learning is natural. On the one hand, machine learning can aid in determining a factorization of a tensor network approximating a data set. On the other hand, a given tensor network structure can be viewed as a machine learning model, in which the tensor network parameters are adjusted to learn or classify a data-set. In this survey we recover the basics of tensor networks and explain the ongoing effort to develop the theory of tensor networks in machine learning.
    A Survey on Hyperlink Prediction. (arXiv:2207.02911v1 [cs.LG])
    As a natural extension of link prediction on graphs, hyperlink prediction aims for the inference of missing hyperlinks in hypergraphs, where a hyperlink can connect more than two nodes. Hyperlink prediction has applications in a wide range of systems, from chemical reaction networks, social communication networks, to protein-protein interaction networks. In this paper, we provide a systematic and comprehensive survey on hyperlink prediction. We propose a new taxonomy to classify existing hyperlink prediction methods into four categories: similarity-based, probability-based, matrix optimization-based, and deep learning-based methods. To compare the performance of methods from different categories, we perform a benchmark study on various hypergraph applications using representative methods from each category. Notably, deep learning-based methods prevail over other methods in hyperlink prediction.
    Provable Domain Generalization via Invariant-Feature Subspace Recovery. (arXiv:2201.12919v2 [cs.LG] UPDATED)
    Domain generalization asks for models trained over a set of training environments to perform well in unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) has been proposed for domain generalization. However, Rosenfeld et al. (2021) show that in a simple linear data model, even if non-convexity issues are ignored, IRM and its extensions cannot generalize to unseen environments with fewer than $d_s+1$ training environments, where $d_s$ is the dimension of the spurious-feature subspace. In this paper, we propose to achieve domain generalization with Invariant-feature Subspace Recovery (ISR). Our first algorithm, ISR-Mean, can identify the subspace spanned by invariant features from the first-order moments of the class-conditional distributions, and achieve provable domain generalization with $d_s+1$ training environments under the data model of Rosenfeld et al. (2021). Our second algorithm, ISR-Cov, further reduces the required number of training environments to $O(1)$ using the information of second-order moments. Notably, unlike IRM, our algorithms bypass non-convexity issues and enjoy global convergence guarantees. Empirically, our ISRs can obtain superior performance compared with IRM on synthetic benchmarks. In addition, on three real-world image and text datasets, we show that both ISRs can be used as simple yet effective post-processing methods to improve the worst-case accuracy of (pre-)trained models against spurious correlations and group shifts.
    Robot Learning of Mobile Manipulation with Reachability Behavior Priors. (arXiv:2203.04051v3 [cs.RO] UPDATED)
    Mobile Manipulation (MM) systems are ideal candidates for taking up the role of a personal assistant in unstructured real-world environments. Among other challenges, MM requires effective coordination of the robot's embodiments for executing tasks that require both mobility and manipulation. Reinforcement Learning (RL) holds the promise of endowing robots with adaptive behaviors, but most methods require prohibitively large amounts of data for learning a useful control policy. In this work, we study the integration of robotic reachability priors in actor-critic RL methods for accelerating the learning of MM for reaching and fetching tasks. Namely, we consider the problem of optimal base placement and the subsequent decision of whether to activate the arm for reaching a 6D target. For this, we devise a novel Hybrid RL method that handles discrete and continuous actions jointly, resorting to the Gumbel-Softmax reparameterization. Next, we train a reachability prior using data from the operational robot workspace, inspired by classical methods. Subsequently, we derive Boosted Hybrid RL (BHyRL), a novel algorithm for learning Q-functions by modeling them as a sum of residual approximators. Every time a new task needs to be learned, we can transfer our learned residuals and learn the component of the Q-function that is task-specific, hence, maintaining the task structure from prior behaviors. Moreover, we find that regularizing the target policy with a prior policy yields more expressive behaviors. We evaluate our method in simulation in reaching and fetching tasks of increasing difficulty, and we show the superior performance of BHyRL against baseline methods. Finally, we zero-transfer our learned 6D fetching policy with BHyRL to our MM robot TIAGo++. For more details and code release, please refer to our project site: irosalab.com/rlmmbp  ( 3 min )
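    The Gumbel-Softmax reparameterization mentioned above can be sketched as follows. This is a generic, standard-library illustration of the sampling step only, not the authors' Hybrid RL implementation; the function name `gumbel_softmax`, the temperature value, and the example logits are assumptions.

```python
import math
import random


def gumbel_softmax(logits, tau=1.0, rng=random):
    """Draw a relaxed one-hot sample over discrete actions.

    Adds Gumbel(0, 1) noise to each logit and applies a softmax with
    temperature tau; smaller tau pushes the sample toward one-hot.
    """
    noisy = []
    for logit in logits:
        u = max(rng.random(), 1e-12)  # avoid log(0)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / tau)
    # Numerically stable softmax.
    top = max(noisy)
    exps = [math.exp(v - top) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Example: a relaxed choice among three discrete options, e.g.
# hypothetical "activate arm" / "move base" / "wait" decisions.
sample = gumbel_softmax([1.0, 2.0, 0.5], tau=0.5, rng=random.Random(0))
```

    Because the sample is a smooth function of the logits for a fixed noise draw, gradients can flow through the discrete choice, which is what allows discrete and continuous actions to be trained jointly in a differentiable framework.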
    Characterizing player's playing styles based on Player Vectors for each playing position in the Chinese Football Super League. (arXiv:2205.02731v2 [cs.LG] UPDATED)
    Characterizing playing style is important for football clubs on scouting, monitoring and match preparation. Previous studies considered a player's style as a combination of technical performances, failing to consider the spatial information. Therefore, this study aimed to characterize the playing styles of each playing position in the Chinese Football Super League (CSL) matches, integrating a recently adopted Player Vectors framework. Data of 960 matches from 2016-2019 CSL were used. Match ratings, and ten types of match events with the corresponding coordinates for all the lineup players whose on-pitch time exceeded 45 minutes were extracted. Players were first clustered into 8 positions. A player vector was constructed for each player in each match based on the Player Vectors using Nonnegative Matrix Factorization (NMF). Another NMF process was run on the player vectors to extract different types of playing styles. The resulting player vectors discovered 18 different playing styles in the CSL. Six performance indicators of each style were investigated to observe their contributions. In general, the playing styles of forwards and midfielders are in line with football performance evolution trends, while the styles of defenders should be reconsidered. Multifunctional playing styles were also found in high rated CSL players.  ( 3 min )
    A domain-specific language for describing machine learning datasets. (arXiv:2207.02848v1 [cs.LG])
    Datasets play a central role in the training and evaluation of machine learning (ML) models. But they are also the root cause of many undesired model behaviors, such as biased predictions. To overcome this situation, the ML community is proposing a data-centric cultural shift where data issues are given the attention they deserve, and more standard practices around the gathering and processing of datasets start to be discussed and established. So far, these proposals are mostly high-level guidelines described in natural language and, as such, they are difficult to formalize and apply to particular datasets. In this sense, and inspired by these proposals, we define a new domain-specific language (DSL) to precisely describe machine learning datasets in terms of their structure, data provenance, and social concerns. We believe this DSL will facilitate any ML initiative to leverage and benefit from this data-centric shift in ML (e.g., selecting the most appropriate dataset for a new project or better replicating other ML results). The DSL is implemented as a Visual Studio Code plugin, and it has been published under an open source license.
    DeepAdversaries: Examining the Robustness of Deep Learning Models for Galaxy Morphology Classification. (arXiv:2112.14299v3 [cs.LG] UPDATED)
    With increased adoption of supervised deep learning methods for processing and analysis of cosmological survey data, the assessment of data perturbation effects (that can naturally occur in the data processing and analysis pipelines) and the development of methods that increase model robustness are increasingly important. In the context of morphological classification of galaxies, we study the effects of perturbations in imaging data. In particular, we examine the consequences of using neural networks when training on baseline data and testing on perturbed data. We consider perturbations associated with two primary sources: 1) increased observational noise as represented by higher levels of Poisson noise and 2) data processing noise incurred by steps such as image compression or telescope errors as represented by one-pixel adversarial attacks. We also test the efficacy of domain adaptation techniques in mitigating the perturbation-driven errors. We use classification accuracy, latent space visualizations, and latent space distance to assess model robustness. Without domain adaptation, we find that processing pixel-level errors easily flip the classification into an incorrect class and that higher observational noise makes the model trained on low-noise data unable to classify galaxy morphologies. On the other hand, we show that training with domain adaptation improves model robustness and mitigates the effects of these perturbations, improving the classification accuracy by 23% on data with higher observational noise. Domain adaptation also increases by a factor of ~2.3 the latent space distance between the baseline and the incorrectly classified one-pixel perturbed image, making the model more robust to inadvertent perturbations.  ( 3 min )
    Interpretable Deep Causal Learning for Moderation Effects. (arXiv:2206.10261v2 [cs.LG] UPDATED)
    In this extended abstract, we address the problem of interpretability and targeted regularization in causal machine learning models. In particular, we focus on the problem of estimating individual causal/treatment effects under observed confounders, which can be controlled for and moderate the effect of the treatment on the outcome of interest. Black-box ML models adjusted for the causal setting generally perform well in this task, but they lack interpretable output identifying the main drivers of treatment heterogeneity and their functional relationship. We propose a novel deep counterfactual learning architecture for estimating individual treatment effects that can simultaneously: i) convey targeted regularization on, and quantify uncertainty around, the quantity of interest (i.e., the Conditional Average Treatment Effect); ii) disentangle baseline prognostic and moderating effects of the covariates and output interpretable score functions describing their relationship with the outcome. Finally, we demonstrate the use of the method via a simple simulated experiment.  ( 2 min )
    Building separable approximations for quantum states via neural networks. (arXiv:2112.08055v5 [quant-ph] UPDATED)
    Finding the closest separable state to a given target state is a notoriously difficult task, even more difficult than deciding whether a state is entangled or separable. To tackle this task, we parametrize separable states with a neural network and train it to minimize the distance to a given target state, with respect to a differentiable distance, such as the trace distance or Hilbert--Schmidt distance. By examining the output of the algorithm, we obtain an upper bound on the entanglement of the target state, and construct an approximation for its closest separable state. We benchmark the method on a variety of well-known classes of bipartite states and find excellent agreement, even up to local dimension of $d=10$, while providing conjectures and analytic insight for isotropic and Werner states. Moreover, we show our method to be efficient in the multipartite case, considering different notions of separability. Examining three and four-party GHZ and W states we recover known bounds and obtain additional ones, for instance for triseparability.  ( 3 min )
    Patient-specific modelling, simulation and real time processing for constrictive respiratory diseases. (arXiv:2207.01082v2 [eess.IV] UPDATED)
    Asthma is a common chronic disease of the respiratory system causing significant disability and societal burden. It affects over 500 million people worldwide and generated costs exceeding USD 56 billion in 2011 in the United States. Managing asthma involves controlling symptoms, preventing exacerbations, and maintaining lung function. Improving asthma control affects the daily life of patients, is associated with a reduced risk of exacerbations and lung function impairment, and reduces the cost of asthma care and the indirect costs associated with reduced productivity. Understanding the complex dynamics of the pulmonary system and the lung's response to disease, injury, and treatment is fundamental to the advancement of asthma treatment. Computational models of the respiratory system seek to provide a theoretical framework to understand the interaction between structure and function. Their application can improve pulmonary medicine through a patient-specific approach that optimizes drug delivery given the personalized geometry and personalized ventilation patterns. A three-fold objective is addressed within this dissertation. The first part refers to the comprehension of pulmonary pathophysiology and the mechanics of asthma and subsequently of constrictive pulmonary conditions in general. The second part refers to the design and implementation of tools that facilitate personalized medicine to improve delivery and effectiveness. Finally, the third part refers to the self-management of the condition, meaning that medical personnel and patients have access to tools and methods that allow the first party to easily track the course of the condition and the second party, i.e. the patient, to easily self-manage it, alleviating the significant burden on the health system.  ( 3 min )
    An Exploration of How Training Set Composition Bias in Machine Learning Affects Identifying Rare Objects. (arXiv:2207.03207v1 [cs.LG])
    When training a machine learning classifier on data where one of the classes is intrinsically rare, the classifier will often assign too few sources to the rare class. To address this, it is common to up-weight the examples of the rare class to ensure it isn't ignored. For the same reason, it is also frequent practice to train on restricted data where the balance of source types is closer to equal. Here we show that these practices can bias the model toward over-assigning sources to the rare class. We also explore how to detect when training data bias has had a statistically significant impact on the trained model's predictions, and how to reduce the bias's impact. While the magnitude of the impact of the techniques developed here will vary with the details of the application, for most cases it should be modest. They are, however, applicable every time a machine learning classification model is used, making them analogous to Bessel's correction to the sample variance.
    Quantum Advantage in Variational Bayes Inference. (arXiv:2207.03104v1 [stat.ML])
    The variational Bayes (VB) inference algorithm is widely used to estimate both the parameters and the unobserved hidden variables in generative statistical models. The algorithm -- inspired by variational methods used in computational physics -- is iterative and can easily get stuck in local minima, even when classical techniques, such as deterministic annealing (DA), are used. We study a VB inference algorithm based on a non-traditional quantum annealing approach -- referred to as quantum annealing variational Bayes (QAVB) inference -- and show that there is indeed a quantum advantage to QAVB over its classical counterparts. In particular, we show that such better performance is rooted in key concepts from quantum mechanics: (i) the ground state of the Hamiltonian of a quantum system -- defined from the given VB problem -- corresponds to an optimal solution for the minimization problem of the variational free energy at very low temperatures; (ii) such a ground state can be achieved by a technique paralleling the quantum annealing process; and (iii) starting from this ground state, the optimal solution to the VB problem can be achieved by increasing the heat bath temperature to unity, thereby avoiding local minima introduced by the spontaneous symmetry breaking observed in classical-physics-based VB algorithms. We also show that the update equations of QAVB can potentially be implemented using $\lceil \log K \rceil$ qubits and $\mathcal{O} (K)$ operations per step. Thus, QAVB can match the time complexity of existing VB algorithms, while delivering higher performance.
    Lower Bounds on the Generalization Error of Nonlinear Learning Models. (arXiv:2103.14723v3 [stat.ML] UPDATED)
    We study in this paper lower bounds for the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data. We show that unbiased estimators have unacceptable performance for such nonlinear networks in this regime. We derive explicit generalization lower bounds for general biased estimators, in the cases of linear regression and of two-layered networks. In the linear case the bound is asymptotically tight. In the nonlinear case, we provide a comparison of our bounds with an empirical study of the stochastic gradient descent algorithm. The analysis uses elements from the theory of large random matrices.  ( 2 min )
    On the Equivalence between Neural Network and Support Vector Machine. (arXiv:2111.06063v2 [stat.ML] UPDATED)
    Recent research shows that the dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by Neural Tangent Kernel (NTK) \citep{jacot2018neural}. Under the squared loss, the infinite-width NN trained by gradient descent with an infinitely small learning rate is equivalent to kernel regression with NTK \citep{arora2019exact}. However, the equivalence is only known for ridge regression currently \citep{arora2019harnessing}, while the equivalence between NN and other kernel machines (KMs), e.g. support vector machine (SVM), remains unknown. Therefore, in this work, we propose to establish the equivalence between NN and SVM, and specifically, the infinitely wide NN trained by soft margin loss and the standard soft margin SVM with NTK trained by subgradient descent. Our main theoretical results include establishing the equivalences between NNs and a broad family of $\ell_2$ regularized KMs with finite-width bounds, which cannot be handled by prior work, and showing that every finite-width NN trained by such regularized loss functions is approximately a KM. Furthermore, we demonstrate our theory can enable three practical applications, including (i) non-vacuous generalization bound of NN via the corresponding KM; (ii) non-trivial robustness certificate for the infinite-width NN (while existing robustness verification methods would provide vacuous bounds); (iii) intrinsically more robust infinite-width NNs than those from previous kernel regression. Our code for the experiments is available at https://github.com/leslie-CH/equiv-nn-svm.  ( 3 min )
    Adaptive Resonance Theory-based Topological Clustering with a Divisive Hierarchical Structure Capable of Continual Learning. (arXiv:2201.10713v4 [cs.LG] UPDATED)
    Adaptive Resonance Theory (ART) is considered as an effective approach for realizing continual learning thanks to its ability to handle the plasticity-stability dilemma. In general, however, the clustering performance of ART-based algorithms strongly depends on the specification of a similarity threshold, i.e., a vigilance parameter, which is data-dependent and specified by hand. This paper proposes an ART-based topological clustering algorithm with a mechanism that automatically estimates a similarity threshold from the distribution of data points. In addition, for improving information extraction performance, a divisive hierarchical clustering algorithm capable of continual learning is proposed by introducing a hierarchical structure to the proposed algorithm. Experimental results demonstrate that the proposed algorithm has high clustering performance comparable with recently-proposed state-of-the-art hierarchical clustering algorithms.  ( 2 min )
    Inferring Structural Parameters of Low-Surface-Brightness-Galaxies with Uncertainty Quantification using Bayesian Neural Networks. (arXiv:2207.03471v1 [astro-ph.IM])
    Measuring the structural parameters (size, total brightness, light concentration, etc.) of galaxies is a significant first step towards a quantitative description of different galaxy populations. In this work, we demonstrate that a Bayesian Neural Network (BNN) can be used for the inference, with uncertainty quantification, of such morphological parameters from simulated low-surface-brightness galaxy images. Compared to traditional profile-fitting methods, we show that the uncertainties obtained using BNNs are comparable in magnitude, well-calibrated, and the point estimates of the parameters are closer to the true values. Our method is also significantly faster, which is very important with the advent of the era of large galaxy surveys and big data in astrophysics.  ( 2 min )
    Don't overfit the history -- Recursive time series data augmentation. (arXiv:2207.02891v1 [cs.LG])
    Time series observations can be seen as realizations of an underlying dynamical system governed by rules that we typically do not know. For time series learning tasks, we need to understand that we fit our model on available data, which is a unique realized history. Training on a single realization often induces severe overfitting lacking generalization. To address this issue, we introduce a general recursive framework for time series augmentation, which we call Recursive Interpolation Method, denoted as RIM. New samples are generated using a recursive interpolation function of all previous values in such a way that the enhanced samples preserve the original inherent time series dynamics. We perform theoretical analysis to characterize the proposed RIM and to guarantee its test performance. We apply RIM to diverse real world time series cases to achieve strong performance over non-augmented data on regression, classification, and reinforcement learning tasks.  ( 2 min )
    Training Transformers Together. (arXiv:2207.03481v1 [cs.LG])
    The infrastructure necessary for training state-of-the-art models is becoming overly expensive, which makes training such models affordable only to large corporations and institutions. Recent work proposes several methods for training such models collaboratively, i.e., by pooling together hardware from many independent parties and training a shared model over the Internet. In this demonstration, we collaboratively trained a text-to-image transformer similar to OpenAI DALL-E. We invited the viewers to join the ongoing training run, showing them instructions on how to contribute using the available hardware. We explained how to address the engineering challenges associated with such a training run (slow communication, limited memory, uneven performance between devices, and security concerns) and discussed how the viewers can set up collaborative training runs themselves. Finally, we show that the resulting model generates images of reasonable quality on a number of prompts.  ( 2 min )
    Back to the Basics: Revisiting Out-of-Distribution Detection Baselines. (arXiv:2207.03061v1 [cs.LG])
    We study simple methods for out-of-distribution (OOD) image detection that are compatible with any already trained classifier, relying on only its predictions or learned representations. Evaluating the OOD detection performance of various methods when utilized with ResNet-50 and Swin Transformer models, we find methods that solely consider the model's predictions can be easily outperformed by also considering the learned representations. Based on our analysis, we advocate for a dead-simple approach that has been neglected in other studies: simply flag as OOD images whose average distance to their K nearest neighbors is large (in the representation space of an image classifier trained on the in-distribution data).  ( 2 min )
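    The nearest-neighbor rule described in the abstract can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance and toy 2-D feature vectors in place of real classifier representations; the function name, `k`, and the threshold are hypothetical and not taken from the paper.

```python
import math


def knn_ood_score(query, train_features, k=3):
    """Average Euclidean distance from `query` to its k nearest training features.

    Larger scores suggest the query lies far from the in-distribution data.
    """
    if k > len(train_features):
        raise ValueError("k must not exceed the number of training features")
    dists = sorted(math.dist(query, f) for f in train_features)
    return sum(dists[:k]) / k


# Toy stand-ins for representations of in-distribution training images.
train_features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]

in_dist_score = knn_ood_score((0.05, 0.05), train_features)
ood_score = knn_ood_score((5.0, 5.0), train_features)

threshold = 1.0  # hypothetical cut-off; in practice tuned on held-out data
flag_in_dist = in_dist_score > threshold
flag_ood = ood_score > threshold
```

    In a real pipeline the feature vectors would come from the trained classifier's representation layer, and an approximate nearest-neighbor index would likely replace the full sort.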
    Finding Fallen Objects Via Asynchronous Audio-Visual Integration. (arXiv:2207.03483v1 [cs.CV])
    The way an object looks and sounds provides complementary reflections of its physical properties. In many settings cues from vision and audition arrive asynchronously but must be integrated, as when we hear an object dropped on the floor and then must find it. In this paper, we introduce a setting in which to study multi-modal object localization in 3D virtual environments. An object is dropped somewhere in a room. An embodied robot agent, equipped with a camera and microphone, must determine what object has been dropped -- and where -- by combining audio and visual signals with knowledge of the underlying physics. To study this problem, we have generated a large-scale dataset -- the Fallen Objects dataset -- that includes 8000 instances of 30 physical object categories in 64 rooms. The dataset uses the ThreeDWorld platform which can simulate physics-based impact sounds and complex physical interactions between objects in a photorealistic setting. As a first step toward addressing this challenge, we develop a set of embodied agent baselines, based on imitation learning, reinforcement learning, and modular planning, and perform an in-depth analysis of the challenge of this new task.  ( 3 min )
    Low-resource Low-footprint Wake-word Detection using Knowledge Distillation. (arXiv:2207.03331v1 [eess.AS])
    As virtual assistants have become more diverse and specialized, so has the demand for application or brand-specific wake words. However, the wake-word-specific datasets typically used to train wake-word detectors are costly to create. In this paper, we explore two techniques to leverage acoustic modeling data for large-vocabulary speech recognition to improve a purpose-built wake-word detector: transfer learning and knowledge distillation. We also explore how these techniques interact with time-synchronous training targets to improve detection latency. Experiments are presented on the open-source "Hey Snips" dataset and a more challenging in-house far-field dataset. Using phone-synchronous targets and knowledge distillation from a large acoustic model, we are able to improve accuracy across dataset sizes for both datasets while reducing latency.  ( 2 min )
    Comparing the Utility and Disclosure Risk of Synthetic Data with Samples of Microdata. (arXiv:2207.03339v1 [cs.CR])
    Most statistical agencies release randomly selected samples of Census microdata, usually with sample fractions under 10% and with other forms of statistical disclosure control (SDC) applied. An alternative to SDC is data synthesis, which has been attracting growing interest, yet there is no clear consensus on how to measure the associated utility and disclosure risk of the data. The ability to produce synthetic Census microdata, where the utility and associated risks are clearly understood, could mean that more timely and wider-ranging access to microdata would be possible. This paper follows on from previous work by the authors which mapped synthetic Census data on a risk-utility (R-U) map. The paper presents a framework to measure the utility and disclosure risk of synthetic data by comparing it to samples of the original data of varying sample fractions, thereby identifying the sample fraction which has equivalent utility and risk to the synthetic data. Three commonly used data synthesis packages are compared with some interesting results. Further work is needed in several directions but the methodology looks very promising.  ( 2 min )
    Machine learning of percolation models using graph convolutional neural networks. (arXiv:2207.03368v1 [cond-mat.stat-mech])
    Percolation is an important topic in climate, physics, materials science, epidemiology, finance, and so on. Prediction of percolation thresholds with machine learning methods remains challenging. In this paper, we build a powerful graph convolutional neural network to study the percolation in both supervised and unsupervised ways. From a supervised learning perspective, the graph convolutional neural network simultaneously and correctly trains data of different lattice types, such as the square and triangular lattices. For the unsupervised perspective, combining the graph convolutional neural network and the confusion method, the percolation threshold can be obtained by the "W" shaped performance. The finding of this work opens up the possibility of building a more general framework that can probe the percolation-related phenomenon.  ( 2 min )
    For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria. (arXiv:2207.03470v1 [cs.GT])
    Although it has been known since the 1970s that a globally optimal strategy profile in a common-payoff game is a Nash equilibrium, global optimality is a strict requirement that limits the result's applicability. In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium. Furthermore, we show that this result is robust to perturbations to the common payoff and to the local optimum. Applied to machine learning, our result provides a global guarantee for any gradient method that finds a local optimum in symmetric strategy space. While this result indicates stability to unilateral deviation, we nevertheless identify broad classes of games where mixed local optima are unstable under joint, asymmetric deviations. We analyze the prevalence of instability by running learning algorithms in a suite of symmetric games, and we conclude by discussing the applicability of our results to multi-agent RL, cooperative inverse RL, and decentralized POMDPs.  ( 2 min )
    Learning Interpretable Models Using an Oracle. (arXiv:1906.06852v4 [cs.LG] UPDATED)
    We look at a specific aspect of model interpretability: models often need to be constrained in size for them to be considered interpretable, e.g., a decision tree of depth 5 is easier to interpret than one of depth 50. But smaller models also tend to have high bias. This suggests a trade-off between interpretability and accuracy. We propose a model agnostic technique to minimize this trade-off. Our strategy is to first learn an oracle, a highly accurate probabilistic model on the training data. The uncertainty in the oracle's predictions is used to learn a sampling distribution for the training data. The interpretable model is then trained on a data sample obtained using this distribution, leading often to significantly greater accuracy. We formulate the sampling strategy as an optimization problem. Our solution possesses the following key favorable properties: (1) it uses a fixed number of seven optimization variables, irrespective of the dimensionality of the data; (2) it is model agnostic, in that both the interpretable model and the oracle may belong to arbitrary model families; (3) it has a flexible notion of model size, and can accommodate vector sizes; (4) it is a framework, enabling it to benefit from progress in the area of optimization. We also present the following interesting observations: (a) In general, the optimal training distribution at small model sizes is different from the test distribution; (b) This effect exists even when the interpretable model and the oracle are from highly disparate model families: we show this on a text classification task, by using a Gated Recurrent Unit network as an oracle to improve the sequence classification accuracy of a Decision Tree that uses character n-grams; (c) Our technique may be used to identify an optimal training sample of a given sample size, for a model.  ( 3 min )
    Distilling Ensemble of Explanations for Weakly-Supervised Pre-Training of Image Segmentation Models. (arXiv:2207.03335v1 [cs.CV])
    While fine-tuning pre-trained networks has become a popular way to train image segmentation models, such backbone networks for image segmentation are frequently pre-trained using image classification source datasets, e.g., ImageNet. Though image classification datasets could provide the backbone networks with rich visual features and discriminative ability, they are incapable of fully pre-training the target model (i.e., backbone+segmentation modules) in an end-to-end manner. The segmentation modules are left to random initialization in the fine-tuning process due to the lack of segmentation labels in classification datasets. In our work, we propose a method that leverages Pseudo Semantic Segmentation Labels (PSSL), to enable the end-to-end pre-training for image segmentation models based on classification datasets. PSSL was inspired by the observation that the explanation results of classification models, obtained through explanation algorithms such as CAM, SmoothGrad and LIME, would be close to the pixel clusters of visual objects. Specifically, PSSL is obtained for each image by interpreting the classification results and aggregating an ensemble of explanations queried from multiple classifiers to lower the bias caused by single models. With PSSL for every image of ImageNet, the proposed method leverages a weighted segmentation learning procedure to pre-train the segmentation network en masse. Experiment results show that, with ImageNet accompanied by PSSL as the source dataset, the proposed end-to-end pre-training strategy successfully boosts the performance of various segmentation models, i.e., PSPNet-ResNet50, DeepLabV3-ResNet50, and OCRNet-HRNetW18, on a number of segmentation tasks, such as CamVid, VOC-A, VOC-C, ADE20K, and CityScapes, with significant improvements. The source code is available at https://github.com/PaddlePaddle/PaddleSeg.  ( 3 min )
    Market Making with Scaled Beta Policies. (arXiv:2207.03352v1 [q-fin.TR])
    This paper introduces a new representation for the actions of a market maker in an order-driven market. This representation uses scaled beta distributions, and generalises three approaches taken in the artificial intelligence for market making literature: single price-level selection, ladder strategies and "market making at the touch". Ladder strategies place uniform volume across an interval of contiguous prices. Scaled beta distribution based policies generalise these, allowing volume to be skewed across the price interval. We demonstrate that this flexibility is useful for inventory management, one of the key challenges faced by a market maker. In this paper, we conduct three main experiments: first, we compare our more flexible beta-based actions with the special case of ladder strategies; then, we investigate the performance of simple fixed distributions; and finally, we devise and evaluate a simple and intuitive dynamic control policy that adjusts actions in a continuous manner depending on the signed inventory that the market maker has acquired. All empirical evaluations use a high-fidelity limit order book simulator based on historical data with 50 levels on each side.  ( 2 min )
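    To make the relationship between ladder strategies and beta-shaped volume profiles concrete, the sketch below spreads a fixed volume over contiguous price levels in proportion to a beta density. It is an illustration under stated assumptions: the function name, the midpoint discretization, and the parameter values are hypothetical and not taken from the paper.

```python
def beta_volume_profile(alpha, beta, n_levels, total_volume):
    """Split `total_volume` across `n_levels` contiguous price levels.

    Each level i is mapped to the midpoint x_i of its bin on (0, 1), and its
    weight is the unnormalized Beta(alpha, beta) density at x_i. The weights
    are then normalized so the volumes sum to `total_volume`.
    """
    weights = []
    for i in range(n_levels):
        x = (i + 0.5) / n_levels  # bin midpoint, strictly inside (0, 1)
        weights.append(x ** (alpha - 1) * (1 - x) ** (beta - 1))
    norm = sum(weights)
    return [total_volume * w / norm for w in weights]


# alpha = beta = 1 gives a flat density, i.e. a uniform ladder.
ladder = beta_volume_profile(1.0, 1.0, n_levels=5, total_volume=100.0)

# alpha < beta skews volume toward the first levels (nearest the touch,
# if level 0 is taken to be the best price).
skewed = beta_volume_profile(2.0, 5.0, n_levels=5, total_volume=100.0)
```

    Under this parameterization the uniform ladder is the special case `alpha = beta = 1`, which matches the abstract's statement that beta-based policies generalise ladder strategies.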
    VecGAN: Image-to-Image Translation with Interpretable Latent Directions. (arXiv:2207.03411v1 [cs.CV])
    We propose VecGAN, an image-to-image translation framework for facial attribute editing with interpretable latent directions. The facial attribute editing task faces the challenges of precise attribute editing with controllable strength and preservation of the other attributes of an image. For this goal, we design the attribute editing by latent space factorization and for each attribute, we learn a linear direction that is orthogonal to the others. The other component is the controllable strength of the change, a scalar value. In our framework, this scalar can be either sampled or encoded from a reference image by projection. Our work is inspired by the latent space factorization works of fixed pretrained GANs. However, while those models cannot be trained end-to-end and struggle to edit encoded images precisely, VecGAN is trained end-to-end for the image translation task and succeeds at editing an attribute while preserving the others. Our extensive experiments show that VecGAN achieves significant improvements over state-of-the-art methods for both local and global edits.  ( 2 min )
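    The role of orthogonal latent directions and a scalar strength can be illustrated with plain vectors. This is a minimal sketch assuming an additive edit along a unit-norm direction and a dot-product projection; the attribute names, vectors, and helper functions are hypothetical, and the real model learns its directions inside an encoder-decoder network.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def shift(z, direction, strength):
    """Move latent code z along `direction` by the scalar `strength`."""
    return [zi + strength * di for zi, di in zip(z, direction)]


def project(z, direction):
    """Scalar attribute strength of z along a unit-norm `direction`."""
    return dot(z, direction)


# Two hypothetical attribute directions, orthogonal and unit norm.
d_smile = [1.0, 0.0, 0.0]
d_hair = [0.0, 1.0, 0.0]

z = [0.2, -0.5, 0.7]          # latent code of the input image (toy values)
z_ref = [1.4, 0.3, -0.1]      # latent code of a reference image (toy values)

# Strength can be chosen directly or read off a reference by projection.
edited = shift(z, d_smile, 1.5)
ref_strength = project(z_ref, d_smile)
edited_from_ref = shift(z, d_smile, ref_strength - project(z, d_smile))
```

    Because the directions are orthogonal, moving along `d_smile` leaves the projection onto `d_hair` unchanged, which is the property that lets one attribute be edited while the others are preserved.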
    Calibrate to Interpret. (arXiv:2207.03324v1 [cs.LG])
    Trustworthy machine learning is driving a large number of ML community works in order to improve ML acceptance and adoption. The main aspects of trustworthy machine learning are the following: fairness, uncertainty, robustness, explainability and formal guarantees. Each of these individual domains attracts the ML community's interest, as shown by the number of related publications. However, few works tackle the interconnection between these fields. In this paper we show a first link between uncertainty and explainability, by studying the relation between calibration and interpretation. As the calibration of a given model changes the way it scores samples, and interpretation approaches often rely on these scores, it seems safe to assume that the confidence-calibration of a model interacts with our ability to interpret such a model. In this paper, we show, in the context of networks trained on image classification tasks, to what extent interpretations are sensitive to confidence-calibration. It leads us to suggest a simple practice to improve the interpretation outcomes: Calibrate to Interpret.  ( 2 min )
    Learning the Quality of Machine Permutations in Job Shop Scheduling. (arXiv:2207.03244v1 [cs.LG])
    In recent years, the power demonstrated by Machine Learning (ML) has increasingly attracted the interest of the optimization community, which is starting to leverage ML for enhancing and automating the design of optimal and approximate algorithms. One combinatorial optimization problem that has been tackled with ML is the Job Shop scheduling Problem (JSP). Most of the recent works focusing on the JSP and ML are based on Deep Reinforcement Learning (DRL), and only a few of them leverage supervised learning techniques. The recurrent reasons for avoiding supervised learning seem to be the difficulty in casting the right learning task, i.e., what is meaningful to predict, and how to obtain labels. Therefore, we first propose a novel supervised learning task that aims at predicting the quality of machine permutations. Then, we design an original methodology to estimate this quality that allows us to create an accurate sequential deep learning model (binary accuracy above 95%). Finally, we empirically demonstrate the value of predicting the quality of machine permutations by enhancing the performance of a simple Tabu Search algorithm inspired by the works in the literature.  ( 2 min )
    A Solver + Gradient Descent Training Algorithm for Deep Neural Networks. (arXiv:2207.03264v1 [cs.LG])
    We present a novel hybrid algorithm for training Deep Neural Networks that combines the state-of-the-art Gradient Descent (GD) method with a Mixed Integer Linear Programming (MILP) solver, outperforming GD and variants in terms of accuracy, as well as resource and data efficiency for both regression and classification tasks. Our GD+Solver hybrid algorithm, called GDSolver, works as follows: given a DNN $D$ as input, GDSolver invokes GD to partially train $D$ until it gets stuck in a local minimum, at which point GDSolver invokes an MILP solver to exhaustively search a region of the loss landscape around the weight assignments of $D$'s final layer parameters with the goal of tunnelling through and escaping the local minimum. The process is repeated until desired accuracy is achieved. In our experiments, we find that GDSolver not only scales well to additional data and very large model sizes, but also outperforms all other competing methods in terms of rates of convergence and data efficiency. For regression tasks, GDSolver produced models that, on average, had 31.5% lower MSE in 48% less time, and for classification tasks on MNIST and CIFAR10, GDSolver was able to achieve the highest accuracy over all competing methods, using only 50% of the training data that GD baselines required.  ( 2 min )
    Not All Models Are Equal: Predicting Model Transferability in a Self-challenging Fisher Space. (arXiv:2207.03036v1 [cs.LG])
    This paper addresses an important problem of ranking the pre-trained deep neural networks and screening the most transferable ones for downstream tasks. It is challenging because the ground-truth model ranking for each task can only be generated by fine-tuning the pre-trained models on the target dataset, which is brute-force and computationally expensive. Recent advanced methods proposed several lightweight transferability metrics to predict the fine-tuning results. However, these approaches only capture static representations but neglect the fine-tuning dynamics. To this end, this paper proposes a new transferability metric, called Self-challenging Fisher Discriminant Analysis (SFDA), which has many appealing benefits that existing works do not have. First, SFDA can embed the static features into a Fisher space and refine them for better separability between classes. Second, SFDA uses a self-challenging mechanism to encourage different pre-trained models to differentiate on hard examples. Third, SFDA can easily select multiple pre-trained models for the model ensemble. Extensive experiments on 33 pre-trained models of 11 downstream tasks show that SFDA is efficient, effective, and robust when measuring the transferability of pre-trained models. For instance, compared with the state-of-the-art method NLEEP, SFDA demonstrates an average of 59.1% gain while bringing 22.5x speedup in wall-clock time. The code will be available at https://github.com/TencentARC/SFDA.  ( 3 min )
    Robust optimal well control using an adaptive multi-grid reinforcement learning framework. (arXiv:2207.03253v1 [cs.LG])
    Reinforcement learning (RL) is a promising tool to solve robust optimal well control problems where the model parameters are highly uncertain, and the system is partially observable in practice. However, RL of robust control policies often relies on performing a large number of simulations. This could easily become computationally intractable for cases with computationally intensive simulations. To address this bottleneck, an adaptive multi-grid RL framework is introduced which is inspired by principles of geometric multi-grid methods used in iterative numerical algorithms. RL control policies are initially learned using computationally efficient low fidelity simulations using coarse grid discretization of the underlying partial differential equations (PDEs). Subsequently, the simulation fidelity is increased in an adaptive manner towards the highest fidelity simulation, which corresponds to the finest discretization of the model domain. The proposed framework is demonstrated using a state-of-the-art, model-free policy-based RL algorithm, namely the Proximal Policy Optimisation (PPO) algorithm. Results are shown for two case studies of robust optimal well control problems which are inspired by SPE-10 model 2 benchmark case studies. Prominent gains in computational efficiency are observed using the proposed framework, saving around 60-70% of the computational cost of its single fine-grid counterpart.  ( 2 min )
    Revisiting Pretraining Objectives for Tabular Deep Learning. (arXiv:2207.03208v1 [cs.LG])
    Recent deep learning models for tabular data currently compete with the traditional ML models based on decision trees (GBDT). Unlike GBDT, deep models can additionally benefit from pretraining, which is a workhorse of DL for vision and NLP. For tabular problems, several pretraining methods were proposed, but it is not entirely clear if pretraining provides consistent noticeable improvements and what method should be used, since the methods are often not compared to each other or comparison is limited to the simplest MLP architectures. In this work, we aim to identify the best practices to pretrain tabular DL models that can be universally applied to different datasets and architectures. Among our findings, we show that using the object target labels during the pretraining stage is beneficial for the downstream performance and advocate several target-aware pretraining objectives. Overall, our experiments demonstrate that properly performed pretraining significantly increases the performance of tabular DL models, which often leads to their superiority over GBDTs.  ( 2 min )
    Factorizing Knowledge in Neural Networks. (arXiv:2207.03337v1 [cs.CV])
    In this paper, we explore a novel and ambitious knowledge-transfer task, termed Knowledge Factorization (KF). The core idea of KF lies in the modularization and assemblability of knowledge: given a pretrained network model as input, KF aims to decompose it into several factor networks, each of which handles only a dedicated task and maintains task-specific knowledge factorized from the source network. Such factor networks are task-wise disentangled and can be directly assembled, without any fine-tuning, to produce the more competent combined-task networks. In other words, the factor networks serve as Lego-brick-like building blocks, allowing us to construct customized networks in a plug-and-play manner. Specifically, each factor network comprises two modules, a common-knowledge module that is task-agnostic and shared by all factor networks, alongside a task-specific module dedicated to the factor network itself. We introduce an information-theoretic objective, InfoMax-Bottleneck (IMB), to carry out KF by optimizing the mutual information between the learned representations and input. Experiments across various benchmarks demonstrate that the derived factor networks yield gratifying performances on not only the dedicated tasks but also disentanglement, while enjoying much better interpretability and modularity. Moreover, the learned common-knowledge representations give rise to impressive results on transfer learning.  ( 2 min )
    Vessel-following model for inland waterways based on deep reinforcement learning. (arXiv:2207.03257v1 [cs.CE])
    While deep reinforcement learning (RL) has been increasingly applied in designing car-following models in the last years, this study aims at investigating the feasibility of RL-based vehicle-following for complex vehicle dynamics and strong environmental disturbances. As a use case, we developed an inland waterways vessel-following model based on realistic vessel dynamics, which considers environmental influences, such as varying stream velocity and river profile. We extracted natural vessel behavior from anonymized AIS data to formulate a reward function that reflects a realistic driving style next to comfortable and safe navigation. Aiming at high generalization capabilities, we propose an RL training environment that uses stochastic processes to model leading trajectory and river dynamics. To validate the trained model, we defined different scenarios that have not been seen in training, including realistic vessel-following on the Middle Rhine. Our model demonstrated safe and comfortable driving in all scenarios, proving excellent generalization abilities. Furthermore, traffic oscillations could effectively be dampened by deploying the trained model on a sequence of following vessels.  ( 2 min )
    Pre-training helps Bayesian optimization too. (arXiv:2207.03084v1 [cs.LG])
    Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs on functions. However, even with expert knowledge, it is not an easy task to select a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.  ( 3 min )
    Attention Round for Post-Training Quantization. (arXiv:2207.03088v1 [cs.LG])
    At present, quantization methods for neural network models are mainly divided into post-training quantization (PTQ) and quantization-aware training (QAT). Post-training quantization needs only a small part of the data to complete the quantization process, but the performance of the resulting quantized model is not as good as that of quantization-aware training. This paper presents a novel quantization method called Attention Round. This method gives each parameter w the opportunity to be mapped to all possible quantized values, rather than just the two quantized values nearest to w. The probability of being mapped to a given quantized value is negatively correlated with the distance between that value and w, and decays with a Gaussian function. In addition, this paper uses the lossy coding length as a measure to assign bit widths to the different layers of the model, which addresses mixed-precision quantization while avoiding the need to solve a combinatorial optimization problem. This paper also performs quantization experiments on different models, and the results confirm the effectiveness of the proposed method. For ResNet18 and MobileNetV2, the proposed post-training quantization requires only 1,024 training samples and 10 minutes to complete the quantization process, and achieves quantization performance on par with quantization-aware training.  ( 3 min )
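    The rounding rule described in this abstract can be illustrated with a minimal sketch. It assumes the mapping probability is a normalized Gaussian of the distance between w and each candidate value; the abstract does not give the exact parametrization, so the function names, the `sigma` width parameter, and the uniform grid are all hypothetical.

```python
import numpy as np

def attention_round_probs(w, grid, sigma):
    """Probability of mapping weight w to each candidate value in grid.

    Assumption: probabilities are a normalized Gaussian of the distance
    |grid - w|; sigma is a hypothetical width parameter.
    """
    d = np.asarray(grid, dtype=float) - w
    logits = -0.5 * (d / sigma) ** 2
    logits -= logits.max()          # subtract the max for numerical stability
    p = np.exp(logits)
    return p / p.sum()

def stochastic_quantize(w, grid, sigma, rng):
    """Sample one quantized value for w; any grid value can be chosen."""
    return rng.choice(grid, p=attention_round_probs(w, grid, sigma))

# Usage: a 9-level uniform grid on [-1, 1] and a single weight.
grid = np.linspace(-1.0, 1.0, 9)
sample = stochastic_quantize(0.3, grid, sigma=0.25, rng=np.random.default_rng(0))
```

    With a small `sigma` this approaches round-to-nearest; a larger `sigma` spreads probability mass over more distant grid values.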
    A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits. (arXiv:2207.03106v1 [cs.LG])
    We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server. We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication. We propose a simple algorithm named \texttt{FedLinUCB} based on the principle of optimism. We prove that the regret of \texttt{FedLinUCB} is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^M T_m})$ and the communication complexity is $\tilde{O}(dM^2)$, where $d$ is the dimension of the contextual vector and $T_m$ is the total number of interactions with the environment by $m$-th agent. To the best of our knowledge, this is the first provably efficient algorithm that allows fully asynchronous communication for federated contextual linear bandits, while achieving the same regret guarantee as in the single-agent setting.  ( 2 min )
    Learning Invariant World State Representations with Predictive Coding. (arXiv:2207.02972v1 [cs.LG])
    Self-supervised learning methods overcome the key bottleneck for building more capable AI: limited availability of labeled data. However, one of the drawbacks of self-supervised architectures is that the representations that they learn are implicit and it is hard to extract meaningful information about the encoded world states, such as 3D structure of the visual scene encoded in a depth map. Moreover, in the visual domain such representations only rarely undergo evaluations that may be critical for downstream tasks, such as vision for autonomous cars. Herein, we propose a framework for evaluating visual representations for illumination invariance in the context of depth perception. We develop a new predictive coding-based architecture and a hybrid fully-supervised/self-supervised learning method. We propose a novel architecture that extends the predictive coding approach: PRedictive Lateral bottom-Up and top-Down Encoder-decoder Network (PreludeNet), which explicitly learns to infer and predict depth from video frames. In PreludeNet, the encoder's stack of predictive coding layers is trained in a self-supervised manner, while the predictive decoder is trained in a supervised manner to infer or predict the depth. We evaluate the robustness of our model on a new synthetic dataset, in which lighting conditions (such as overall illumination, and effect of shadows) can be parametrically adjusted while keeping all other aspects of the world constant. PreludeNet achieves both competitive depth inference performance and next frame prediction accuracy. We also show how this new network architecture, coupled with the hybrid fully-supervised/self-supervised learning method, achieves balance between the said performance and invariance to changes in lighting. The proposed framework for evaluating visual representations can be extended to diverse task domains and invariance tests.  ( 3 min )
    Context-aware Self-supervised Learning for Medical Images Using Graph Neural Network. (arXiv:2207.02957v1 [eess.IV])
    Although self-supervised learning enables us to bootstrap the training by exploiting unlabeled data, the generic self-supervised methods for natural images do not sufficiently incorporate the context. For medical images, a desirable method should be sensitive enough to detect deviation from normal-appearing tissue of each anatomical region; here, anatomy is the context. We introduce a novel approach with two levels of self-supervised representation learning objectives: one on the regional anatomical level and another on the patient-level. We use graph neural networks to incorporate the relationship between different anatomical regions. The structure of the graph is informed by anatomical correspondences between each patient and an anatomical atlas. In addition, the graph representation has the advantage of handling any arbitrarily sized image in full resolution. Experiments on large-scale Computed Tomography (CT) datasets of lung images show that our approach compares favorably to baseline methods that do not account for the context. We use the learned embedding for staging lung tissue abnormalities related to COVID-19.  ( 3 min )
    Model Agnostic Conformal Hyperparameter Optimization. (arXiv:2207.03017v1 [cs.LG])
    Several novel frameworks for hyperparameter search have emerged in the last decade, but most rely on strict, often normal, distributional assumptions, limiting search model flexibility. This paper proposes a novel optimization framework based on Conformal prediction, assuming only exchangeability, and allowing for a larger choice of search model architectures and variance estimators. Several such models are explored and benchmarked against random hyperparameter search on both dense and convolutional neural networks with consistent overperformance both in final loss achieved and time to achievement.  ( 2 min )
    A State Transition Model for Mobile Notifications via Survival Analysis. (arXiv:2207.03099v1 [stat.ML])
    Mobile notifications have become a major communication channel for social networking services to keep users informed and engaged. As more mobile applications push notifications to users, they constantly face decisions on what to send, when and how. A lack of research and methodology commonly leads to heuristic decision making. Many notifications arrive at an inappropriate moment or introduce too many interruptions, failing to provide value to users and spurring users' complaints. In this paper we explore unique features of interactions between mobile notifications and user engagement. We propose a state transition framework to quantitatively evaluate the effectiveness of notifications. Within this framework, we develop a survival model for badging notifications assuming a log-linear structure and a Weibull distribution. Our results show that this model achieves more flexibility for applications and superior prediction accuracy than a logistic regression model. In particular, we provide an online use case on notification delivery time optimization to show how we make better decisions, drive more user engagement, and provide more value to users.  ( 2 min )
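    As a rough illustration of the survival model named in this abstract, the sketch below evaluates a Weibull survival function with a log-linear link. It assumes the common accelerated-failure-time reading, in which the log of the Weibull scale is linear in the covariates; the abstract does not state the parametrization, and the names `x`, `beta`, and `shape` are hypothetical.

```python
import math

def weibull_survival(t, x, beta, shape):
    """S(t | x) = exp(-(t / scale)^shape), with log(scale) = beta . x.

    Assumption: the log-linear structure acts on the Weibull scale
    (accelerated-failure-time form); this is not confirmed by the abstract.
    """
    log_scale = sum(b * xi for b, xi in zip(beta, x))
    scale = math.exp(log_scale)
    return math.exp(-((t / scale) ** shape))

# Usage: probability that a badge is still unopened after 2 time units,
# for a hypothetical user with an intercept term and one covariate.
p_unopened = weibull_survival(2.0, x=[1.0, 0.5], beta=[0.2, 0.8], shape=1.5)
```

    A shape above 1 means the hazard of the transition rises with elapsed time, while a shape below 1 means it falls.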
    Interactive Combinatorial Bandits: Balancing Competitivity and Complementarity. (arXiv:2207.03091v1 [cs.LG])
    We study non-modular function maximization in the online interactive bandit setting. We are motivated by applications where there is a natural complementarity between certain elements: e.g., in a movie recommendation system, watching the first movie in a series complements the experience of watching a second (and a third, etc.). This is not expressible using only submodular functions which can represent only competitiveness between elements. We extend the purely submodular approach in two ways. First, we assume that the objective can be decomposed into the sum of monotone suBmodular and suPermodular function, known as a BP objective. Here, complementarity is naturally modeled by the supermodular component. We develop a UCB-style algorithm, where at each round a noisy gain is revealed after an action is taken that balances refining beliefs about the unknown objectives (exploration) and choosing actions that appear promising (exploitation). Defining regret in terms of submodular and supermodular curvature with respect to a full-knowledge greedy baseline, we show that this algorithm achieves at most $O(\sqrt{T})$ regret after $T$ rounds of play. Second, for those functions that do not admit a BP structure, we provide analogous regret guarantees in terms of their submodularity ratio; this is applicable for functions that are almost, but not quite, submodular. We numerically study the tasks of movie recommendation on the MovieLens dataset, and selection of training subsets for classification. Through these examples, we demonstrate the algorithm's performance as well as the shortcomings of viewing these problems as being solely submodular.  ( 3 min )
    A conditional gradient homotopy method with applications to Semidefinite Programming. (arXiv:2207.03101v1 [math.OC])
    We propose a new homotopy-based conditional gradient method for solving convex optimization problems with a large number of simple conic constraints. Instances of this template naturally appear in semidefinite programming problems arising as convex relaxations of combinatorial optimization problems. Our method is a double-loop algorithm in which the conic constraint is treated via a self-concordant barrier, and the inner loop employs a conditional gradient algorithm to approximate the analytic central path, while the outer loop updates the accuracy imposed on the temporal solution and the homotopy parameter. Our theoretical iteration complexity is competitive with that of state-of-the-art SDP solvers, with the decisive advantage of cheap projection-free subroutines. Preliminary numerical experiments are provided for illustrating the practical performance of the method.  ( 2 min )
    Quantum compression with classically simulatable circuits. (arXiv:2207.02961v1 [quant-ph])
    As we continue to find applications where the currently available noisy devices exhibit an advantage over their classical counterparts, the efficient use of quantum resources is highly desirable. The notion of quantum autoencoders was proposed as a way for the compression of quantum information to reduce resource requirements. Here, we present a strategy to design quantum autoencoders using evolutionary algorithms for transforming quantum information into lower-dimensional representations. We successfully demonstrate the initial applications of the algorithm for compressing different families of quantum states. In particular, we point out that using a restricted gate set in the algorithm allows for efficient simulation of the generated circuits. This approach opens the possibility of using classical logic to find low representations of quantum data, using fewer computational resources.  ( 2 min )
    The "Collections as ML Data" Checklist for Machine Learning & Cultural Heritage. (arXiv:2207.02960v1 [cs.LG])
    Within the cultural heritage sector, there has been a growing and concerted effort to consider a critical sociotechnical lens when applying machine learning techniques to digital collections. Though the cultural heritage community has collectively developed an emerging body of work detailing responsible operations for machine learning in libraries and other cultural heritage institutions at the organizational level, there remains a paucity of guidelines created specifically for practitioners embarking on machine learning projects. The manifold stakes and sensitivities involved in applying machine learning to cultural heritage underscore the importance of developing such guidelines. This paper contributes to this need by formulating a detailed checklist with guiding questions and practices that can be employed while developing a machine learning project that utilizes cultural heritage data. I call the resulting checklist the "Collections as ML Data" checklist, which, when completed, can be published with the deliverables of the project. By surveying existing projects, including my own project, Newspaper Navigator, I justify the "Collections as ML Data" checklist and demonstrate how the formulated guiding questions can be employed and operationalized.  ( 2 min )
    Self-Supervised RF Signal Representation Learning for NextG Signal Classification with Deep Learning. (arXiv:2207.03046v1 [cs.NI])
    Deep learning (DL) finds rich applications in the wireless domain to improve spectrum awareness. Typically, the DL models are either randomly initialized following a statistical distribution or pretrained on tasks from other data domains such as computer vision (in the form of transfer learning) without accounting for the unique characteristics of wireless signals. Self-supervised learning enables the learning of useful representations from Radio Frequency (RF) signals themselves even when only limited training data samples with labels are available. We present the first self-supervised RF signal representation learning model and apply it to the automatic modulation recognition (AMR) task by specifically formulating a set of transformations to capture the wireless signal characteristics. We show that the sample efficiency (the number of labeled samples required to achieve a certain accuracy performance) of AMR can be significantly increased (almost an order of magnitude) by learning signal representations with self-supervised learning. This translates to substantial time and cost savings. Furthermore, self-supervised learning increases the model accuracy compared to the state-of-the-art DL methods and maintains high accuracy even when a small set of training data samples is used.  ( 2 min )
    The Union of Manifolds Hypothesis and its Implications for Deep Generative Modelling. (arXiv:2207.02862v1 [stat.ML])
    Deep learning has had tremendous success at learning low-dimensional representations of high-dimensional data. This success would be impossible if there was no hidden low-dimensional structure in data of interest; this existence is posited by the manifold hypothesis, which states that the data lies on an unknown manifold of low intrinsic dimension. In this paper, we argue that this hypothesis does not properly capture the low-dimensional structure typically present in data. Assuming the data lies on a single manifold implies intrinsic dimension is identical across the entire data space, and does not allow for subregions of this space to have a different number of factors of variation. To address this deficiency, we put forth the union of manifolds hypothesis, which accommodates the existence of non-constant intrinsic dimensions. We empirically verify this hypothesis on commonly-used image datasets, finding that indeed, intrinsic dimension should be allowed to vary. We also show that classes with higher intrinsic dimensions are harder to classify, and how this insight can be used to improve classification accuracy. We then turn our attention to the impact of this hypothesis in the context of deep generative models (DGMs). Most current DGMs struggle to model datasets with several connected components and/or varying intrinsic dimensions. To tackle these shortcomings, we propose clustered DGMs, where we first cluster the data and then train a DGM on each cluster. We show that clustered DGMs can model multiple connected components with different intrinsic dimensions, and empirically outperform their non-clustered counterparts without increasing computational requirements.  ( 3 min )
    Humans Social Relationship Classification during Accompaniment. (arXiv:2207.02890v1 [cs.LG])
    This paper presents the design of deep learning architectures that classify the social relationship between two people walking in a side-by-side formation into four possible categories: colleagues, couple, family or friendship. The models are developed using Neural Networks or Recurrent Neural Networks to achieve the classification and are trained and evaluated using a database of readings obtained from humans performing an accompaniment process in an urban environment. The best model achieves relatively good accuracy on the classification problem, and its results partially improve on the outcomes of a previous study [1]. Furthermore, the proposed model shows potential for future efficiency improvements and for implementation in a real robot.  ( 2 min )
    Towards Substantive conceptions of Algorithmic Fairness: Normative guidance from Equal Opportunity doctrines. (arXiv:2207.02912v1 [cs.CY])
    In this work we use Equal Opportunity (EO) doctrines from political philosophy to make explicit the normative judgements embedded in different conceptions of algorithmic fairness. We contrast formal EO approaches that narrowly focus on fair contests at discrete decision points, with substantive EO doctrines that look at people's fair life chances more holistically over the course of a lifetime. We use this taxonomy to provide a moral interpretation of the impossibility results as the incompatibility between different conceptions of a fair contest -- forward-looking versus backward-looking -- when people do not have fair life chances. We use this result to motivate substantive conceptions of algorithmic fairness and outline two plausible procedures based on the luck-egalitarian doctrine of EO, and Rawls's principle of fair equality of opportunity.  ( 2 min )
    Scoring Rules for Performative Binary Prediction. (arXiv:2207.02847v1 [cs.LG])
    We construct a model of expert prediction where predictions can influence the state of the world. Under this model, we show through theoretical and numerical results that proper scoring rules can incentivize experts to manipulate the world with their predictions. We also construct a simple class of scoring rules that avoids this problem.  ( 2 min )
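    The incentive problem described in this abstract can be seen in a toy calculation. The sketch below uses the quadratic (Brier) score as the proper scoring rule and a hypothetical fully self-fulfilling world in which the event probability equals the report; neither choice is taken from the paper, and the grid search is only for illustration.

```python
def expected_brier(p, q):
    """Expected negative Brier score of report p when the event has probability q."""
    return -(q * (1 - p) ** 2 + (1 - q) * p ** 2)

grid = [i / 100 for i in range(101)]

# Non-performative case: the true probability is fixed at 0.3,
# so the best report is the honest one.
best_fixed = max(grid, key=lambda p: expected_brier(p, 0.3))

# Hypothetical performative case: the report itself sets the event
# probability (q = p), so the expected score reduces to -p * (1 - p).
best_performative = max(grid, key=lambda p: expected_brier(p, p))
```

    In the fixed case the optimum sits at the true probability, which is what properness guarantees. In the self-fulfilling case the expected score is maximized at the extremes, so the expert is rewarded for pushing the world toward certainty rather than for reporting a belief.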
    Local Sample-weighted Multiple Kernel Clustering with Consensus Discriminative Graph. (arXiv:2207.02846v1 [cs.LG])
    Multiple kernel clustering (MKC) is committed to achieving optimal information fusion from a set of base kernels. Constructing precise and local kernel matrices is proved to be of vital significance in applications since the unreliable distant-distance similarity estimation would degrade clustering performance. Although existing localized MKC algorithms exhibit improved performance compared to globally-designed competitors, most of them widely adopt KNN mechanism to localize kernel matrix by accounting for $\tau$-nearest neighbors. However, such a coarse manner follows an unreasonable strategy that the ranking importance of different neighbors is equal, which is impractical in applications. To alleviate such problems, this paper proposes a novel local sample-weighted multiple kernel clustering (LSWMKC) model. We first construct a consensus discriminative affinity graph in kernel space, revealing the latent local structures. Further, an optimal neighborhood kernel for the learned affinity graph is output with naturally sparse property and clear block diagonal structure. Moreover, LSWMKC implicitly optimizes adaptive weights on different neighbors with corresponding samples. Experimental results demonstrate that our LSWMKC possesses better local manifold representation and outperforms existing kernel or graph-based clustering algorithms. The source code of LSWMKC can be publicly accessed from https://github.com/liliangnudt/LSWMKC.  ( 2 min )
  • Open

    Some performance considerations when using multi-armed bandit algorithms in the presence of missing data. (arXiv:2205.03820v2 [stat.ML] UPDATED)
    When comparing the performance of multi-armed bandit algorithms, the potential impact of missing data is often overlooked. In practice, it also affects their implementation where the simplest approach to overcome this is to continue to sample according to the original bandit algorithm, ignoring missing outcomes. We investigate the impact on performance of this approach to deal with missing data for several bandit algorithms through an extensive simulation study assuming the rewards are missing at random. We focus on two-armed bandit algorithms with binary outcomes in the context of patient allocation for clinical trials with relatively small sample sizes. However, our results apply to other applications of bandit algorithms where missing data is expected to occur. We assess the resulting operating characteristics, including the expected reward. Different probabilities of missingness in both arms are considered. The key finding of our work is that when using the simplest strategy of ignoring missing data, the impact on the expected performance of multi-armed bandit strategies varies according to the way these strategies balance the exploration-exploitation trade-off. Algorithms that are geared towards exploration continue to assign samples to the arm with more missing responses (which, being perceived as the arm with less observed information, is deemed more appealing by the algorithm than it would otherwise be). In contrast, algorithms that are geared towards exploitation would rapidly assign a high value to samples from the arms with a current high mean irrespective of the level of observations per arm. Furthermore, for algorithms focusing more on exploration, we illustrate that the problem of missing responses can be alleviated using a simple mean imputation approach.
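    The mean imputation idea mentioned at the end of this abstract can be sketched as a per-arm update rule. This is a minimal illustration under assumptions: the abstract does not specify the exact imputation rule, and the `stats` layout and the function name `update_arm` are hypothetical.

```python
def update_arm(stats, arm, outcome):
    """Update per-arm statistics, imputing a missing outcome with the arm mean.

    stats[arm] is [count, reward_sum]; outcome is 0 or 1, or None when the
    response is missing. Assumption: a missing response is replaced by the
    arm's current observed mean, which leaves the mean unchanged but
    increases the count used by any exploration bonus.
    """
    count, reward_sum = stats[arm]
    if outcome is None:
        if count == 0:
            return  # nothing observed yet, so there is no mean to impute
        outcome = reward_sum / count
    stats[arm] = [count + 1, reward_sum + outcome]

# Usage: arm 0 has one success in two pulls, then a missing response arrives.
stats = {0: [2, 1.0], 1: [0, 0.0]}
update_arm(stats, 0, None)
```

    Because the count still grows on a missing response, an exploration-driven rule no longer treats the arm with more missing data as under-sampled, which is the effect the abstract attributes to ignoring missing outcomes.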
    Neural Stein critics with staged $L^2$-regularization. (arXiv:2207.03406v1 [stat.ML])
    Learning to differentiate model distributions from observed data is a fundamental problem in statistics and machine learning, and high-dimensional data remains a challenging setting for such problems. Metrics that quantify the disparity in probability distributions, such as the Stein discrepancy, play an important role in statistical testing in high dimensions. In this paper, we consider the setting where one wishes to distinguish between data sampled from an unknown probability distribution and a nominal model distribution. While recent studies revealed that the optimal $L^2$-regularized Stein critic equals the difference of the score functions of two probability distributions up to a multiplicative constant, we investigate the role of $L^2$ regularization when training a neural network Stein discrepancy critic function. Motivated by the Neural Tangent Kernel theory of training neural networks, we develop a novel staging procedure for the weight of regularization over training time. This leverages the advantages of highly-regularized training at early times while also empirically delaying overfitting. Theoretically, we relate the training dynamic with large regularization weight to the kernel regression optimization of "lazy training" regime in early training times. The benefit of the staged $L^2$ regularization is demonstrated on simulated high dimensional distribution drift data and an application to evaluating generative models of image data.
    Reward is enough for convex MDPs. (arXiv:2106.00661v3 [cs.AI] UPDATED)
    Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many kinds of goals in a Markov decision process (MDP). However, not all goals can be captured in this manner. In this paper we study convex MDPs in which goals are expressed as convex functions of the stationary distribution and show that they cannot be formulated using stationary reward functions. Convex MDPs generalize the standard reinforcement learning (RL) problem formulation to a larger framework that includes many supervised and unsupervised RL problems, such as apprenticeship learning, constrained MDPs, and so-called `pure exploration'. Our approach is to reformulate the convex MDP problem as a min-max game involving policy and cost (negative reward) `players', using Fenchel duality. We propose a meta-algorithm for solving this problem and show that it unifies many existing algorithms in the literature.
    Federated Robustness Propagation: Sharing Robustness in Heterogeneous Federated Learning. (arXiv:2106.10196v2 [cs.LG] UPDATED)
    Federated learning (FL) emerges as a popular distributed learning schema that learns a model from a set of participating users without sharing raw data. One major challenge of FL comes with heterogeneous users, who may have distributionally different (or non-iid) data and varying computation resources. As federated users would use the model for prediction, they often demand the trained model to be robust against malicious attackers at test time. Whereas adversarial training (AT) provides a sound solution for centralized learning, extending its usage for federated users has imposed significant challenges, as many users may have very limited training data and tight computational budgets, to afford the data-hungry and costly AT. In this paper, we study a novel FL strategy: propagating adversarial robustness from rich-resource users that can afford AT, to those with poor resources that cannot afford it, during federated learning. We show that existing FL techniques cannot be effectively integrated with the strategy to propagate robustness among non-iid users and propose an efficient propagation approach by the proper use of batch-normalization. We demonstrate the rationality and effectiveness of our method through extensive experiments. Especially, the proposed method is shown to grant federated models remarkable robustness even when only a small portion of users afford AT during learning. Source code will be released.
    Interpretable Deep Causal Learning for Moderation Effects. (arXiv:2206.10261v2 [cs.LG] UPDATED)
    In this extended abstract paper, we address the problem of interpretability and targeted regularization in causal machine learning models. In particular, we focus on the problem of estimating individual causal/treatment effects under observed confounders, which can be controlled for and moderate the effect of the treatment on the outcome of interest. Black-box ML models adjusted for the causal setting perform generally well in this task, but they lack interpretable output identifying the main drivers of treatment heterogeneity and their functional relationship. We propose a novel deep counterfactual learning architecture for estimating individual treatment effects that can simultaneously: i) convey targeted regularization on, and quantify uncertainty around, the quantity of interest (i.e., the Conditional Average Treatment Effect); ii) disentangle baseline prognostic and moderating effects of the covariates and output interpretable score functions describing their relationship with the outcome. Finally, we demonstrate the use of the method via a simple simulated experiment.
    A Mutually Exciting Latent Space Hawkes Process Model for Continuous-time Networks. (arXiv:2205.09263v2 [cs.LG] UPDATED)
    Networks and temporal point processes serve as fundamental building blocks for modeling complex dynamic relational data in various domains. We propose the latent space Hawkes (LSH) model, a novel generative model for continuous-time networks of relational events, using a latent space representation for nodes. We model relational events between nodes using mutually exciting Hawkes processes with baseline intensities dependent upon the distances between the nodes in the latent space and sender and receiver specific effects. We demonstrate that our proposed LSH model can replicate many features observed in real temporal networks including reciprocity and transitivity, while also achieving superior prediction accuracy and providing more interpretable fits than existing models.
    Learning Interpretable Models Using an Oracle. (arXiv:1906.06852v4 [cs.LG] UPDATED)
    We look at a specific aspect of model interpretability: models often need to be constrained in size for them to be considered interpretable, e.g., a decision tree of depth 5 is easier to interpret than one of depth 50. But smaller models also tend to have high bias. This suggests a trade-off between interpretability and accuracy. We propose a model agnostic technique to minimize this trade-off. Our strategy is to first learn an oracle, a highly accurate probabilistic model on the training data. The uncertainty in the oracle's predictions is used to learn a sampling distribution for the training data. The interpretable model is then trained on a data sample obtained using this distribution, leading often to significantly greater accuracy. We formulate the sampling strategy as an optimization problem. Our solution possesses the following key favorable properties: (1) it uses a fixed number of seven optimization variables, irrespective of the dimensionality of the data (2) it is model agnostic - in that both the interpretable model and the oracle may belong to arbitrary model families (3) it has a flexible notion of model size, and can accommodate vector sizes (4) it is a framework, enabling it to benefit from progress in the area of optimization. We also present the following interesting observations: (a) In general, the optimal training distribution at small model sizes is different from the test distribution; (b) This effect exists even when the interpretable model and the oracle are from highly disparate model families: we show this on a text classification task, by using a Gated Recurrent Unit network as an oracle to improve the sequence classification accuracy of a Decision Tree that uses character n-grams; (c) Our technique may be used to identify an optimal training sample of a given sample size, for a model.
    Variational Nearest Neighbor Gaussian Process. (arXiv:2202.01694v3 [cs.LG] UPDATED)
    Variational approximations to Gaussian processes (GPs) typically use a small set of inducing points to form a low-rank approximation to the covariance matrix. In this work, we instead exploit a sparse approximation of the precision matrix. We propose variational nearest neighbor Gaussian process (VNNGP), which introduces a prior that only retains correlations within K nearest-neighboring observations, thereby inducing sparse precision structure. Using the variational framework, VNNGP's objective can be factorized over both observations and inducing points, enabling stochastic optimization with a time complexity of $O(K^3)$. Hence, we can arbitrarily scale the inducing point size, even to the point of putting inducing points at every observed location. We compare VNNGP to other scalable GPs through various experiments, and demonstrate that VNNGP (1) can dramatically outperform low-rank methods, and (2) is less prone to overfitting than other nearest neighbor methods.  ( 2 min )
    The Multivariate Community Hawkes Model for Dependent Relational Events in Continuous-time Networks. (arXiv:2205.00639v2 [stat.ME] UPDATED)
    The stochastic block model (SBM) is one of the most widely used generative models for network data. Many continuous-time dynamic network models are built upon the same assumption as the SBM: edges or events between all pairs of nodes are conditionally independent given the block or community memberships, which prevents them from reproducing higher-order motifs such as triangles that are commonly observed in real networks. We propose the multivariate community Hawkes (MULCH) model, an extremely flexible community-based model for continuous-time networks that introduces dependence between node pairs using structured multivariate Hawkes processes. We fit the model using a spectral clustering and likelihood-based local refinement procedure. We find that our proposed MULCH model is far more accurate than existing models both for predictive and generative tasks.  ( 2 min )
    Exact Matching of Random Graphs with Constant Correlation. (arXiv:2110.05000v2 [math.ST] UPDATED)
    This paper deals with the problem of graph matching or network alignment for Erd\H{o}s--R\'enyi graphs, which can be viewed as a noisy average-case version of the graph isomorphism problem. Let $G$ and $G'$ be $G(n, p)$ Erd\H{o}s--R\'enyi graphs marginally, identified with their adjacency matrices. Assume that $G$ and $G'$ are correlated such that $\mathbb{E}[G_{ij} G'_{ij}] = p(1-\alpha)$. For a permutation $\pi$ representing a latent matching between the vertices of $G$ and $G'$, denote by $G^\pi$ the graph obtained from permuting the vertices of $G$ by $\pi$. Observing $G^\pi$ and $G'$, we aim to recover the matching $\pi$. In this work, we show that for every $\varepsilon \in (0,1]$, there is $n_0>0$ depending on $\varepsilon$ and absolute constants $\alpha_0, R > 0$ with the following property. Let $n \ge n_0$, $(1+\varepsilon) \log n \le np \le n^{\frac{1}{R \log \log n}}$, and $0 < \alpha < \min(\alpha_0,\varepsilon/4)$. There is a polynomial-time algorithm $F$ such that $\mathbb{P}\{F(G^\pi,G')=\pi\}=1-o(1)$. This is the first polynomial-time algorithm that recovers the exact matching between vertices of correlated Erd\H{o}s--R\'enyi graphs with constant correlation with high probability. The algorithm is based on comparison of partition trees associated with the graph vertices.
    A State Transition Model for Mobile Notifications via Survival Analysis. (arXiv:2207.03099v1 [stat.ML])
    Mobile notifications have become a major communication channel for social networking services to keep users informed and engaged. As more mobile applications push notifications to users, they constantly face decisions on what to send, when and how. A lack of research and methodology commonly leads to heuristic decision making. Many notifications arrive at an inappropriate moment or introduce too many interruptions, failing to provide value to users and spurring users' complaints. In this paper we explore unique features of interactions between mobile notifications and user engagement. We propose a state transition framework to quantitatively evaluate the effectiveness of notifications. Within this framework, we develop a survival model for badging notifications assuming a log-linear structure and a Weibull distribution. Our results show that this model is more flexible across applications and achieves higher prediction accuracy than a logistic regression model. In particular, we provide an online use case on notification delivery time optimization to show how we make better decisions, drive more user engagement, and provide more value to users.  ( 2 min )
    Binary Iterative Hard Thresholding Converges with Optimal Number of Measurements for 1-Bit Compressed Sensing. (arXiv:2207.03427v1 [cs.IT])
    Compressed sensing has been a very successful high-dimensional signal acquisition and recovery technique that relies on linear operations. However, the actual measurements of signals have to be quantized before storing or processing. One-bit (1-bit) compressed sensing is a heavily quantized version of compressed sensing, where each linear measurement of a signal is reduced to just one bit: the sign of the measurement. Once enough of such measurements are collected, the recovery problem in 1-bit compressed sensing aims to find the original signal with as much accuracy as possible. The recovery problem is related to the traditional "halfspace-learning" problem in learning theory. For recovery of sparse vectors, a popular reconstruction method from 1-bit measurements is the binary iterative hard thresholding (BIHT) algorithm. The algorithm is a simple projected sub-gradient descent method, and is known to converge well empirically, despite the nonconvexity of the problem. The convergence property of BIHT was not theoretically justified, except with an exorbitantly large number of measurements (i.e., a number of measurements greater than $\max\{k^{10}, 24^{48}, k^{3.5}/\epsilon\}$, where $k$ is the sparsity, $\epsilon$ denotes the approximation error, and even this expression hides other factors). In this paper we show that the BIHT algorithm converges with only $\tilde{O}(\frac{k}{\epsilon})$ measurements. Note that this dependence on $k$ and $\epsilon$ is optimal for any recovery method in 1-bit compressed sensing. With this result, to the best of our knowledge, BIHT is the only practical and efficient (polynomial time) algorithm that requires the optimal number of measurements in all parameters (both $k$ and $\epsilon$). This is also an example of a gradient descent algorithm converging to the correct solution for a nonconvex problem, under suitable structural conditions.  ( 3 min )
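The BIHT iteration described in the abstract above can be sketched as follows. This is a minimal, standard-library-only illustration of one common textbook form of the update (a sub-gradient step, then hard thresholding to the `k` largest entries, then renormalization); it is not necessarily the exact variant analyzed in the paper. The step size, iteration count, problem sizes, the `sign(0) = +1` convention, and all function names are assumptions chosen for illustration.

```python
import math
import random


def sign(v):
    # Convention (an assumption): sign(0) is treated as +1.
    return 1.0 if v >= 0 else -1.0


def hard_threshold(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out


def normalize(x):
    """Scale x to unit Euclidean norm (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else x


def biht(A, y, k, step=1.0, iters=50):
    """Estimate a k-sparse unit vector from sign measurements y = sign(A x)."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Mismatch between observed signs and the signs predicted by x.
        residual = [y[i] - sign(sum(A[i][j] * x[j] for j in range(n)))
                    for i in range(m)]
        # Sub-gradient direction A^T (y - sign(A x)).
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step, projection onto k-sparse vectors, then renormalize,
        # because 1-bit measurements carry no information about the scale.
        x = hard_threshold([x[j] + (step / m) * grad[j] for j in range(n)], k)
        x = normalize(x)
    return x


# Small synthetic demo (all sizes are illustrative assumptions).
random.seed(0)
n, m, k = 30, 300, 3
x_true = normalize([1.0, -2.0, 0.5] + [0.0] * (n - 3))
A = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
y = [sign(sum(a * b for a, b in zip(row, x_true))) for row in A]

x_hat = biht(A, y, k)
error = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_hat, x_true)))
print("nonzeros:", sum(1 for v in x_hat if v != 0.0), "l2 error:", round(error, 3))
```

Only the direction of the signal is recoverable from sign measurements, which is why the sketch compares unit-norm vectors. The printed error depends on the random draw and on the assumed step size.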
    Quantum Advantage in Variational Bayes Inference. (arXiv:2207.03104v1 [stat.ML])
    The variational Bayes (VB) inference algorithm is widely used to estimate both the parameters and the unobserved hidden variables in generative statistical models. The algorithm -- inspired by variational methods used in computational physics -- is iterative and can easily get stuck in local minima, even when classical techniques, such as deterministic annealing (DA), are used. We study a variational Bayes (VB) inference algorithm based on a non-traditional quantum annealing approach -- referred to as quantum annealing variational Bayes (QAVB) inference -- and show that there is indeed a quantum advantage to QAVB over its classical counterparts. In particular, we show that such better performance is rooted in key concepts from quantum mechanics: (i) the ground state of the Hamiltonian of a quantum system -- defined from the given variational Bayes (VB) problem -- corresponds to an optimal solution for the minimization problem of the variational free energy at very low temperatures; (ii) such a ground state can be achieved by a technique paralleling the quantum annealing process; and (iii) starting from this ground state, the optimal solution to the VB problem can be achieved by increasing the heat bath temperature to unity, thereby avoiding local minima introduced by spontaneous symmetry breaking observed in classical-physics-based VB algorithms. We also show that the update equations of QAVB can be potentially implemented using $\lceil \log K \rceil$ qubits and $\mathcal{O} (K)$ operations per step. Thus, QAVB can match the time complexity of existing VB algorithms, while delivering higher performance.  ( 3 min )
    Back to the Basics: Revisiting Out-of-Distribution Detection Baselines. (arXiv:2207.03061v1 [cs.LG])
    We study simple methods for out-of-distribution (OOD) image detection that are compatible with any already trained classifier, relying on only its predictions or learned representations. Evaluating the OOD detection performance of various methods when utilized with ResNet-50 and Swin Transformer models, we find methods that solely consider the model's predictions can be easily outperformed by also considering the learned representations. Based on our analysis, we advocate for a dead-simple approach that has been neglected in other studies: simply flag as OOD images whose average distance to their K nearest neighbors is large (in the representation space of an image classifier trained on the in-distribution data).  ( 2 min )
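The K-nearest-neighbor rule advocated in the abstract above can be sketched in a few lines. This is a minimal illustration, assuming feature vectors have already been extracted from a classifier trained on in-distribution data; here they are replaced by synthetic Gaussian vectors. The function name, the value of `k`, the feature dimension, and the threshold rule are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def knn_ood_score(query, train_feats, k=5):
    """Average Euclidean distance from `query` to its k nearest training features.

    A larger score means the query lies farther from the in-distribution data.
    """
    dists = sorted(math.dist(query, feat) for feat in train_feats)
    return sum(dists[:k]) / k


# Synthetic stand-ins for learned representations (an assumption for the demo).
random.seed(0)
dim = 8
train_feats = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(200)]
in_dist = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(20)]
shifted = [[random.gauss(10.0, 1.0) for _ in range(dim)] for _ in range(20)]

in_scores = [knn_ood_score(q, train_feats) for q in in_dist]
ood_scores = [knn_ood_score(q, train_feats) for q in shifted]

# Hypothetical decision rule: flag anything scoring above the largest
# score observed on held-out in-distribution samples.
threshold = max(in_scores)
flagged = [score > threshold for score in ood_scores]
print("flagged as OOD:", sum(flagged), "of", len(flagged))
```

In practice the threshold would usually be tuned on held-out in-distribution data (for example, at a chosen quantile), and a library nearest-neighbor index would replace the brute-force sort for large training sets.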
    Learning towards Robustness in Causally-Invariant Predictors. (arXiv:2107.01876v2 [stat.ML] UPDATED)
    We propose to learn an invariant causal predictor that is robust to distributional shifts, in the supervised regression scenario. Based on a disentangled causal factorization that describes the underlying data generating process, we attribute the distributional shifts to mutation of generating factors, which covers a wide range of cases of distributional shifts as we do not make prior specifications on the causal structure or the source of mutation. Under this causal framework, we identify a set of invariant predictors based on the do-operator. We provide a sufficient and necessary condition for a predictor to be min-max optimal, i.e., minimizes the worst-case quadratic loss among all domains. This condition is justifiable under the Markovian and faithfulness assumptions, thus inspiring a practical algorithm to identify the optimal predictor. For empirical estimation, we propose a permutation-regeneration scheme guided by a local causal discovery procedure. The utility and effectiveness of our method are demonstrated in simulation data and two real-world applications: Alzheimer's disease diagnosis and gene function prediction.  ( 2 min )
    SC2EGSet: StarCraft II Esport Replay and Game-state Dataset. (arXiv:2207.03428v1 [cs.LG])
    As a relatively new form of sport, esports offers unparalleled data availability. Despite the vast amounts of data that are generated by game engines, it can be challenging to extract them and verify their integrity for the purposes of practical and scientific use. Our work aims to open esports to a broader scientific community by supplying raw and pre-processed files from StarCraft II esports tournaments. These files can be used in statistical and machine learning modeling tasks and related to various laboratory-based measurements (e.g., behavioral tests, brain imaging). We have gathered publicly available game-engine generated "replays" of tournament matches and performed data extraction and cleanup using a low-level application programming interface (API) parser library. Additionally, we open-sourced and published all the custom tools that were developed in the process of creating our dataset. These tools include PyTorch and PyTorch Lightning API abstractions to load and model the data. Our dataset contains replays from major and premiere StarCraft II tournaments since 2016. To prepare the dataset, we processed 55 tournament "replaypacks" that contained 17930 files with game-state information. Based on initial investigation of available StarCraft II datasets, we observed that our dataset is the largest publicly available source of StarCraft II esports data upon its publication. Analysis of the extracted data holds promise for further Artificial Intelligence (AI), Machine Learning (ML), psychological, Human-Computer Interaction (HCI), and sports-related studies in a variety of supervised and self-supervised tasks.  ( 3 min )
    Sequential estimation of quantiles with applications to A/B-testing and best-arm identification. (arXiv:1906.09712v5 [math.ST] UPDATED)
    We propose confidence sequences -- sequences of confidence intervals which are valid uniformly over time -- for quantiles of any distribution over a complete, fully-ordered set, based on a stream of i.i.d. observations. We give methods both for tracking a fixed quantile and for tracking all quantiles simultaneously. Specifically, we provide explicit expressions with small constants for intervals whose widths shrink at the fastest possible $\sqrt{t^{-1} \log\log t}$ rate, along with a non-asymptotic concentration inequality for the empirical distribution function which holds uniformly over time with the same rate. The latter strengthens Smirnov's empirical process law of the iterated logarithm and extends the Dvoretzky-Kiefer-Wolfowitz inequality to hold uniformly over time. We give a new algorithm and sample complexity bound for selecting an arm with an approximately best quantile in a multi-armed bandit framework. In simulations, our method requires fewer samples than existing methods by a factor of five to fifty.  ( 3 min )
    Pre-training helps Bayesian optimization too. (arXiv:2207.03084v1 [cs.LG])
    Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs on functions. However, even with expert knowledge, it is not an easy task to select a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.  ( 3 min )
    Multi-objective Optimization of Notifications Using Offline Reinforcement Learning. (arXiv:2207.03029v1 [cs.LG])
    Mobile notification systems play a major role in a variety of applications to communicate, send alerts and reminders to the users to inform them about news, events or messages. In this paper, we formulate the near-real-time notification decision problem as a Markov Decision Process where we optimize for multiple objectives in the rewards. We propose an end-to-end offline reinforcement learning framework to optimize sequential notification decisions. We address the challenge of offline learning using a Double Deep Q-network method based on Conservative Q-learning that mitigates the distributional shift problem and Q-value overestimation. We illustrate our fully-deployed system and demonstrate the performance and benefits of the proposed approach through both offline and online experiments.  ( 2 min )
    Lower Bounds on the Generalization Error of Nonlinear Learning Models. (arXiv:2103.14723v3 [stat.ML] UPDATED)
    We study in this paper lower bounds for the generalization error of models derived from multi-layer neural networks, in the regime where the size of the layers is commensurate with the number of samples in the training data. We show that unbiased estimators have unacceptable performance for such nonlinear networks in this regime. We derive explicit generalization lower bounds for general biased estimators, in the cases of linear regression and of two-layered networks. In the linear case the bound is asymptotically tight. In the nonlinear case, we provide a comparison of our bounds with an empirical study of the stochastic gradient descent algorithm. The analysis uses elements from the theory of large random matrices.  ( 2 min )
    Challenges and Pitfalls of Bayesian Unlearning. (arXiv:2207.03227v1 [cs.LG])
    Machine unlearning refers to the task of removing a subset of training data, thereby removing its contributions to a trained model. Approximate unlearning methods are one class of methods for this task that avoid the need to retrain the model from scratch on the retained data. Bayes' rule can be used to cast approximate unlearning as an inference problem where the objective is to obtain the updated posterior by dividing out the likelihood of deleted data. However, this has its own set of challenges, as one often does not have access to the exact posterior of the model parameters. In this work we examine the use of the Laplace approximation and Variational Inference to obtain the updated posterior. With a neural network trained for a regression task as the guiding example, we draw insights on the applicability of Bayesian unlearning in practical scenarios.  ( 2 min )
    On the Equivalence between Neural Network and Support Vector Machine. (arXiv:2111.06063v2 [stat.ML] UPDATED)
    Recent research shows that the dynamics of an infinitely wide neural network (NN) trained by gradient descent can be characterized by Neural Tangent Kernel (NTK) \citep{jacot2018neural}. Under the squared loss, the infinite-width NN trained by gradient descent with an infinitely small learning rate is equivalent to kernel regression with NTK \citep{arora2019exact}. However, the equivalence is only known for ridge regression currently \citep{arora2019harnessing}, while the equivalence between NN and other kernel machines (KMs), e.g. support vector machine (SVM), remains unknown. Therefore, in this work, we propose to establish the equivalence between NN and SVM, and specifically, the infinitely wide NN trained by soft margin loss and the standard soft margin SVM with NTK trained by subgradient descent. Our main theoretical results include establishing the equivalences between NNs and a broad family of $\ell_2$ regularized KMs with finite-width bounds, which cannot be handled by prior work, and showing that every finite-width NN trained by such regularized loss functions is approximately a KM. Furthermore, we demonstrate our theory can enable three practical applications, including (i) \textit{non-vacuous} generalization bound of NN via the corresponding KM; (ii) \textit{non-trivial} robustness certificate for the infinite-width NN (while existing robustness verification methods would provide vacuous bounds); (iii) intrinsically more robust infinite-width NNs than those from previous kernel regression. Our code for the experiments is available at \url{https://github.com/leslie-CH/equiv-nn-svm}.  ( 3 min )
    Pre-trained Gaussian processes for Bayesian optimization. (arXiv:2109.08215v4 [cs.LG] UPDATED)
    Bayesian optimization (BO) has become a popular strategy for global optimization of many expensive real-world functions. Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully. Such domain knowledge often manifests in Gaussian process priors that specify initial beliefs on functions. However, even with expert knowledge, it is not an easy task to select a prior. This is especially true for hyperparameter tuning problems on complex machine learning models, where landscapes of tuning objectives are often difficult to comprehend. We seek an alternative practice for setting these functional priors. In particular, we consider the scenario where we have data from similar functions that allow us to pre-train a tighter distribution a priori. Theoretically, we show a bounded regret of BO with pre-trained priors. To verify our approach in realistic model training setups, we collected a large multi-task hyperparameter tuning dataset by training tens of thousands of configurations of near-state-of-the-art models on popular image and text datasets, as well as a protein sequence dataset. Our results show that on average, our method is able to locate good hyperparameters at least 3 times more efficiently than the best competing methods.  ( 3 min )
    A single $T$-gate makes distribution learning hard. (arXiv:2207.03140v1 [quant-ph])
    The task of learning a probability distribution from samples is ubiquitous across the natural sciences. The output distributions of local quantum circuits form a particularly interesting class of distributions, of key importance both to quantum advantage proposals and a variety of quantum machine learning algorithms. In this work, we provide an extensive characterization of the learnability of the output distributions of local quantum circuits. Our first result yields insight into the relationship between the efficient learnability and the efficient simulatability of these distributions. Specifically, we prove that the density modelling problem associated with Clifford circuits can be efficiently solved, while for depth $d=n^{\Omega(1)}$ circuits the injection of a single $T$-gate into the circuit renders this problem hard. This result shows that efficient simulatability does not imply efficient learnability. Our second set of results provides insight into the potential and limitations of quantum generative modelling algorithms. We first show that the generative modelling problem associated with depth $d=n^{\Omega(1)}$ local quantum circuits is hard for any learning algorithm, classical or quantum. As a consequence, one cannot use a quantum algorithm to gain a practical advantage for this task. We then show that, for a wide variety of the most practically relevant learning algorithms -- including hybrid-quantum classical algorithms -- even the generative modelling problem associated with depth $d=\omega(\log(n))$ Clifford circuits is hard. This result places limitations on the applicability of near-term hybrid quantum-classical generative modelling algorithms.  ( 3 min )
    On the instrumental variable estimation with many weak and invalid instruments. (arXiv:2207.03035v1 [stat.ME])
    We discuss the fundamental issue of identification in linear instrumental variable (IV) models with unknown IV validity. We revisit the popular majority and plurality rules and show that no identification condition can be "if and only if" in general. With the assumption of the "sparsest rule", which is equivalent to the plurality rule but becomes operational in computation algorithms, we investigate and prove the advantages of non-convex penalized approaches over other IV estimators based on two-step selections, in terms of selection consistency and accommodation for individually weak IVs. Furthermore, we propose a surrogate sparsest penalty that aligns with the identification condition and provides oracle sparse structure simultaneously. Desirable theoretical properties are derived for the proposed estimator with weaker IV strength conditions compared to the previous literature. Finite sample properties are demonstrated using simulations and the selection and estimation method is applied to an empirical study concerning the effect of trade on economic growth.  ( 2 min )
    A Simple and Provably Efficient Algorithm for Asynchronous Federated Contextual Linear Bandits. (arXiv:2207.03106v1 [cs.LG])
    We study federated contextual linear bandits, where $M$ agents cooperate with each other to solve a global contextual linear bandit problem with the help of a central server. We consider the asynchronous setting, where all agents work independently and the communication between one agent and the server will not trigger other agents' communication. We propose a simple algorithm named \texttt{FedLinUCB} based on the principle of optimism. We prove that the regret of \texttt{FedLinUCB} is bounded by $\tilde{O}(d\sqrt{\sum_{m=1}^M T_m})$ and the communication complexity is $\tilde{O}(dM^2)$, where $d$ is the dimension of the contextual vector and $T_m$ is the total number of interactions with the environment by $m$-th agent. To the best of our knowledge, this is the first provably efficient algorithm that allows fully asynchronous communication for federated contextual linear bandits, while achieving the same regret guarantee as in the single-agent setting.  ( 2 min )
    Functional additive models on manifolds of planar shapes and forms. (arXiv:2109.02624v4 [stat.ME] UPDATED)
    The "shape" of a planar curve and/or landmark configuration is considered its equivalence class under translation, rotation and scaling, its "form" its equivalence class under translation and rotation while scale is preserved. We extend generalized additive regression to models for such shapes/forms as responses respecting the resulting quotient geometry by employing the squared geodesic distance as loss function and a geodesic response function to map the additive predictor to the shape/form space. For fitting the model, we propose a Riemannian $L_2$-Boosting algorithm well suited for a potentially large number of possibly parameter-intensive model terms, which also yields automated model selection. We provide novel intuitively interpretable visualizations for (even non-linear) covariate effects in the shape/form space via suitable tensor-product factorization. The usefulness of the proposed framework is illustrated in an analysis of 1) astragalus shapes of wild and domesticated sheep and 2) cell forms generated in a biophysical model, as well as 3) in a realistic simulation study with response shapes and forms motivated from a dataset on bottle outlines.  ( 2 min )
    Riemannian Diffusion Schr\"odinger Bridge. (arXiv:2207.03024v1 [stat.ML])
    Score-based generative models exhibit state of the art performance on density estimation and generative modeling tasks. These models typically assume that the data geometry is flat, yet recent extensions have been developed to synthesize data living on Riemannian manifolds. Existing methods to accelerate sampling of diffusion models are typically not applicable in the Riemannian setting and Riemannian score-based methods have not yet been adapted to the important task of interpolation of datasets. To overcome these issues, we introduce \emph{Riemannian Diffusion Schr\"odinger Bridge}. Our proposed method generalizes Diffusion Schr\"odinger Bridge introduced in \cite{debortoli2021neurips} to the non-Euclidean setting and extends Riemannian score-based models beyond the first time reversal. We validate our proposed method on synthetic data and real Earth and climate data.  ( 2 min )
    Provable Domain Generalization via Invariant-Feature Subspace Recovery. (arXiv:2201.12919v2 [cs.LG] UPDATED)
    Domain generalization asks for models trained over a set of training environments to perform well in unseen test environments. Recently, a series of algorithms such as Invariant Risk Minimization (IRM) has been proposed for domain generalization. However, Rosenfeld et al. (2021) show that in a simple linear data model, even if non-convexity issues are ignored, IRM and its extensions cannot generalize to unseen environments with fewer than $d_s+1$ training environments, where $d_s$ is the dimension of the spurious-feature subspace. In this paper, we propose to achieve domain generalization with Invariant-feature Subspace Recovery (ISR). Our first algorithm, ISR-Mean, can identify the subspace spanned by invariant features from the first-order moments of the class-conditional distributions, and achieve provable domain generalization with $d_s+1$ training environments under the data model of Rosenfeld et al. (2021). Our second algorithm, ISR-Cov, further reduces the required number of training environments to $O(1)$ using the information of second-order moments. Notably, unlike IRM, our algorithms bypass non-convexity issues and enjoy global convergence guarantees. Empirically, our ISRs can obtain superior performance compared with IRM on synthetic benchmarks. In addition, on three real-world image and text datasets, we show that both ISRs can be used as simple yet effective post-processing methods to improve the worst-case accuracy of (pre-)trained models against spurious correlations and group shifts.  ( 3 min )
    Unsupervised Manifold Alignment with Joint Multidimensional Scaling. (arXiv:2207.02968v1 [stat.ML])
    We introduce Joint Multidimensional Scaling, a novel approach for unsupervised manifold alignment, which maps datasets from two different domains, without any known correspondences between data instances across the datasets, to a common low-dimensional Euclidean space. Our approach integrates Multidimensional Scaling (MDS) and Wasserstein Procrustes analysis into a joint optimization problem to simultaneously generate isometric embeddings of data and learn correspondences between instances from two different datasets, while only requiring intra-dataset pairwise dissimilarities as input. This unique characteristic makes our approach applicable to datasets without access to the input features, such as solving the inexact graph matching problem. We propose an alternating optimization scheme to solve the problem that can fully benefit from the optimization techniques for MDS and Wasserstein Procrustes. We demonstrate the effectiveness of our approach in several applications, including joint visualization of two datasets, unsupervised heterogeneous domain adaptation, graph matching, and protein structure alignment.  ( 2 min )
    Model Selection in Reinforcement Learning with General Function Approximations. (arXiv:2207.02992v1 [stat.ML])
    We consider model selection for classic Reinforcement Learning (RL) environments -- Multi Armed Bandits (MABs) and Markov Decision Processes (MDPs) -- under general function approximations. In the model selection framework, we do not know the function classes, denoted by $\mathcal{F}$ and $\mathcal{M}$, where the true models -- reward generating function for MABs and transition kernel for MDPs -- lie, respectively. Instead, we are given $M$ nested function (hypothesis) classes such that true models are contained in at least one such class. In this paper, we propose and analyze efficient model selection algorithms for MABs and MDPs, that \emph{adapt} to the smallest function class (among the nested $M$ classes) containing the true underlying model. Under a separability assumption on the nested hypothesis classes, we show that the cumulative regret of our adaptive algorithms matches that of an oracle which knows the correct function classes (i.e., $\mathcal{F}$ and $\mathcal{M}$) a priori. Furthermore, for both the settings, we show that the cost of model selection is an additive term in the regret having weak (logarithmic) dependence on the learning horizon $T$.  ( 2 min )
    The Union of Manifolds Hypothesis and its Implications for Deep Generative Modelling. (arXiv:2207.02862v1 [stat.ML])
    Deep learning has had tremendous success at learning low-dimensional representations of high-dimensional data. This success would be impossible if there was no hidden low-dimensional structure in data of interest; this existence is posited by the manifold hypothesis, which states that the data lies on an unknown manifold of low intrinsic dimension. In this paper, we argue that this hypothesis does not properly capture the low-dimensional structure typically present in data. Assuming the data lies on a single manifold implies intrinsic dimension is identical across the entire data space, and does not allow for subregions of this space to have a different number of factors of variation. To address this deficiency, we put forth the union of manifolds hypothesis, which accommodates the existence of non-constant intrinsic dimensions. We empirically verify this hypothesis on commonly-used image datasets, finding that indeed, intrinsic dimension should be allowed to vary. We also show that classes with higher intrinsic dimensions are harder to classify, and how this insight can be used to improve classification accuracy. We then turn our attention to the impact of this hypothesis in the context of deep generative models (DGMs). Most current DGMs struggle to model datasets with several connected components and/or varying intrinsic dimensions. To tackle these shortcomings, we propose clustered DGMs, where we first cluster the data and then train a DGM on each cluster. We show that clustered DGMs can model multiple connected components with different intrinsic dimensions, and empirically outperform their non-clustered counterparts without increasing computational requirements.  ( 3 min )

  • Open

    [R] Self-Modeling Programs: A Direct Approach to Program Likelihood
    Abstract: In algorithmic information theory, the length of a program is used as a measure of its probability. This paper presents a category of programs that directly compute the combined probability of their own code symbols and input data. The probability of each symbol is computed from past symbols by requiring that execution of the program formed by the first n symbols returns a probability distribution over symbols for position n + 1. The program of this type with the highest likelihood ending in the input data sequence intuitively represents the most likely sequence of events that could have generated the data. Advantages of programs of this form and the relationship to the Kolmogorov complexity are discussed. I'd appreciate any criticisms or comments. submitted by /u/ml6189  ( 86 min )
    [R] Collecting survey responses for Machine Learning Report in Australia and New Zealand
    Hi everyone. I hope this post doesn't go against community or this subreddit's rules. Please remove if so (or point me in the right direction). The organisation I work for, DiUS, is conducting some research into Machine Learning. We're looking for people at all stages of their ML journey, from experimenting to applying, to complete a quick five minute survey. The results will inform our 2022 National Pulse Report, due to be published later this year. Please note: We are only looking for responses from those in Australia and New Zealand. For your time, we'll send you a copy of the published report and make a donation to our charity partner OzHarvest. Thanks! https://www.surveymonkey.com/r/dius_ml_survey?&utm_source=ml-reddit&utm_medium=display&utm_campaign=mlsurvey&utm_content=homepage submitted by /u/cj_td [link] [comments]  ( 86 min )
    [D] How to deal with badly labelled data?
    The labeling team at my organization is very bad. They take forever to understand the labeling objective and produce datasets that are not very reliable. They take months to annotate a small dataset of roughly 2000 images. Now, I have 2 questions: How do I spot these anomalies? (Classification Dataset) How do I generate pseudo labels or use similar techniques to generate data for training? Should I complain about them to my manager or ask them to label the datasets again? Because this situation is getting out of hand. submitted by /u/FnSK4R17s [link] [comments]  ( 86 min )
    [D] Paper Explained - JEPA: A Path Towards Autonomous Machine Intelligence (Video Walkthrough)
    https://youtu.be/jSdHmImyUjk Yann LeCun's position paper on a path towards machine intelligence combines Self-Supervised Learning, Energy-Based Models, and hierarchical predictive embedding models to arrive at a system that can teach itself to learn useful abstractions at multiple levels and use that as a world model to plan ahead in time. OUTLINE: 0:00 - Introduction 2:00 - Main Contributions 5:45 - Mode 1 and Mode 2 actors 15:40 - Self-Supervised Learning and Energy-Based Models 20:15 - Introducing latent variables 25:00 - The problem of collapse 29:50 - Contrastive vs regularized methods 36:00 - The JEPA architecture 47:00 - Hierarchical JEPA (H-JEPA) 53:00 - Broader relevance 56:00 - Summary & Comments Paper: https://openreview.net/forum?id=BZ5a1r-kVsf submitted by /u/ykilcher [link] [comments]  ( 86 min )
    [D] Current state of modeling uncertainty for Bayesian optimization?
    Hello, recently I've gotten into Bayesian Optimization and was looking to use it for a NAS use case using a Gaussian process, but it seems to me that there are far more (and better-scaling) options than using a GP, such as MC dropout on a regular NN, BNNs, NN ensembles, and SWAG (which I don't really understand). I would appreciate any advice on the advantages/disadvantages of these methods or direction to some survey/review of modeling uncertainty. Thanks in advance. submitted by /u/Nearby-Vehicle6622 [link] [comments]  ( 87 min )
    [D] An accusation of academic misconduct by Prof. Yisen Wang (Peking University) in ICML2021 and NeurIPS2021
    I recently noticed a Weibo (Chinese Twitter) thread of an alarming potential academic misconduct - Prof. Yisen Wang’s girlfriend accused him of cheating and collusion behaviors in recent top-tier machine learning conferences, including but may not limit to NeurIPS2021 and ICML2021. Yisen Wang (homepage: https://yisenwang.github.io/) obtained his Ph.D. degree at Tsinghua University (China) and is now an assistant professor at Peking University (China). Yisen is interested in adversarial attack, etc. Here are some facts from Yisen’s girlfriend’s post: [Cheating in best paper nomination in ICML 2021] In ICML2021, Yisen asked one area chair of ICML2021 to recommend his first PhD student Jingyi Cui’s paper to be best paper candidate(I am not sure if it is termed as “best paper candidate”, …  ( 93 min )
    [Discussion] About model serving for production
    Hi! I hope I'm not breaking any rules with this question. I'm studying some frameworks used in production for model serving, namely Seldon Core, Kubeflow and an academic artifact named Clipper. Some can manage the entire ML life cycle, but I have a question about serving in production. In particular, how would one go about actually batching multiple requests on the cloud? There doesn't seem to be a gold standard for it, so I'm assuming it depends on the size of the data and on the scope of the model, right? For example, if the goal is image classification, it could be useful to have a cloud queue, right? If so, do you know of any solutions that are actually used in production? submitted by /u/Mediocre-Piccolo7474 [link] [comments]  ( 86 min )
    [D] LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015
    Saw Schmidhuber’s tweeting again: 🔥 “Lecun’s 2022 paper on Autonomous Machine Intelligence rehashes but doesn’t cite essential work of 1990-2015. We’ve already published his “main original contributions:” learning subgoals, predictable abstract representations, multiple time scales…” Jürgen Schmidhuber’s response to Yann Lecun’s recent technical report / position paper “Autonomous Machine Intelligence” in this latest blog post: https://people.idsia.ch/~juergen/lecun-rehash-1990-2022.html An excerpt: On 14 June 2022, a science tabloid that published this article (24 June) on LeCun's report “A Path Towards Autonomous Machine Intelligence” (27 June) sent me a draft of the report (back then still under embargo) and asked for comments. I wrote a review (see below), telling them that this is essentially a rehash of our previous work that LeCun did not mention. My comments, however, fell on deaf ears. Now I am posting my not so enthusiastic remarks here such that the history of our field does not become further corrupted. The images below link to relevant blog posts from the AI Blog. I would like to start this by acknowledging that I am not without a conflict of interest here; my seeking to correct the record will naturally seem self-interested. The truth of the matter is that it is. Much of the closely related work pointed to below was done in my lab, and I naturally wish that it be acknowledged, and recognized. Setting my conflict aside, I ask the reader to study the original papers and judge for themselves the scientific content of these remarks, as I seek to set emotions aside and minimize bias so much as I am capable. For reference, previous discussion on r/MachineLearning about Yann Lecun’s paper: https://www.reddit.com/r/MachineLearning/comments/vm39oe/a_path_towards_autonomous_machine_intelligence/ submitted by /u/hardmaru [link] [comments]  ( 90 min )
    [D] Why do first layer filters in CNNs converge to edge-detector-like filters?
    I believe it's well known that generally first layer filters in CNNs will converge to "edge-detector-like" shapes like this: shorturl.at/ANS78. This phenomenon is independent of the task from what I've seen - every large CNN backbone I've trained will converge to this given enough data. There is also research showing this type of edge detection happens in the visual cortex. Thus this edge detector phenomenon appears to be some fundamentally emergent property of the real world (+ maybe CNN type processors). Is there any compelling technical explanation for how SGD and its variants can reliably produce this convergence? I don't mean why edge detectors are "good" first stage filters - that intuitively makes sense to me. But rather, how is it that SGD can reliably produce this type of convergence on any dataset? I've been looking for a while for an explanation but couldn't find anything great. I was thinking that maybe there is some explanation using an assumption that edges are naturally "higher information" on raw images from the real world, and thus more directionally stepped towards in the gradient? But I can't get the explanation to a satisfying state. submitted by /u/AeronByHermanMiller [link] [comments]  ( 90 min )
  • Open

    Using artificial intelligence, scientists recreate the smells of the past
    submitted by /u/ezikler [link] [comments]  ( 84 min )
    AI2’s PRIOR Team Introduces Unified-IO: The First Neural Model To Execute Various AI Tasks Spanning Classical Computer Vision, Image Synthesis, Vision-and-Language, and Natural Language Processing NLP
    Almost all industries are now using machine learning systems to improve the efficiency and dependability of their work. With the increasing use of ML, companies have seen a boom in the investments in the resources needed to support ML systems. Additionally, a single ML process necessitates the execution of numerous distinct models, further complicating the process and increasing costs. The idea of “Unified Models” was established in recent years, where a single model is constructed to power a process or product rather than a collection of connected but independent models. Combining all of the necessary data into one array and passing it to the model makes it possible to create a unified model that delivers all of the findings at once rather than by calling individual models one at a time. Continue reading | Check out the demo submitted by /u/ai-lover [link] [comments]  ( 85 min )
    AI Dream 60 - EPIC Cosmic Midjourney Expedition by AI
    submitted by /u/LordPewPew777 [link] [comments]  ( 84 min )
    Nvidia Omniverse AI Predicts Alternate Future of The World | FIFA Uses Full Body Tracking AI | New Meta AI Translates 200 Languages With Highest Degree of Accuracy
    submitted by /u/tohelpyou88 [link] [comments]  ( 84 min )
    Midjourney Invites
    If anyone wants to get a midjourney invite, feel free to DM me. submitted by /u/xAnunnakix [link] [comments]  ( 84 min )
    It’s so hard to find motivation to finish your art sometimes especially when you work full time 😫
    submitted by /u/Legitimate_Run_6350 [link] [comments]  ( 84 min )
    AI: Diagnosis and Forecasting Spread of Infectious Diseases
    AI is taking great strides in facilitating the way organizations are handling the pandemic. More so, the scope for AI professionals in healthcare sectors could be bountiful. submitted by /u/Emily-joe [link] [comments]  ( 84 min )
    AI Referee Will Track Players' Individual Limbs at World Cup
    submitted by /u/estasfuera [link] [comments]  ( 85 min )
    Quick analysis of the most in-demand jobs in AI/ML in 2022
    In short: Data Engineers are still the most sought-after professionals in the field (more engineering, less "modeling"?), demand for analysts and leadership (!) roles is on the rise. Full insights here: https://insights.ai-jobs.net/the-10-most-in-demand-jobs-in-ai-ml-and-big-data-in-2022/ submitted by /u/ai_jobs [link] [comments]  ( 84 min )
    Customizable Writing AI?
    This is totally a shot in the dark but I'm going for it anyway. Long story short, for kicks and giggles, I am trying to find a writing AI that allows you to input example writing of your choice to pull from rather than just give it a sentence prompt. I've spent an hour or so trying to google one up to no avail. Huge thanks in advance! submitted by /u/MustangLegends [link] [comments]  ( 85 min )
    Who needs a midjourney invite? Bc I got some left
    I dunno who to give these invites to so if anyone needs one I got you all!! submitted by /u/CombinationMammoth50 [link] [comments]  ( 84 min )
    Fairy's Pure Beauty | Raw Unscaled (FILM) | PYTTI 3D AI Art Animation
    submitted by /u/Available_Tadpole829 [link] [comments]  ( 84 min )
    Is there an AI out there that takes an image and makes it more "realistic?"
    I know AI face generators can make pretty impressive images from scratch but I'm wondering if there is something that takes an image as input (like a video game screenshot or character) and spits out a more realistic version of it. I could imagine this would be a fun tool for 3D artists. Thanks! submitted by /u/ImPlento [link] [comments]  ( 85 min )
  • Open

    Drive efficiencies with CI/CD best practices on Amazon Lex
    Let’s say you have identified a use case in your organization that you would like to handle via a chatbot. You familiarized yourself with Amazon Lex, built a prototype, and did a few trial interactions with the bot. You liked the overall experience and now want to deploy the bot in your production environment, but […]  ( 7 min )
    Feature engineering at scale for healthcare and life sciences with Amazon SageMaker Data Wrangler
    Machine learning (ML) is disrupting a lot of industries at an unprecedented pace. The healthcare and life sciences (HCLS) industry has been going through a rapid evolution in recent years embracing ML across a multitude of use cases for delivering quality care and improving patient outcomes. In a typical ML lifecycle, data engineers and scientists […]  ( 17 min )
  • Open

    Mission-Driven: Takeaways From Our Corporate Responsibility Report
    NVIDIA’s latest corporate responsibility report shares our efforts in empowering employees and putting to work our technologies for the benefit of humanity. Amid ongoing global economic concerns and pandemic challenges, this year’s report highlights our ability to attract and retain talent that come here to do their life’s work while tackling some of the world’s […] The post Mission-Driven: Takeaways From Our Corporate Responsibility Report appeared first on NVIDIA Blog.  ( 7 min )
    GFN Thursday Brings New Games to GeForce NOW for the Perfect Summer Playlist
    Nothing beats the summer heat like GFN Thursday. Get ready for four new titles streaming at GeForce quality across nearly any device. Buckle up for some great gaming, whether poolside, in the car for a long road trip, or in the air-conditioned comfort of home. Speaking of summer, it’s also last call for this year’s […] The post GFN Thursday Brings New Games to GeForce NOW for the Perfect Summer Playlist appeared first on NVIDIA Blog.  ( 5 min )
    Wordle for AI: Santiago Valderrama on Getting Smarter on Machine Learning
    Want to learn about AI and machine learning? There are plenty of resources out there to help — blogs, podcasts, YouTube tutorials — perhaps too many. Machine learning engineer Santiago Valdarrama has taken a far more focused approach to helping us all get smarter about the field. He’s created a following by posing one machine […] The post Wordle for AI: Santiago Valderrama on Getting Smarter on Machine Learning appeared first on NVIDIA Blog.  ( 5 min )
  • Open

    Memory allocation problems in Stable Baselines3
    I'm trying to make an AI that finds the exit in a 50x50 maze using Stable Baselines3. The maze is represented by a 2D list where -1 means unexplored, 0 means empty space, 1 means wall and 2 means exit. There's another list on top of this one with the player's coordinates (so it's a 3D list). It begins like this: self.pmp=[[-1]*50 for _ in range(50)] This is the AI's personal map; there's also an objective map which is fully explored, and it's added slowly to the personal map depending on the agent's coordinates. But every time I try to train the AI I get this error: model=DQN('MlpPolicy', env, verbose=1) numpy.core._exceptions._ArrayMemoryError: Unable to allocate 18.6 GiB for an array with shape (1000000, 1, 2, 50, 50) and data type int32 Not sure where the 1000000 came from. I tried saving memory by replacing that first bit of code with this: self.pmp=np.empty((50,50)) But it didn't do anything. Is there a way to reduce the memory this process takes up? submitted by /u/AnonCaptain0022 [link] [comments]  ( 85 min )
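    The reported size can be reproduced with a little arithmetic. The sketch below assumes that the leading 1000000 is DQN's default replay buffer size (the `buffer_size` argument in Stable Baselines3) and that the buffer holds one array element per observation cell. The helper name `replay_buffer_gib` and the smaller settings in the second call are illustrative assumptions, not a tested fix for this environment.

```python
import numpy as np

def replay_buffer_gib(buffer_size, n_envs, obs_shape, dtype):
    """Size in GiB of one array of shape (buffer_size, n_envs, *obs_shape)."""
    n_elements = buffer_size * n_envs * int(np.prod(obs_shape))
    return n_elements * np.dtype(dtype).itemsize / 2**30

# Shape and dtype taken from the error message: (1000000, 1, 2, 50, 50), int32.
reported = replay_buffer_gib(1_000_000, 1, (2, 50, 50), np.int32)

# Hypothetical alternative: a 10x smaller buffer and int8 cells
# (map values -1..2 and coordinates 0..49 all fit in int8).
smaller = replay_buffer_gib(100_000, 1, (2, 50, 50), np.int8)

print(f"{reported:.1f} GiB vs {smaller:.2f} GiB")
```

    If the assumption holds, passing a smaller `buffer_size` to `DQN(...)` and declaring the observation space with a narrower dtype would shrink the allocation in the same proportion.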
    ELI5: Braids counter example.
    Hi, I am really confused about the braids counter example: where does the state start, how does it affect function approximation, etc. Online searches either make me more confused or give me braided haircuts. ELI5 is maybe a bit too much, but could someone help explain the Braids counter example to me? My best guess right now is that it's like a cyclic import in Python. submitted by /u/100M-900 [link] [comments]  ( 84 min )
    I have been reading about POMDP, but still confused between the differences in state, observation and belief. Can someone please explain it, with an example preferably.
    submitted by /u/aabra__ka__daabra [link] [comments]  ( 86 min )
    Where can I get pre trained machine learning models?
    submitted by /u/PopOk539 [link] [comments]  ( 84 min )
    RecurrentPPO (SB3-contrib) learning for autonomous driving
    Hi everyone! I'm a complete newbie to DRL, so please forgive my lack of understanding of some things on here. I'm training a recPPO from SB3-contrib on E.Leurent's Highway env [https://github.com/eleurent/highway-env] (I customized the action space to be more high-level). During training I get the desired behavioural outcome from the agent, but I noticed that some training metrics of the model seem quite off with respect to the trend found online (especially the explained variance). I just wanted an opinion from some more navigated fellas in here! Can I somehow fix this trend by hyperparameter tuning, or do I have to e.g. modify the reward function somehow? How can I improve the training? For any details I'm always available. I share the tensorboard plots obtained for RecPPO. [Plot: Fixed LR RecPPO] [Plot: Linearly decreasing LR RecPPO] P.S. with a fixed LR the model performs way better on the env it trained on and is very poor in exploitation on more complex envs (but it's ok, there are scenarios it couldn't have seen), while the one with decreasing LR performs poorly on the training env (crashes a lot) and does better in exploitation (but it has a weird way to navigate). Thank you in advance for the help! submitted by /u/pigopigu [link] [comments]  ( 85 min )
    Question about the old policy and new policy in TRPO code
    The code is a TRPO code. In this code, when "get_kl" , I can't understand the differences between the "mean0, log_std0, std0" and "mean1, log_std1, std1", aren't they equal in the code? And both the difference between the log_probs of old policy and new policy in the part of "get_loss" , aren't they equal in the code? Thanks for the help! submitted by /u/Snoopy9797 [link] [comments]  ( 86 min )
  • Open

    Enabling Creative Expression with Concept Activation Vectors
    Posted by Been Kim, Research Scientist, Google Research, Brain Team, and Alison Lentz, Senior Staff Strategist, Google Research, Mural Team Advances in computer vision and natural language processing continue to unlock new ways of exploring billions of images available on public and searchable websites. Today’s visual search tools make it possible to search with your camera, voice, text, images, or multiple modalities at the same time. However, it remains difficult to input subjective concepts, such as visual tones or moods, into current systems. For this reason, we have been working collaboratively with artists, photographers, and image researchers to explore how machine learning (ML) might enable people to use expressive queries as a way of visually exploring datasets. Today, we are i…  ( 22 min )
  • Open

    Sentient AI And The Turing Test — Did Google Engineer Prove Computers Can Have Feelings?
    One of the biggest stories of the year in the AI community is about a Google engineer’s claim of sentient AI. This was part of Google’s LaMDA…  ( 21 min )
    Artificial Intelligences
    Artificial intelligences as our allies  ( 8 min )
  • Open

    AI4Science to empower the fifth paradigm of scientific discovery
    Over the coming decade, deep learning looks set to have a transformational impact on the natural sciences. The consequences are potentially far-reaching and could dramatically improve our ability to model and predict natural phenomena over widely varying scales of space and time. Could this capability represent the dawn of a new paradigm of scientific discovery? […] The post AI4Science to empower the fifth paradigm of scientific discovery appeared first on Microsoft Research.  ( 10 min )
  • Open

    Smart textiles sense how their users are moving
    Researchers develop a comfortable, form-fitting fabric that recognizes its wearer’s activities, like walking, running, and jumping.  ( 8 min )
  • Open

    AI-enhanced iterative solvers for accelerating the solution of large scale parametrized linear systems of equations. (arXiv:2207.02543v1 [math.NA])
    Recent advances in the field of machine learning open a new era in high performance computing. Applications of machine learning algorithms for the development of accurate and cost-efficient surrogates of complex problems have already attracted major attention from scientists. Despite their powerful approximation capabilities, however, surrogates cannot produce the 'exact' solution to the problem. To address this issue, this paper exploits up-to-date ML tools and delivers customized iterative solvers of linear equation systems, capable of solving large-scale parametrized problems at any desired level of accuracy. Specifically, the proposed approach consists of the following two steps. At first, a reduced set of model evaluations is performed and the corresponding solutions are used to establish an approximate mapping from the problem's parametric space to its solution space using deep feedforward neural networks and convolutional autoencoders. This mapping serves as a means to obtain very accurate initial predictions of the system's response to new query points at negligible computational cost. Subsequently, an iterative solver inspired by the Algebraic Multigrid method in combination with Proper Orthogonal Decomposition, termed POD-2G, is developed that successively refines the initial predictions towards the exact system solutions. The application of POD-2G as a standalone solver or as preconditioner in the context of preconditioned conjugate gradient methods is demonstrated on several numerical examples of large scale systems, with the results indicating its superiority over conventional iterative solution schemes.  ( 3 min )
    DIWIFT: Discovering Instance-wise Influential Features for Tabular Data. (arXiv:2207.02773v1 [cs.LG])
    Tabular data is one of the most common data storage formats in business applications, ranging from retail and banking to e-commerce. These applications rely heavily on machine learning models to achieve business success. One of the critical problems in learning tabular data is to distinguish influential features from all the predetermined features. Global feature selection has been well-studied for quite some time, assuming that all instances have the same influential feature subsets. However, different instances rely on different feature subsets in practice, which has led to instance-wise feature selection receiving increasing attention in recent studies. In this paper, we first propose a novel method for discovering instance-wise influential features for tabular data (DIWIFT), the core of which is to introduce the influence function to measure the importance of an instance-wise feature. DIWIFT is capable of automatically discovering influential feature subsets of different sizes in different instances, which is different from global feature selection that considers all instances with the same influential feature subset. On the other hand, different from the previous instance-wise feature selection, DIWIFT minimizes the validation loss on the validation set and is thus more robust to the distribution shift existing in the training dataset and test dataset, which is important in tabular data. Finally, we conduct extensive experiments on both synthetic and real-world datasets to validate the effectiveness of our DIWIFT, comparing it with baseline methods. Moreover, we also demonstrate the robustness of our method via some ablation experiments.  ( 3 min )
    Clustering with Semidefinite Programming and Fixed Point Iteration. (arXiv:2012.09202v3 [math.OC] UPDATED)
    We introduce a novel method for clustering using a semidefinite programming (SDP) relaxation of the Max k-Cut problem. The approach is based on a new methodology for rounding the solution of an SDP relaxation using iterated linear optimization. We show the vertices of the Max k-Cut relaxation correspond to partitions of the data into at most k sets. We also show the vertices are attractive fixed points of iterated linear optimization. Each step of this iterative process solves a relaxation of the closest vertex problem and leads to a new clustering problem where the underlying clusters are more clearly defined. Our experiments show that using fixed point iteration for rounding the Max k-Cut SDP relaxation leads to significantly better results when compared to randomized rounding.  ( 2 min )
    When does Bias Transfer in Transfer Learning?. (arXiv:2207.02842v1 [cs.LG])
    Using transfer learning to adapt a pre-trained "source model" to a downstream "target task" can dramatically increase performance with seemingly no downside. In this work, we demonstrate that there can exist a downside after all: bias transfer, or the tendency for biases of the source model to persist even after adapting the model to the target class. Through a combination of synthetic and natural experiments, we show that bias transfer both (a) arises in realistic settings (such as when pre-training on ImageNet or other standard datasets) and (b) can occur even when the target dataset is explicitly de-biased. As transfer-learned models are increasingly deployed in the real world, our work highlights the importance of understanding the limitations of pre-trained source models. Code is available at https://github.com/MadryLab/bias-transfer  ( 2 min )
    A Tutorial on the Spectral Theory of Markov Chains. (arXiv:2207.02296v1 [cs.LG])
    Markov chains are a class of probabilistic models that have achieved widespread application in the quantitative sciences. This is in part due to their versatility, but is compounded by the ease with which they can be probed analytically. This tutorial provides an in-depth introduction to Markov chains, and explores their connection to graphs and random walks. We utilize tools from linear algebra and graph theory to describe the transition matrices of different types of Markov chains, with a particular focus on exploring properties of the eigenvalues and eigenvectors corresponding to these matrices. The results presented are relevant to a number of methods in machine learning and data mining, which we describe at various stages. Rather than being a novel academic study in its own right, this text presents a collection of known results, together with some new concepts. Moreover, the tutorial focuses on offering intuition to readers rather than formal understanding, and only assumes basic exposure to concepts from linear algebra and probability theory. It is therefore accessible to students and researchers from a wide variety of disciplines.  ( 2 min )
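    As a small illustration of the link between a transition matrix and its eigenvalues and eigenvectors, the sketch below recovers the stationary distribution of a toy chain from the left eigenvector with eigenvalue 1. The 3-state matrix `P` is a made-up example, not taken from the tutorial, and the chain is assumed to be irreducible and aperiodic so that the stationary distribution is unique.

```python
import numpy as np

# Hypothetical row-stochastic transition matrix of a 3-state chain.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])

# Left eigenvectors of P are right eigenvectors of P.T.
eigvals, eigvecs = np.linalg.eig(P.T)

# Take the eigenvector whose eigenvalue is closest to 1 and
# normalise it so that its entries sum to 1.
idx = np.argmin(np.abs(eigvals - 1.0))
pi = np.real(eigvecs[:, idx])
pi = pi / pi.sum()

# The second-largest eigenvalue modulus controls how fast the
# chain forgets its starting state.
slem = np.sort(np.abs(eigvals))[-2]

print(pi, slem)
```

    For this matrix the balance equations can also be solved by hand, which gives a convenient check on the numerical result.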
    A Deep Model for Partial Multi-Label Image Classification with Curriculum Based Disambiguation. (arXiv:2207.02410v1 [cs.CV])
    In this paper, we study the partial multi-label (PML) image classification problem, where each image is annotated with a candidate label set that consists of multiple relevant labels and other noisy labels. Existing PML methods typically design a disambiguation strategy to filter out noisy labels by utilizing prior knowledge with extra assumptions, which unfortunately is unavailable in many real tasks. Furthermore, because the objective function for disambiguation is usually elaborately designed on the whole training set, it can hardly be optimized in a deep model with SGD on mini-batches. In this paper, for the first time we propose a deep model for PML to enhance the representation and discrimination ability. On one hand, we propose a novel curriculum based disambiguation strategy to progressively identify ground-truth labels by incorporating the varied difficulties of different classes. On the other hand, a consistency regularization is introduced for model retraining to balance fitting identified easy labels and exploiting potential relevant labels. Extensive experimental results on the commonly used benchmark datasets show the proposed method significantly outperforms the SOTA methods.  ( 2 min )
    Scaling Private Deep Learning with Low-Rank and Sparse Gradients. (arXiv:2207.02699v1 [cs.LG])
    Applying Differentially Private Stochastic Gradient Descent (DPSGD) to training modern, large-scale neural networks such as transformer-based models is a challenging task, as the magnitude of noise added to the gradients at each iteration scales with model dimension, hindering the learning capability significantly. We propose a unified framework, $\textsf{LSG}$, that fully exploits the low-rank and sparse structure of neural networks to reduce the dimension of gradient updates, and hence alleviate the negative impacts of DPSGD. The gradient updates are first approximated with a pair of low-rank matrices. Then, a novel strategy is utilized to sparsify the gradients, resulting in low-dimensional, less noisy updates that are yet capable of retaining the performance of neural networks. Empirical evaluation on natural language processing and computer vision tasks shows that our method outperforms other state-of-the-art baselines.  ( 2 min )
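    The claim that the added noise grows with model dimension can be seen in a simplified numerical sketch. The function below clips a single gradient vector and draws Gaussian noise as in the usual description of DP-SGD; it is not the LSG method from the abstract, it omits per-example clipping and privacy accounting, and the name `dp_sgd_noisy_gradient` and all parameter values are illustrative assumptions.

```python
import numpy as np

def dp_sgd_noisy_gradient(grad, clip_norm, noise_multiplier, rng):
    """Clip grad to L2 norm <= clip_norm and draw the Gaussian noise to add."""
    scale = min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))
    clipped = grad * scale
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=grad.shape)
    return clipped, noise

rng = np.random.default_rng(0)
noise_norms = {}
for d in (100, 10_000):
    grad = rng.normal(size=d)
    clipped, noise = dp_sgd_noisy_gradient(grad, clip_norm=1.0,
                                           noise_multiplier=1.0, rng=rng)
    # The clipped signal has norm at most 1, while the noise norm
    # grows roughly like sqrt(d).
    noise_norms[d] = np.linalg.norm(noise)

print(noise_norms)
```

    Because the clipped gradient's norm is bounded while the noise norm keeps growing with the number of coordinates, reducing the dimension of the update is one way to improve the signal-to-noise ratio.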
    Towards the Use of Saliency Maps for Explaining Low-Quality Electrocardiograms to End Users. (arXiv:2207.02726v1 [cs.LG])
    When using medical images for diagnosis, either by clinicians or artificial intelligence (AI) systems, it is important that the images are of high quality. When an image is of low quality, the medical exam that produced the image often needs to be redone. In telemedicine, a common problem is that the quality issue is only flagged once the patient has left the clinic, meaning they must return in order to have the exam redone. This can be especially difficult for people living in remote regions, who make up a substantial portion of the patients at Portal Telemedicina, a digital healthcare organization based in Brazil. In this paper, we report on ongoing work regarding (i) the development of an AI system for flagging and explaining low-quality medical images in real-time, (ii) an interview study to understand the explanation needs of stakeholders using the AI system at OurCompany, and, (iii) a longitudinal user study design to examine the effect of including explanations on the workflow of the technicians in our clinics. To the best of our knowledge, this would be the first longitudinal study on evaluating the effects of XAI methods on end-users -- stakeholders that use AI systems but do not have AI-specific expertise. We welcome feedback and suggestions on our experimental setup.  ( 3 min )
    Pre-training Transformers for Molecular Property Prediction Using Reaction Prediction. (arXiv:2207.02724v1 [cs.LG])
    Molecular property prediction is essential in chemistry, especially for drug discovery applications. However, available molecular property data is often limited, encouraging the transfer of information from related data. Transfer learning has had a tremendous impact in fields like Computer Vision and Natural Language Processing, signaling its potential in molecular property prediction. We present a pre-training procedure for molecular representation learning using reaction data and use it to pre-train a SMILES Transformer. We fine-tune and evaluate the pre-trained model on 12 molecular property prediction tasks from MoleculeNet within physical chemistry, biophysics, and physiology and show a statistically significant positive effect on 5 of the 12 tasks compared to a non-pre-trained baseline model.  ( 2 min )
    Careful seeding for the k-medoids algorithm with incremental k++ cluster construction. (arXiv:2207.02404v1 [cs.LG])
    The k-medoids algorithm is a popular variant of the k-means algorithm and is widely used in pattern recognition and machine learning. A main drawback of the k-medoids algorithm is that it can be trapped in local optima. An improved k-medoids algorithm (INCKM) was recently proposed to overcome this drawback, based on constructing a candidate medoids subset with a parameter choosing procedure, but it may fail when dealing with imbalanced datasets. In this paper, we propose a novel incremental k-medoids algorithm (INCKPP) which dynamically increases the number of clusters from 2 to k through a nonparametric and stochastic k-means++ search procedure. Our algorithm can overcome the parameter selection problem in the improved k-medoids algorithm, improve the clustering performance, and deal with imbalanced datasets very well. However, our algorithm is computationally inefficient. To address this issue, we propose a fast INCKPP algorithm (called INCKPP$_{sample}$) which preserves the computational efficiency of the simple and fast k-medoids algorithm with an improved clustering performance. The proposed algorithm is compared with three state-of-the-art algorithms: the improved k-medoids algorithm (INCKM), the simple and fast k-medoids algorithm (FKM) and the k-means++ algorithm (KPP). Extensive experiments on both synthetic and real world datasets including imbalanced datasets illustrate the effectiveness of the proposed algorithm.  ( 2 min )
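As a rough illustration of the k-means++ style search mentioned in the abstract above, the sketch below picks medoid seeds by D² sampling: each new seed is drawn with probability proportional to its squared distance from the nearest seed chosen so far. This is only the generic seeding step, not the INCKPP procedure itself (the incremental growth from 2 to k clusters is not shown). The function name `kpp_seed_medoids` and its parameters are hypothetical.

```python
import random

def kpp_seed_medoids(points, k, dist, rng=None):
    """Pick k medoid indices by k-means++-style D^2 sampling (illustrative only)."""
    rng = rng or random.Random(0)
    n = len(points)
    # First medoid: chosen uniformly at random.
    medoids = [rng.randrange(n)]
    # d2[i] = squared distance from point i to its nearest chosen medoid.
    d2 = [dist(p, points[medoids[0]]) ** 2 for p in points]
    while len(medoids) < k:
        if sum(d2) == 0:
            # Every point coincides with an existing medoid; nothing left to add.
            break
        # Points far from all current medoids are more likely to be drawn;
        # already-chosen medoids have weight 0 and cannot be drawn again.
        i = rng.choices(range(n), weights=d2, k=1)[0]
        medoids.append(i)
        d2 = [min(old, dist(p, points[i]) ** 2) for old, p in zip(d2, points)]
    return medoids
```

Because the seeds are indices into the dataset, they are valid medoids by construction; a full k-medoids run would then alternate assignment and medoid-update steps starting from them.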
    Nonparametric Factor Trajectory Learning for Dynamic Tensor Decomposition. (arXiv:2207.02446v1 [cs.LG])
    Tensor decomposition is a fundamental framework to analyze data that can be represented by multi-dimensional arrays. In practice, tensor data is often accompanied by temporal information, namely the time points when the entry values were generated. This information implies abundant, complex temporal variation patterns. However, current methods always assume the factor representations of the entities in each tensor mode are static, and never consider their temporal evolution. To fill this gap, we propose NONparametric FActor Trajectory learning for dynamic tensor decomposition (NONFAT). We place Gaussian process (GP) priors in the frequency domain and conduct inverse Fourier transform via Gauss-Laguerre quadrature to sample the trajectory functions. In this way, we can overcome data sparsity and obtain robust trajectory estimates across long time horizons. Given the trajectory values at specific time points, we use a second-level GP to sample the entry values and to capture the temporal relationship between the entities. For efficient and scalable inference, we leverage the matrix Gaussian structure in the model, introduce a matrix Gaussian posterior, and develop a nested sparse variational learning algorithm. We have shown the advantage of our method in several real-world applications.  ( 2 min )
    Robust Counterfactual Explanations for Tree-Based Ensembles. (arXiv:2207.02739v1 [cs.LG])
    Counterfactual explanations inform ways to achieve a desired outcome from a machine learning model. However, such explanations are not robust to certain real-world changes in the underlying model (e.g., retraining the model, changing hyperparameters, etc.), questioning their reliability in several applications, e.g., credit lending. In this work, we propose a novel strategy -- that we call RobX -- to generate robust counterfactuals for tree-based ensembles, e.g., XGBoost. Tree-based ensembles pose additional challenges in robust counterfactual generation, e.g., they have a non-smooth and non-differentiable objective function, and they can change a lot in the parameter space under retraining on very similar data. We first introduce a novel metric -- that we call Counterfactual Stability -- that attempts to quantify how robust a counterfactual is going to be to model changes under retraining, and comes with desirable theoretical properties. Our proposed strategy RobX works with any counterfactual generation method (base method) and searches for robust counterfactuals by iteratively refining the counterfactual generated by the base method using our metric Counterfactual Stability. We compare the performance of RobX with popular counterfactual generation methods (for tree-based ensembles) across benchmark datasets. The results demonstrate that our strategy generates counterfactuals that are significantly more robust (nearly 100% validity after actual model changes) and also realistic (in terms of local outlier factor) over existing state-of-the-art methods.  ( 3 min )
    Implementing Reinforcement Learning Datacenter Congestion Control in NVIDIA NICs. (arXiv:2207.02295v1 [cs.NI])
    Cloud datacenters are exponentially growing both in numbers and size. This increase results in a network activity surge that warrants better congestion avoidance. The resulting challenge is two-fold: (i) designing algorithms that can be custom-tuned to the complex traffic patterns of a given datacenter; but, at the same time (ii) run on low-level hardware with the required low latency of effective Congestion Control (CC). In this work, we present a Reinforcement Learning (RL) based CC solution that learns from certain traffic scenarios and successfully generalizes to others. We then distill the RL neural network policy into binary decision trees to achieve the desired $\mu$sec decision latency required for real-time inference with RDMA. We deploy the distilled policy on NVIDIA NICs in a real network and demonstrate state-of-the-art performance, balancing all tested metrics simultaneously: bandwidth, latency, fairness, and packet drops.  ( 2 min )
    Evaluating Robustness to Dataset Shift via Parametric Robustness Sets. (arXiv:2205.15947v2 [cs.LG] UPDATED)
    We give a method for proactively identifying small, plausible shifts in distribution which lead to large differences in model performance. To ensure that these shifts are plausible, we parameterize them in terms of interpretable changes in causal mechanisms of observed variables. This defines a parametric robustness set of plausible distributions and a corresponding worst-case loss. While the loss under an individual parametric shift can be estimated via reweighting techniques such as importance sampling, the resulting worst-case optimization problem is non-convex, and the estimate may suffer from large variance. For small shifts, however, we can construct a local second-order approximation to the loss under shift and cast the problem of finding a worst-case shift as a particular non-convex quadratic optimization problem, for which efficient algorithms are available. We demonstrate that this second-order approximation can be estimated directly for shifts in conditional exponential family models, and we bound the approximation error. We apply our approach to a computer vision task (classifying gender from images), revealing sensitivity to shifts in non-causal attributes.
    Unsupervised Recurrent Federated Learning for Edge Popularity Prediction in Privacy-Preserving Mobile Edge Computing Networks. (arXiv:2207.00755v2 [cs.MM] UPDATED)
    Nowadays wireless communication is rapidly reshaping entire industry sectors. In particular, mobile edge computing (MEC) as an enabling technology for industrial Internet of things (IIoT) brings powerful computing/storage infrastructure closer to the mobile terminals and, thereby, significantly lowers the response latency. To reap the benefit of proactive caching at the network edge, precise knowledge on the popularity pattern among the end devices is essential. However, the complex and dynamic nature of the content popularity over space and time as well as the data-privacy requirements in many IIoT scenarios pose tough challenges to its acquisition. In this article, we propose an unsupervised and privacy-preserving popularity prediction framework for MEC-enabled IIoT. The concepts of local and global popularities are introduced and the time-varying popularity of each user is modelled as a model-free Markov chain. On this basis, a novel unsupervised recurrent federated learning (URFL) algorithm is proposed to predict the distributed popularity while achieving privacy preservation and unsupervised training. Simulations indicate that the proposed framework can enhance the prediction accuracy in terms of a reduced root-mean-squared error by up to $60.5\%-68.7\%$. Additionally, manual labeling and violation of users' data privacy are both avoided.
    Progressive Latent Replay for efficient Generative Rehearsal. (arXiv:2207.01562v2 [cs.CV] UPDATED)
    We introduce a new method for internal replay that modulates the frequency of rehearsal based on the depth of the network. While replay strategies mitigate the effects of catastrophic forgetting in neural networks, recent works on generative replay show that performing the rehearsal only on the deeper layers of the network improves the performance in continual learning. However, the generative approach introduces additional computational overhead, limiting its applications. Motivated by the observation that earlier layers of neural networks forget less abruptly, we propose to update network layers with varying frequency using intermediate-level features during replay. This reduces the computational burden by omitting computations for both deeper layers of the generator and earlier layers of the main model. We name our method Progressive Latent Replay and show that it outperforms Internal Replay while using significantly fewer resources.
    Flow Completion Network: Inferring the Fluid Dynamics from Incomplete Flow Information using Graph Neural Networks. (arXiv:2205.04739v2 [physics.flu-dyn] UPDATED)
    This paper introduces a novel neural network - flow completion network (FCN) - to infer the fluid dynamics, including the flow field and the force acting on the body, from the incomplete data based on Graph Convolution Attention Network. The FCN is composed of several graph convolution layers and spatial attention layers. It is designed to infer the velocity field and the vortex force contribution of the flow field when combined with the vortex force map (VFM) method. Compared with other neural networks adopted in fluid dynamics, the FCN is capable of dealing with both structured data and unstructured data. The performance of the proposed FCN is assessed by the computational fluid dynamics (CFD) data on the flow field around a circular cylinder. The force coefficients predicted by our model are validated against those obtained directly from CFD. Moreover, it is shown that our model effectively utilizes the existing flow field information and the gradient information simultaneously, giving a better performance than the traditional convolution neural network (CNN)-based and deep neural network (DNN)-based models. Specifically, among all the cases of different Reynolds numbers and different proportions of the training dataset, the results show that the proposed FCN achieves a maximum norm mean square error of 5.86% in the test dataset, which is much lower than those of the traditional CNN-based and DNN-based models (42.32% and 15.63% respectively).
    The rise of the lottery heroes: why zero-shot pruning is hard. (arXiv:2202.12400v2 [cs.LG] UPDATED)
    Recent advances in deep learning optimization showed that just a subset of parameters is really necessary to successfully train a model. Potentially, such a discovery has broad impact from theory to application; however, it is known that finding these trainable sub-networks is typically a costly process. This inhibits practical applications: can the learned sub-graph structures in deep learning models be found at training time? In this work we explore such a possibility, observing and motivating why common approaches typically fail in the extreme scenarios of interest, and proposing an approach which potentially enables training with reduced computational effort. The experiments on challenging architectures and datasets suggest the algorithmic accessibility over such a computational gain, and in particular a trade-off between accuracy achieved and training complexity deployed emerges.
    Motley: Benchmarking Heterogeneity and Personalization in Federated Learning. (arXiv:2206.09262v2 [cs.LG] UPDATED)
    Personalized federated learning considers learning models unique to each client in a heterogeneous network. The resulting client-specific models have been purported to improve metrics such as accuracy, fairness, and robustness in federated networks. However, despite a plethora of work in this area, it remains unclear: (1) which personalization techniques are most effective in various settings, and (2) how important personalization truly is for realistic federated applications. To better answer these questions, we propose Motley, a benchmark for personalized federated learning. Motley consists of a suite of cross-device and cross-silo federated datasets from varied problem domains, as well as thorough evaluation metrics for better understanding the possible impacts of personalization. We establish baselines on the benchmark by comparing a number of representative personalized federated learning methods. These initial results highlight strengths and weaknesses of existing approaches, and raise several open questions for the community. Motley aims to provide a reproducible means with which to advance developments in personalized and heterogeneity-aware federated learning, as well as the related areas of transfer learning, meta-learning, and multi-task learning.
    Adversarially Trained Actor Critic for Offline Reinforcement Learning. (arXiv:2202.02446v2 [cs.LG] UPDATED)
    We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm for offline reinforcement learning (RL) under insufficient data coverage, based on the concept of relative pessimism. ATAC is designed as a two-player Stackelberg game: A policy actor competes against an adversarially trained value critic, who finds data-consistent scenarios where the actor is inferior to the data-collection behavior policy. We prove that, when the actor attains no regret in the two-player game, running ATAC produces a policy that provably 1) outperforms the behavior policy over a wide range of hyperparameters that control the degree of pessimism, and 2) competes with the best policy covered by data with appropriately chosen hyperparameters. Compared with existing works, notably our framework offers both theoretical guarantees for general function approximation and a deep RL implementation scalable to complex environments and large datasets. In the D4RL benchmark, ATAC consistently outperforms state-of-the-art offline RL algorithms on a range of continuous control tasks.
    Unfolding AIS transmission behavior for vessel movement modeling on noisy data leveraging machine learning. (arXiv:2202.13867v2 [cs.LG] UPDATED)
    The oceans are a source of an impressive mixture of complex data that could be used to uncover relationships yet to be discovered. Such data comes from the oceans and their surface, such as Automatic Identification System (AIS) messages used for tracking vessels' trajectories. AIS messages are transmitted over radio or satellite at ideally periodic time intervals but vary irregularly over time. As such, this paper aims to model the AIS message transmission behavior through neural networks for forecasting upcoming AIS messages' content from multiple vessels, particularly in a simultaneous approach despite messages' temporal irregularities as outliers. We present a set of experiments comprising multiple algorithms for forecasting tasks with horizon sizes of varying lengths. Deep learning models (e.g., neural networks) revealed themselves to adequately preserve vessels' spatial awareness regardless of temporal irregularity. We show how convolutional layers, feed-forward networks, and recurrent neural networks can improve such tasks by working together. Experimenting with short, medium, and large-sized sequences of messages, our model achieved a Relative Percentage Difference (the lower, the better) of 36/37/38%, whereas we observed 92/45/96% on the Elman's RNN, 51/52/40% on the GRU, and 129/98/61% on the LSTM. These results support our model as a driver for improving the prediction of vessel routes when analyzing multiple vessels of diverging types simultaneously under temporally noisy data.
    SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy. (arXiv:2203.17001v2 [eess.AS] UPDATED)
    Deep learning based singing voice synthesis (SVS) systems have been demonstrated to flexibly generate singing of better quality than conventional statistical parametric methods. However, neural systems are generally data-hungry and have difficulty reaching reasonable singing quality with limited publicly available training data. In this work, we explore different data augmentation methods to boost the training of SVS systems, including several strategies customized to SVS based on pitch augmentation and mix-up augmentation. To further stabilize the training, we introduce the cycle-consistent training strategy. Extensive experiments on two public singing databases demonstrate that our proposed augmentation methods and the stabilizing training strategy can significantly improve the performance on both objective and subjective evaluations.
    ADAST: Attentive Cross-domain EEG-based Sleep Staging Framework with Iterative Self-Training. (arXiv:2107.04470v4 [cs.LG] UPDATED)
    Sleep staging is of great importance in the diagnosis and treatment of sleep disorders. Recently, numerous data-driven deep learning models have been proposed for automatic sleep staging. They mainly train the model on a large public labeled sleep dataset and test it on a smaller one with subjects of interest. However, they usually assume that the train and test data are drawn from the same distribution, which may not hold in real-world scenarios. Unsupervised domain adaption (UDA) has been recently developed to handle this domain shift problem. However, previous UDA methods applied for sleep staging have two main limitations. First, they rely on a totally shared model for the domain alignment, which may lose the domain-specific information during feature extraction. Second, they only align the source and target distributions globally without considering the class information in the target domain, which hinders the classification performance of the model while testing. In this work, we propose a novel adversarial learning framework called ADAST to tackle the domain shift problem in the unlabeled target domain. First, we develop an unshared attention mechanism to preserve the domain-specific features in both domains. Second, we design an iterative self-training strategy to improve the classification performance on the target domain via target domain pseudo labels. We also propose dual distinct classifiers to increase the robustness and quality of the pseudo labels. The experimental results on six cross-domain scenarios validate the efficacy of our proposed framework and its advantage over state-of-the-art UDA methods. The source code is available at https://github.com/emadeldeen24/ADAST.
    Fast Density Estimation for Density-based Clustering Methods. (arXiv:2109.11383v3 [cs.LG] UPDATED)
    Density-based clustering algorithms are widely used for discovering clusters in pattern recognition and machine learning since they can deal with non-hyperspherical clusters and are robust to outliers. However, the runtime of density-based algorithms is heavily dominated by finding fixed-radius near neighbors and calculating the density, which is time-consuming. Meanwhile, traditional acceleration methods using indexing techniques such as KD trees are not effective in processing high-dimensional data. In this paper, we propose a fast region query algorithm named fast principal component analysis pruning (called FPCAP) with the help of the fast principal component analysis technique in conjunction with geometric information provided by principal attributes of the data, which can process high-dimensional data and be easily applied to density-based methods to prune unnecessary distance calculations when finding neighbors and estimating densities. As an application in density-based clustering methods, we combine FPCAP with the Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm to obtain an improved DBSCAN (called IDBSCAN), which preserves the advantages of DBSCAN while greatly reducing the computation of redundant distances. Experiments on seven benchmark datasets demonstrate that the proposed algorithm improves the computational efficiency significantly.
    On the Effects of Artificial Data Modification. (arXiv:2110.13968v2 [cs.LG] UPDATED)
    Data distortion is commonly applied in vision models during both training (e.g. methods like MixUp and CutMix) and evaluation (e.g. shape-texture bias and robustness). This data modification can introduce artificial information. It is often assumed that the resulting artefacts are detrimental to training, whilst being negligible when analysing models. We investigate these assumptions and conclude that in some cases they are unfounded and lead to incorrect results. Specifically, we show current shape bias identification methods and occlusion robustness measures are biased and propose a fairer alternative for the latter. Subsequently, through a series of experiments we seek to correct and strengthen the community's perception of how augmenting affects learning of vision models. Based on our empirical results we argue that the impact of the artefacts must be understood and exploited rather than eliminated.
    MoTiAC: Multi-Objective Actor-Critics for Real-Time Bidding. (arXiv:2002.07408v2 [cs.AI] UPDATED)
    Online Real-Time Bidding (RTB) is a complex auction game in which advertisers struggle to bid for ad impressions when a user request occurs. Considering display cost, Return on Investment (ROI), and other influential Key Performance Indicators (KPIs), large ad platforms try to dynamically balance the trade-off among various goals. To address the challenge, we propose a Multi-ObjecTive Actor-Critics algorithm based on reinforcement learning (RL), named MoTiAC, for the problem of bidding optimization with various goals. In MoTiAC, objective-specific agents update the global network asynchronously with different goals and perspectives, leading to a robust bidding policy. Unlike previous RL models, the proposed MoTiAC can simultaneously fulfill multi-objective tasks in complicated bidding environments. In addition, we mathematically prove that our model will converge to Pareto optimality. Finally, experiments on a large-scale real-world commercial dataset from Tencent verify the effectiveness of MoTiAC versus a set of recent approaches.
    Enhancing Adversarial Attacks on Single-Layer NVM Crossbar-Based Neural Networks with Power Consumption Information. (arXiv:2207.02764v1 [cs.LG])
    Adversarial attacks on state-of-the-art machine learning models pose a significant threat to the safety and security of mission-critical autonomous systems. This paper considers the additional vulnerability of machine learning models when attackers can measure the power consumption of their underlying hardware platform. In particular, we explore the utility of power consumption information for adversarial attacks on non-volatile memory crossbar-based single-layer neural networks. Our results from experiments with MNIST and CIFAR-10 datasets show that power consumption can reveal important information about the neural network's weight matrix, such as the 1-norm of its columns. That information can be used to infer the sensitivity of the network's loss with respect to different inputs. We also find that surrogate-based black box attacks that utilize crossbar power information can lead to improved attack efficiency.
    Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions. (arXiv:2103.10922v3 [cs.LG] UPDATED)
    In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation. In all three cases, we provide a complete classification of the critical points in the case where the target function is affine and one-dimensional. In particular, we show that there exist no local maxima and clarify the structure of saddle points. Moreover, we prove that non-global local minima can only be caused by `dead' ReLU neurons. In particular, they do not appear in the case of leaky ReLU or quadratic activation. Our approach is of a combinatorial nature and builds on a careful analysis of the different types of hidden neurons that can occur.
    Epistemic Neural Networks. (arXiv:2107.08924v5 [cs.LG] UPDATED)
    Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. Conventional neural networks lack this capability and, since most research has focused on marginal predictions, this shortcoming has been largely overlooked. We introduce the epistemic neural network (ENN) as an interface for models that represent uncertainty as required to generate useful joint predictions. While prior approaches to uncertainty modeling such as Bayesian neural networks can be expressed as ENNs, this new interface facilitates comparison of joint predictions and the design of novel architectures and algorithms. In particular, we introduce the epinet: an architecture that can supplement any conventional neural network, including large pretrained models, and can be trained with modest incremental computation to estimate uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. We demonstrate this efficacy across synthetic data, ImageNet, and some reinforcement learning tasks. As part of this effort we open-source experiment code.
    Topological Information Retrieval with Dilation-Invariant Bottleneck Comparative Measures. (arXiv:2104.01672v3 [stat.ML] UPDATED)
    Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarchy-preserving manner using a variety of metrics. Persistent homology is a tool commonly used in topological data analysis that is able to rigorously characterize a database in terms of both its hierarchy and connectivity structure. Computing persistent homology on a variety of embedded datasets reveals that some commonly used embeddings fail to preserve the connectivity. We show that those embeddings which successfully retain the database topology coincide in persistent homology by introducing two dilation-invariant comparative measures to capture this effect: in particular, they address the issue of metric distortion on manifolds. We provide an algorithm for their computation that exhibits greatly reduced time complexity over existing methods. We use these measures to perform the first instance of topology-based information retrieval and demonstrate its increased performance over the standard bottleneck distance for persistent homology. We showcase our approach on databases of different data varieties including text, videos, and medical images.
    NAS-Bench-360: Benchmarking Neural Architecture Search on Diverse Tasks. (arXiv:2110.05668v4 [cs.CV] UPDATED)
    Most existing neural architecture search (NAS) benchmarks and algorithms prioritize well-studied tasks, e.g. image classification on CIFAR or ImageNet. This makes the performance of NAS approaches in more diverse areas poorly understood. In this paper, we present NAS-Bench-360, a benchmark suite to evaluate methods on domains beyond those traditionally studied in architecture search, and use it to address the following question: do state-of-the-art NAS methods perform well on diverse tasks? To construct the benchmark, we curate ten tasks spanning a diverse array of application domains, dataset sizes, problem dimensionalities, and learning objectives. Each task is carefully chosen to interoperate with modern CNN-based search methods while possibly being far-afield from its original development domain. To speed up and reduce the cost of NAS research, for two of the tasks we release the precomputed performance of 15,625 architectures comprising a standard CNN search space. Experimentally, we show the need for more robust NAS evaluation of the kind NAS-Bench-360 enables by showing that several modern NAS procedures perform inconsistently across the ten tasks, with many catastrophically poor results. We also demonstrate how NAS-Bench-360 and its associated precomputed results will enable future scientific discoveries by testing whether several recent hypotheses promoted in the NAS literature hold on diverse tasks. NAS-Bench-360 is hosted at https://nb360.ml.cmu.edu.
    Machine Learning for Stuttering Identification: Review, Challenges and Future Directions. (arXiv:2107.04057v3 [cs.SD] UPDATED)
    Stuttering is a speech disorder during which the flow of speech is interrupted by involuntary pauses and repetition of sounds. Stuttering identification is an interesting interdisciplinary research problem involving pathology, psychology, acoustics, and signal processing, which makes stuttering hard and complicated to detect. Recent developments in machine and deep learning have dramatically revolutionized the speech domain; however, minimal attention has been given to stuttering identification. This work fills the gap by trying to bring researchers together from interdisciplinary fields. In this paper, we comprehensively review acoustic features and statistical and deep learning based stuttering/disfluency classification methods. We also present several challenges and possible future directions.
    Graph Trees with Attention. (arXiv:2207.02760v1 [cs.LG])
    When dealing with tabular data, models based on regression and decision trees are a popular choice due to the high accuracy they provide on such tasks and their ease of application as compared to other model classes. Yet, when it comes to graph-structured data, current tree learning algorithms do not provide tools to manage the structure of the data other than relying on feature engineering. In this work we address the above gap, and introduce Graph Trees with Attention (GTA), a new family of tree-based learning algorithms that are designed to operate on graphs. GTA leverages both the graph structure and the features at the vertices and employs an attention mechanism that allows decisions to concentrate on sub-structures of the graph. We analyze GTA models and show that they are strictly more expressive than plain decision trees. We also demonstrate the benefits of GTA empirically on multiple graph and node prediction benchmarks. In these experiments, GTA always outperformed other tree-based models and often outperformed other types of graph-learning algorithms such as Graph Neural Networks (GNNs) and Graph Kernels. Finally, we also provide an explainability mechanism for GTA, and demonstrate it can provide intuitive explanations.
    Improved conformalized quantile regression. (arXiv:2207.02808v1 [stat.ML])
    Conformalized quantile regression is a procedure that inherits the advantages of conformal prediction and quantile regression. That is, we use quantile regression to estimate the true conditional quantile and then apply a conformal step on a calibration set to ensure marginal coverage. In this way, we get adaptive prediction intervals that account for heteroscedasticity. However, the aforementioned conformal step lacks adaptiveness as described in (Romano et al., 2019). To overcome this limitation, instead of applying a single conformal step after estimating conditional quantiles with quantile regression, we propose to cluster the explanatory variables weighted by their permutation importance with an optimized k-means and apply k conformal steps. To show that this improved version outperforms the classic version of conformalized quantile regression and is more adaptive to heteroscedasticity, we extensively compare the prediction intervals of both in open datasets.
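The conformal step described in the abstract above can be sketched as follows, following the classic single-step procedure of Romano et al. (2019) rather than the clustered variant this paper proposes. Given lower and upper quantile predictions on a calibration set, it computes one correction `q` that widens (or shrinks) every interval so that marginal coverage of roughly `1 - alpha` holds. The function names are hypothetical, and the quantile regressor itself is assumed to be fitted elsewhere.

```python
import math

def cqr_correction(cal_lo, cal_hi, cal_y, alpha):
    """Return the conformal correction q from calibration data (classic CQR)."""
    # Conformity score: how far y falls outside [lo, hi]; negative if inside.
    scores = sorted(max(lo - y, y - hi)
                    for lo, hi, y in zip(cal_lo, cal_hi, cal_y))
    n = len(scores)
    # Finite-sample rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return math.inf
    return scores[rank - 1]

def cqr_interval(lo, hi, q):
    """Apply the correction to one predicted quantile interval."""
    return lo - q, hi + q
```

Under the clustered scheme the abstract proposes, one would presumably compute a separate correction per cluster of calibration points and apply the correction of the cluster a test point falls into; that grouping logic is not shown here.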
    Avoiding Forgetting and Allowing Forward Transfer in Continual Learning via Sparse Networks. (arXiv:2110.05329v3 [cs.LG] UPDATED)
    Using task-specific components within a neural network in continual learning (CL) is a compelling strategy to address the stability-plasticity dilemma in fixed-capacity models without access to past data. Current methods focus only on selecting a sub-network for a new task that reduces forgetting of past tasks. However, this selection could limit the forward transfer of relevant past knowledge that helps in future learning. Our study reveals that satisfying both objectives jointly is more challenging when a unified classifier is used for all classes of seen tasks (class-Incremental Learning, class-IL), as it is prone to ambiguities between classes across tasks. Moreover, the challenge increases when the semantic similarity of classes across tasks increases. To address this challenge, we propose a new CL method, named AFAF, that aims to Avoid Forgetting and Allow Forward transfer in class-IL using fixed-capacity models. AFAF allocates a sub-network that enables selective transfer of relevant knowledge to a new task while preserving past knowledge, reusing some of the previously allocated components to utilize the fixed capacity, and addressing class-ambiguities when similarities exist. The experiments show the effectiveness of AFAF in providing models with multiple CL desirable properties, while outperforming state-of-the-art methods on various challenging benchmarks with different semantic similarities.
    Novel Techniques to Assess Predictive Systems and Reduce Their Alarm Burden. (arXiv:2102.05691v3 [cs.LG] UPDATED)
    Machine prediction algorithms (e.g., binary classifiers) often are adopted on the basis of claimed performance using classic metrics such as sensitivity and predictive value. However, classifier performance depends heavily upon the context (workflow) in which the classifier operates. Classic metrics do not reflect the realized utility of a predictor unless certain implicit assumptions are met, and these assumptions cannot be met in many common clinical scenarios. This often results in suboptimal implementations and in disappointment when expected outcomes are not achieved. One common failure mode for classic metrics arises when multiple predictions can be made for the same event, particularly when redundant true positive predictions produce little additional value. This describes many clinical alerting systems. We explain why classic metrics cannot correctly represent predictor performance in such contexts, and introduce an improved performance assessment technique using utility functions to score predictions based on their utility in a specific workflow context. The resulting utility metrics (u-metrics) explicitly account for the effects of temporal relationships on prediction utility. Compared to traditional measures, u-metrics more accurately reflect the real world costs and benefits of a predictor operating in a live clinical context. The improvement can be significant. We also describe a formal approach to snoozing, a mitigation strategy in which some predictions are suppressed to improve predictor performance by reducing false positives while retaining event capture. Snoozing is especially useful for predictors that generate interruptive alarms. U-metrics correctly measure and predict the performance benefits of snoozing, whereas traditional metrics do not.
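    To make the idea of snoozing concrete, here is a minimal sketch under a simple assumption: once an alarm is delivered, any further alarms within a fixed window are suppressed. The function name and the fixed-window rule are illustrative only; the paper's formal approach and its u-metrics are not reproduced here.

```python
def apply_snooze(alarm_times, snooze_duration):
    """Return the alarms that are actually delivered.

    alarm_times: ascending timestamps at which the predictor fires.
    snooze_duration: window after a delivered alarm during which
    further alarms are suppressed (an assumed, fixed-length rule).
    """
    delivered = []
    snoozed_until = float("-inf")
    for t in alarm_times:
        if t >= snoozed_until:
            delivered.append(t)
            snoozed_until = t + snooze_duration
    return delivered
```

    For example, with alarms at times 0, 1, 2, 10, 11 and 30 and a window of 5, only the alarms at 0, 10 and 30 would interrupt the user.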
    DexMV: Imitation Learning for Dexterous Manipulation from Human Videos. (arXiv:2108.05877v5 [cs.LG] UPDATED)
    While significant progress has been made on understanding hand-object interactions in computer vision, it is still very challenging for robots to perform complex dexterous manipulation. In this paper, we propose a new platform and pipeline DexMV (Dexterous Manipulation from Videos) for imitation learning. We design a platform with: (i) a simulation system for complex dexterous manipulation tasks with a multi-finger robot hand and (ii) a computer vision system to record large-scale demonstrations of a human hand conducting the same tasks. In our novel pipeline, we extract 3D hand and object poses from videos, and propose a novel demonstration translation method to convert human motion to robot demonstrations. We then apply and benchmark multiple imitation learning algorithms with the demonstrations. We show that the demonstrations can indeed improve robot learning by a large margin and solve the complex tasks which reinforcement learning alone cannot solve. More details can be found in the project page: https://yzqin.github.io/dexmv
    Histopathology DatasetGAN: Synthesizing Large-Resolution Histopathology Datasets. (arXiv:2207.02712v1 [eess.IV])
    Self-supervised learning (SSL) methods are enabling an increasing number of deep learning models to be trained on image datasets in domains where labels are difficult to obtain. These methods, however, struggle to scale to the high resolution of medical imaging datasets, where they are critical for achieving good generalization on label-scarce medical image datasets. In this work, we propose the Histopathology DatasetGAN (HDGAN) framework, an extension of the DatasetGAN semi-supervised framework for image generation and segmentation that scales well to large-resolution histopathology images. We make several adaptations from the original framework, including updating the generative backbone, selectively extracting latent features from the generator, and switching to memory-mapped arrays. These changes reduce the memory consumption of the framework, improving its applicability to medical imaging domains. We evaluate HDGAN on a thrombotic microangiopathy high-resolution tile dataset, demonstrating strong performance on the high-resolution image-annotation generation task. We hope that this work enables more application of deep learning models to medical datasets, in addition to encouraging more exploration of self-supervised frameworks within the medical imaging domain.
    Learning with Neighbor Consistency for Noisy Labels. (arXiv:2202.02200v2 [cs.CV] UPDATED)
    Recent advances in deep learning have relied on large, labelled datasets to train high-capacity models. However, collecting large datasets in a time- and cost-efficient manner often results in label noise. We present a method for learning from noisy labels that leverages similarities between training examples in feature space, encouraging the prediction of each example to be similar to its nearest neighbours. Compared to training algorithms that use multiple models or distinct stages, our approach takes the form of a simple, additional regularization term. It can be interpreted as an inductive version of the classical, transductive label propagation algorithm. We thoroughly evaluate our method on datasets evaluating both synthetic (CIFAR-10, CIFAR-100) and realistic (mini-WebVision, WebVision, Clothing1M, mini-ImageNet-Red) noise, and achieve competitive or state-of-the-art accuracies across all of them.
    BFE and AdaBFE: A New Approach in Learning Rate Automation for Stochastic Optimization. (arXiv:2207.02763v1 [cs.LG])
    In this paper, a new gradient-based optimization approach that automatically adjusts the learning rate is proposed. This approach can be applied to design both non-adaptive and adaptive learning rate methods. First, I introduce the non-adaptive learning rate optimization method, Binary Forward Exploration (BFE); the corresponding adaptive per-parameter learning rate method, Adaptive BFE (AdaBFE), can then be developed. This approach could be an alternative way to optimize the learning rate based on the stochastic gradient descent (SGD) algorithm, besides the current non-adaptive learning rate methods (e.g. SGD, momentum, Nesterov) and the adaptive learning rate methods (e.g. AdaGrad, AdaDelta, Adam). The purpose of developing this approach is not to beat the benchmarks of other methods but to provide a different perspective on optimizing the gradient descent method, although some comparative study with previous methods will be made in the following sections. This approach is expected to be heuristic or to inspire researchers to improve gradient-based optimization in combination with previous methods.
    Architectural Optimization and Feature Learning for High-Dimensional Time Series Datasets. (arXiv:2202.13486v2 [cs.LG] UPDATED)
    As our ability to sense increases, we are experiencing a transition from data-poor problems, in which the central issue is a lack of relevant data, to data-rich problems, in which the central issue is to identify a few relevant features in a sea of observations. Motivated by applications in gravitational-wave astrophysics, we study the problem of predicting the presence of transient noise artifacts in a gravitational wave detector from a rich collection of measurements from the detector and its environment. We argue that feature learning--in which relevant features are optimized from data--is critical to achieving high accuracy. We introduce models that reduce the error rate by over 60% compared to the previous state of the art, which used fixed, hand-crafted features. Feature learning is useful not only because it improves performance on prediction tasks; the results provide valuable information about patterns associated with phenomena of interest that would otherwise be undiscoverable. In our application, features found to be associated with transient noise provide diagnostic information about its origin and suggest mitigation strategies. Learning in high-dimensional settings is challenging. Through experiments with a variety of architectures, we identify two key factors in successful models: sparsity, for selecting relevant variables within the high-dimensional observations; and depth, which confers flexibility for handling complex interactions and robustness with respect to temporal variations. We illustrate their significance through systematic experiments on real detector data. Our results provide experimental corroboration of common assumptions in the machine-learning community and have direct applicability to improving our ability to sense gravitational waves, as well as to many other problem settings with similarly high-dimensional, noisy, or partly irrelevant data.
    Self-supervised Detransformation Autoencoder for Representation Learning in Open Set Recognition. (arXiv:2105.13557v2 [cs.LG] UPDATED)
    The objective of Open set recognition (OSR) is to learn a classifier that can reject the unknown samples while classifying the known classes accurately. In this paper, we propose a self-supervision method, Detransformation Autoencoder (DTAE), for the OSR problem. This proposed method engages in learning representations that are invariant to the transformations of the input data. Experiments on several standard image datasets indicate that the pre-training process significantly improves the model performance in the OSR tasks. Meanwhile, our proposed self-supervision method achieves significant gains in detecting the unknown class and classifying the known classes. Moreover, our analysis indicates that DTAE can yield representations that contain more target class information and less transformation information than RotNet.
    Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture. (arXiv:2112.08534v2 [cs.LG] UPDATED)
    We introduce the Momentum Transformer, an attention-based deep learning architecture which outperforms benchmark momentum and mean-reversion trading strategies. Unlike state-of-the-art Long Short-Term Memory (LSTM) architectures, which are sequential in nature, the attention mechanism provides our architecture with a direct connection to all previous time-steps. Our architecture enables us to learn longer-term dependencies, improves performance when considering returns net of transaction costs and naturally adapts to new market regimes, such as during the SARS-CoV-2 crisis. The Momentum Transformer is inherently interpretable, providing us with greater insights into our deep learning momentum trading strategy, including how it blends different classical strategies and the past time-steps which are of the greatest significance to the model.
    Detecting and Diagnosing Terrestrial Gravitational-Wave Mimics Through Feature Learning. (arXiv:2203.05086v2 [astro-ph.IM] UPDATED)
    As engineered systems grow in complexity, there is an increasing need for automatic methods that can detect, diagnose, and even correct transient anomalies that inevitably arise and can be difficult or impossible to diagnose and fix manually. Among the most sensitive and complex systems of our civilization are the detectors that search for incredibly small variations in distance caused by gravitational waves -- phenomena originally predicted by Albert Einstein to emerge and propagate through the universe as the result of collisions between black holes and other massive objects in deep space. The extreme complexity and precision of such detectors causes them to be subject to transient noise issues that can significantly limit their sensitivity and effectiveness. In this work, we present a demonstration of a method that can detect and characterize emergent transient anomalies of such massively complex systems. We illustrate the performance, precision, and adaptability of the automated solution via one of the prevalent issues limiting gravitational-wave discoveries: noise artifacts of terrestrial origin that contaminate gravitational wave observatories' highly sensitive measurements and can obscure or even mimic the faint astrophysical signals for which they are listening. Specifically, we demonstrate how a highly interpretable convolutional classifier can automatically learn to detect transient anomalies from auxiliary detector data without needing to observe the anomalies themselves. We also illustrate several other useful features of the model, including how it performs automatic variable selection to reduce tens of thousands of auxiliary data channels to only a few relevant ones; how it identifies behavioral signatures predictive of anomalies in those channels; and how it can be used to investigate individual anomalies and the channels associated with them.
    Stochastic normalizing flows as non-equilibrium transformations. (arXiv:2201.08862v3 [hep-lat] UPDATED)
    Normalizing flows are a class of deep generative models that provide a promising route to sample lattice field theories more efficiently than conventional Monte Carlo simulations. In this work we show that the theoretical framework of stochastic normalizing flows, in which neural-network layers are combined with Monte Carlo updates, is the same that underlies out-of-equilibrium simulations based on Jarzynski's equality, which have been recently deployed to compute free-energy differences in lattice gauge theories. We lay out a strategy to optimize the efficiency of this extended class of generative models and present examples of applications.
    Artificial Intelligence-Assisted Optimization and Multiphase Analysis of Polygon PEM Fuel Cells. (arXiv:2205.06768v2 [cs.NE] UPDATED)
    This article presents new hexagonal and pentagonal PEM fuel cell models. The models have been optimized after achieving improved cell performance. The input parameters of the multi-objective optimization algorithm were pressure and temperature at the inlet, and consumption and output powers were the objective parameters. The output data of the numerical simulation has been trained using deep neural networks and then modeled with polynomial regression. The target functions have been extracted using the RSM (Response Surface Method), and the targets were optimized using the multi-objective genetic algorithm (NSGA-II). Compared to the base model, the optimized Pentagonal and Hexagonal models increase the output current density by 21.8% and 39.9%, respectively.
    Speech Denoising in the Waveform Domain with Self-Attention. (arXiv:2202.07790v2 [cs.SD] UPDATED)
    In this work, we present CleanUNet, a causal speech denoising model on the raw waveform. The proposed model is based on an encoder-decoder architecture combined with several self-attention blocks to refine its bottleneck representations, which is crucial to obtain good results. The model is optimized through a set of losses defined over both waveform and multi-resolution spectrograms. The proposed method outperforms the state-of-the-art models in terms of denoised speech quality from various objective and subjective evaluation metrics.
    Deep Learning Approximation of Diffeomorphisms via Linear-Control Systems. (arXiv:2110.12393v2 [math.OC] UPDATED)
    In this paper we propose a Deep Learning architecture to approximate diffeomorphisms diffeotopic to the identity. We consider a control system of the form $\dot x = \sum_{i=1}^lF_i(x)u_i$, with linear dependence in the controls, and we use the corresponding flow to approximate the action of a diffeomorphism on a compact ensemble of points. Despite the simplicity of the control system, it has been recently shown that a Universal Approximation Property holds. The problem of minimizing the sum of the training error and of a regularizing term induces a gradient flow in the space of admissible controls. A possible training procedure for the discrete-time neural network consists in projecting the gradient flow onto a finite-dimensional subspace of the admissible controls. An alternative approach relies on an iterative method based on Pontryagin Maximum Principle for the numerical resolution of Optimal Control problems. Here the maximization of the Hamiltonian can be carried out with an extremely low computational effort, owing to the linear dependence of the system in the control variables.
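    The flow of the control system above can be approximated with explicit Euler steps, which gives a residual-style discrete-time network. The sketch below is only an illustration under that assumption: the vector fields in `fields` are hypothetical placeholders, the controls are taken to be piecewise constant, and no training procedure is shown.

```python
def euler_flow(x0, fields, controls, h):
    """Integrate x' = sum_i F_i(x) * u_i with explicit Euler steps.

    fields: list of functions F_i mapping a point to a vector.
    controls: controls[k][i] is the value of u_i on the k-th step.
    h: step size.
    """
    x = list(x0)
    for u in controls:
        velocity = [0.0] * len(x)
        for field, u_i in zip(fields, u):
            f = field(x)
            for d in range(len(x)):
                velocity[d] += u_i * f[d]
        # One "layer": the state is updated linearly in the controls.
        x = [x[d] + h * velocity[d] for d in range(len(x))]
    return x


# Hypothetical vector fields in the plane: two translations and a rotation.
fields = [
    lambda x: (1.0, 0.0),
    lambda x: (0.0, 1.0),
    lambda x: (-x[1], x[0]),
]
```

    Calling `euler_flow((0.0, 0.0), fields, [(1.0, 2.0, 0.0)] * 10, 0.1)` moves the origin to approximately (1, 2), since only the two translation fields are active.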
    A Recurrent Differentiable Engine for Modeling Tensegrity Robots Trainable with Low-Frequency Data. (arXiv:2203.00041v2 [cs.RO] UPDATED)
    Tensegrity robots, composed of rigid rods and flexible cables, are difficult to accurately model and control given the presence of complex dynamics and a high number of DoFs. Differentiable physics engines have been recently proposed as a data-driven approach for model identification of such complex robotic systems. These engines are often executed at a high frequency to achieve accurate simulation. Ground truth trajectories for training differentiable engines, however, are not typically available at such high frequencies due to limitations of real-world sensors. The present work focuses on this frequency mismatch, which impacts the modeling accuracy. We propose a recurrent structure for a differentiable physics engine of tensegrity robots, which can be trained effectively even with low-frequency trajectories. To train this new recurrent engine in a robust way, this work introduces, relative to prior work: (i) a new implicit integration scheme, (ii) a progressive training pipeline, and (iii) a differentiable collision checker. A model of NASA's icosahedron SUPERballBot on MuJoCo is used as the ground truth system to collect training data. Simulated experiments show that once the recurrent differentiable engine has been trained given the low-frequency trajectories from MuJoCo, it is able to match the behavior of MuJoCo's system. The criterion for success is whether a locomotion strategy learned using the differentiable engine can be transferred back to the ground-truth system and result in a similar motion. Notably, the amount of ground truth data needed to train the differentiable engine, such that the policy is transferable to the ground truth system, is 1% of the data needed to train the policy directly on the ground-truth system.
    Benchmarking of DL Libraries and Models on Mobile Devices. (arXiv:2202.06512v2 [cs.LG] UPDATED)
    Deploying deep learning (DL) on mobile devices has been a notable trend in recent years. To support fast inference of on-device DL, DL libraries play a critical role as algorithms and hardware do. Unfortunately, no prior work ever dives deep into the ecosystem of modern DL libs and provides quantitative results on their performance. In this paper, we first build a comprehensive benchmark that includes 6 representative DL libs and 15 diversified DL models. We then perform extensive experiments on 10 mobile devices, which help reveal a complete landscape of the current mobile DL libs ecosystem. For example, we find that the best-performing DL lib is severely fragmented across different models and hardware, and the gap between those DL libs can be rather huge. In fact, the impacts of DL libs can overwhelm the optimizations from algorithms or hardware, e.g., model quantization and GPU/DSP-based heterogeneous computing. Finally, atop the observations, we summarize practical implications to different roles in the DL lib ecosystem.
    Reconstructing Nonlinear Dynamical Systems from Multi-Modal Time Series. (arXiv:2111.02922v3 [cs.LG] UPDATED)
    Empirically observed time series in physics, biology, or medicine are commonly generated by some underlying dynamical system (DS) which is the target of scientific interest. There is increasing interest in harnessing machine learning methods to reconstruct this latent DS in a data-driven, unsupervised way. In many areas of science it is common to sample time series observations from many data modalities simultaneously, e.g. electrophysiological and behavioral time series in a typical neuroscience experiment. However, current machine learning tools for reconstructing DSs usually focus on just one data modality. Here we propose a general framework for multi-modal data integration for the purpose of nonlinear DS reconstruction and the analysis of cross-modal relations. This framework is based on dynamically interpretable recurrent neural networks as general approximators of nonlinear DSs, coupled to sets of modality-specific decoder models from the class of generalized linear models. Both an expectation-maximization and a variational inference algorithm for model training are advanced and compared. We show on nonlinear DS benchmarks that our algorithms can efficiently compensate for too noisy or missing information in one data channel by exploiting other channels, and demonstrate on experimental neuroscience data how the algorithm learns to link different data domains to the underlying dynamics.
    Adversarial Mask: Real-World Universal Adversarial Attack on Face Recognition Models. (arXiv:2111.10759v2 [cs.CV] UPDATED)
    Deep learning-based facial recognition (FR) models have demonstrated state-of-the-art performance in the past few years, even when wearing protective medical face masks became commonplace during the COVID-19 pandemic. Given the outstanding performance of these models, the machine learning research community has shown increasing interest in challenging their robustness. Initially, researchers presented adversarial attacks in the digital domain, and later the attacks were transferred to the physical domain. However, in many cases, attacks in the physical domain are conspicuous, and thus may raise suspicion in real-world environments (e.g., airports). In this paper, we propose Adversarial Mask, a physical universal adversarial perturbation (UAP) against state-of-the-art FR models that is applied on face masks in the form of a carefully crafted pattern. In our experiments, we examined the transferability of our adversarial mask to a wide range of FR model architectures and datasets. In addition, we validated our adversarial mask's effectiveness in real-world experiments (CCTV use case) by printing the adversarial pattern on a fabric face mask. In these experiments, the FR system was only able to identify 3.34% of the participants wearing the mask (compared to a minimum of 83.34% with other evaluated masks). A demo of our experiments can be found at: https://youtu.be/_TXkDO5z11w.
    Two-Sample Testing in Reinforcement Learning. (arXiv:2201.08078v2 [cs.LG] UPDATED)
    Value-based reinforcement-learning algorithms have shown strong performances in games, robotics, and other real-world applications. The most popular sample-based method is $Q$-Learning. It subsequently performs updates by adjusting the current $Q$-estimate towards the observed reward and the maximum of the $Q$-estimates of the next state. The procedure introduces maximization bias, which approaches like Double $Q$-Learning aim to address. We frame the bias problem statistically and consider it an instance of estimating the maximum expected value (MEV) of a set of random variables. We propose the $T$-Estimator (TE) based on two-sample testing for the mean, that flexibly interpolates between over- and underestimation by adjusting the significance level of the underlying hypothesis tests. A generalization, termed $K$-Estimator (KE), obeys the same bias and variance bounds as the TE while relying on a nearly arbitrary kernel function. We introduce modifications of $Q$-Learning and the Bootstrapped Deep $Q$-Network (BDQN) using the TE and the KE. Furthermore, we propose an adaptive variant of the TE-based BDQN that dynamically adjusts the significance level to minimize the absolute estimation bias. All proposed estimators and algorithms are thoroughly tested and validated on diverse tasks and environments, illustrating the bias control and performance potential of the TE and KE.
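    The maximization bias mentioned above can be reproduced with a small simulation. The sketch below compares the plain maximum of sample means with the standard double estimator (select with one half of the samples, evaluate with the other) on random variables whose true means are all zero. It illustrates the problem setting only; it is not the paper's TE or KE, and all names and parameters are hypothetical.

```python
import random
from statistics import mean


def max_estimator(samples):
    """Maximum of the sample means; tends to overestimate the true maximum."""
    return max(mean(s) for s in samples)


def double_estimator(samples):
    """Pick the best variable on one half, evaluate it on the other half."""
    half_a = [s[: len(s) // 2] for s in samples]
    half_b = [s[len(s) // 2:] for s in samples]
    best = max(range(len(samples)), key=lambda i: mean(half_a[i]))
    return mean(half_b[best])


def simulate(trials=2000, n_vars=10, n_samples=10, seed=0):
    """Average both estimators over many trials; the true maximum mean is 0."""
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        samples = [
            [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
            for _ in range(n_vars)
        ]
        single.append(max_estimator(samples))
        double.append(double_estimator(samples))
    return mean(single), mean(double)
```

    In this setup the plain maximum is clearly positive on average even though every true mean is zero, while the double estimator stays close to zero. When the true means differ, the double estimator tends to underestimate instead, which is the gap between over- and underestimation that the abstract refers to.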
    Expectation Distance-based Distributional Clustering for Noise-Robustness. (arXiv:2110.08871v3 [cs.LG] UPDATED)
    This paper presents a clustering technique that reduces the susceptibility to data noise by learning and clustering the data-distribution and then assigning the data to the cluster of its distribution and, in the process, reducing the impact of noise on clustering results. This method involves introducing a new distance among distributions, namely the expectation distance (denoted ED), that goes beyond the state-of-the-art distribution distance of optimal mass transport (denoted $W_2$ for $2$-Wasserstein): The latter essentially depends only on the marginal distributions while the former also employs the information about the joint distributions. Using the ED, the paper extends the classical $K$-means and $K$-medoids clustering to those over data-distributions (rather than raw data) and introduces $K$-medoids using $W_2$. The paper also presents the closed-form expressions of the ED distance measure for the case when the uncertainty is Gaussian. The implementation results of the proposed ED and the $W_2$ distance measures to cluster real-world weather data are also presented, which involves efficiently extracting and using underlying uncertainty information in the form of means and variances (that, for example, is adequate to characterize Gaussian distributions). The results show striking performance improvement over classical clustering of raw data, with higher accuracy realized for ED. This is because while $W_2$ employs only the marginal distributions ignoring the correlations, the proposed ED also uses the joint distributions factoring the correlations into the distance measures.
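    For one-dimensional Gaussians, the $W_2$ distance has a well-known closed form that depends only on the two means and standard deviations, which is what makes clustering over (mean, variance) summaries practical. The sketch below shows that formula only; the function name is hypothetical, and the paper's expectation distance is not defined in the abstract, so it is not reproduced here.

```python
import math


def w2_gaussian_1d(mean_a, std_a, mean_b, std_b):
    """2-Wasserstein distance between N(mean_a, std_a^2) and N(mean_b, std_b^2).

    Closed form for one-dimensional Gaussians:
    W_2^2 = (mean_a - mean_b)^2 + (std_a - std_b)^2.
    """
    return math.sqrt((mean_a - mean_b) ** 2 + (std_a - std_b) ** 2)
```

    Such a pairwise distance could be plugged into a $K$-medoids routine in place of the Euclidean distance between raw data points.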
    SE(3) Equivariant Graph Neural Networks with Complete Local Frames. (arXiv:2110.14811v2 [cs.CE] UPDATED)
    Group equivariance (e.g. SE(3) equivariance) is a critical physical symmetry in science, from classical and quantum physics to computational biology. It enables robust and accurate prediction under arbitrary reference transformations. In light of this, great efforts have been put into encoding this symmetry into deep neural networks, which has been shown to improve the generalization performance and data efficiency for downstream tasks. Constructing an equivariant neural network generally brings high computational costs to ensure expressiveness. Therefore, how to better trade off expressiveness and computational efficiency plays a core role in the design of equivariant deep learning models. In this paper, we propose a framework to construct SE(3) equivariant graph neural networks that can approximate the geometric quantities efficiently. Inspired by differential geometry and physics, we introduce equivariant local complete frames to graph neural networks, such that tensor information at given orders can be projected onto the frames. The local frame is constructed to form an orthonormal basis that avoids direction degeneration and ensures completeness. Since the frames are built only by cross product operations, our method is computationally efficient. We evaluate our method on two tasks: Newton mechanics modeling and equilibrium molecule conformation generation. Extensive experimental results demonstrate that our model achieves the best or competitive performance in two types of datasets.
    Neural network stochastic differential equation models with applications to financial data forecasting. (arXiv:2111.13164v5 [cs.LG] UPDATED)
    In this article, we employ a collection of stochastic differential equations with drift and diffusion coefficients approximated by neural networks to predict the trend of chaotic time series that exhibit large jumps. Our contributions are as follows. First, we propose a model called the Lévy induced stochastic differential equation network, which explores compounded stochastic differential equations with $\alpha$-stable Lévy motion to model complex time series data and solve the problem through neural network approximation. Second, we theoretically prove the convergence of our algorithm with respect to hyper-parameters of the neural network, and obtain the error bound without curse of dimensionality. Finally, we illustrate our method by applying it to real financial time series data and find the accuracy increases through the use of non-Gaussian Lévy processes. We also present detailed comparisons in terms of data patterns, various models, different shapes of Lévy motion and the prediction lengths.
    A Unified Survey on Anomaly, Novelty, Open-Set, and Out-of-Distribution Detection: Solutions and Future Challenges. (arXiv:2110.14051v3 [cs.CV] UPDATED)
    Machine learning models often encounter samples that diverge from the training distribution. Failure to recognize an out-of-distribution (OOD) sample, and consequently assigning that sample to an in-class label, significantly compromises the reliability of a model. The problem has gained significant attention due to its importance for safely deploying models in open-world settings. Detecting OOD samples is challenging due to the intractability of modeling all possible unknown distributions. To date, several research domains tackle the problem of detecting unfamiliar samples, including anomaly detection, novelty detection, one-class learning, open set recognition, and out-of-distribution detection. Despite having similar and shared concepts, out-of-distribution, open-set, and anomaly detection have been investigated independently. Accordingly, these research avenues have not cross-pollinated, creating research barriers. While some surveys intend to provide an overview of these approaches, they seem to only focus on a specific domain without examining the relationship between different domains. This survey aims to provide a cross-domain and comprehensive review of numerous eminent works in respective areas while identifying their commonalities. Researchers can benefit from the overview of research advances in different fields and develop future methodology synergistically. Furthermore, to the best of our knowledge, while there are surveys in anomaly detection or one-class learning, there is no comprehensive or up-to-date survey on out-of-distribution detection, which our survey covers extensively. Finally, having a unified cross-domain perspective, we discuss and shed light on future lines of research, intending to bring these fields closer together.
    Quantum Logic Gate Synthesis as a Markov Decision Process. (arXiv:1912.12002v2 [quant-ph] UPDATED)
    Reinforcement learning has witnessed recent applications to a variety of tasks in quantum programming. The underlying assumption is that those tasks could be modeled as Markov Decision Processes (MDPs). Here, we investigate the feasibility of this assumption by exploring its consequences for two fundamental tasks in quantum programming: state preparation and gate compilation. By forming discrete MDPs, focusing exclusively on the single-qubit case (both with and without noise), we solve for the optimal policy exactly through policy iteration. We find optimal paths that correspond to the shortest possible sequence of gates to prepare a state, or compile a gate, up to some target accuracy. As an example, we find sequences of $H$ and $T$ gates with length as small as $11$ producing $\sim 99\%$ fidelity for states of the form $(HT)^{n} |0\rangle$ with values as large as $n=10^{10}$. In the presence of gate noise, we demonstrate how the optimal policy adapts to the effects of noisy gates in order to achieve a higher state fidelity. Our work shows that one can meaningfully impose a discrete, stochastic and Markovian nature to a continuous, deterministic and non-Markovian quantum evolution, and provides theoretical insight into why reinforcement learning may be successfully used to find optimally short gate sequences in quantum programming.
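    The fidelity criterion mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the helper names (`apply`, `run`, `fidelity`) are hypothetical, and only the standard single-qubit $H$ and $T$ matrices and the usual pure-state fidelity $|\langle\psi|\phi\rangle|^2$ are assumed.

```python
import cmath
import math

# Standard single-qubit gates (assumed definitions, not taken from the paper).
S = 1 / math.sqrt(2)
GATES = {
    "H": ((S, S), (S, -S)),
    "T": ((1, 0), (0, cmath.exp(1j * math.pi / 4))),
}

def apply(gate, state):
    """Multiply a 2x2 gate matrix by a 2-component state vector."""
    (a, b), (c, d) = gate
    x, y = state
    return (a * x + b * y, c * x + d * y)

def run(sequence, state=(1 + 0j, 0j)):
    """Apply gates in the order written, starting from |0> by default."""
    for name in sequence:
        state = apply(GATES[name], state)
    return state

def fidelity(psi, phi):
    """Pure-state fidelity |<psi|phi>|^2."""
    overlap = psi[0].conjugate() * phi[0] + psi[1].conjugate() * phi[1]
    return abs(overlap) ** 2

# (HT)^n |0> applies T first and then H, repeated n times.
target = run("TH" * 3)
candidate = run("H")
print(fidelity(target, candidate))
```

    A search procedure such as the policy iteration described above would compare many candidate sequences against the target using a score of this kind.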
    Astroconformer: Inferring Surface Gravity of Stars from Stellar Light Curves with Transformer. (arXiv:2207.02787v1 [astro-ph.SR])
    We introduce Astroconformer, a Transformer-based model to analyze stellar light curves from the Kepler mission. We demonstrate that Astroconformer can robustly infer the stellar surface gravity as a supervised task. Importantly, as Transformer captures long-range information in the time series, it outperforms the state-of-the-art data-driven method in the field, and the critical role of self-attention is proved through ablation experiments. Furthermore, the attention map from Astroconformer exemplifies the long-range correlation information learned by the model, leading to a more interpretable deep learning approach for asteroseismology. Besides data from Kepler, we also show that the method can generalize to sparse cadence light curves from the Rubin Observatory, paving the way for the new era of asteroseismology, harnessing information from long-cadence ground-based observations.
    Deep Learning-based automated classification of Chinese Speech Sound Disorders. (arXiv:2205.11748v4 [cs.SD] CROSS LISTED)
    This article describes a system for analyzing acoustic data to assist in the diagnosis and classification of children's speech sound disorders (SSDs) using a computer. The analysis concentrated on identifying and categorizing four distinct types of Chinese SSDs. The study collected and generated a speech corpus containing 2540 stopping, backing, final consonant deletion process (FCDP), and affrication samples from 90 children aged 3--6 years with normal or pathological articulatory features. Each recording was accompanied by a detailed diagnostic annotation by two speech-language pathologists (SLPs). Classification of the speech samples was accomplished using three well-established neural network models for image classification. The feature maps were created using three sets of Mel-frequency cepstral coefficients (MFCC) parameters extracted from speech sounds and aggregated into a three-dimensional data structure as model input. We employed six techniques for data augmentation to augment the available dataset while avoiding overfitting. The experiments examine the usability of four different categories of Chinese phrases and characters. Experiments with different data subsets demonstrate the system's ability to accurately detect the analyzed pronunciation disorders. The best multi-class classification using a single Chinese phrase achieves an accuracy of 74.4 percent.
    Federated Neural Architecture Search. (arXiv:2002.06352v5 [cs.LG] UPDATED)
    To preserve user privacy while enabling mobile intelligence, techniques have been proposed to train deep neural networks on decentralized data. However, training over decentralized data makes the design of neural architectures even more difficult than it already was. This difficulty is further amplified when designing and deploying different neural architectures for heterogeneous mobile platforms. In this work, we propose integrating automatic neural architecture search into decentralized training, as a new DNN training paradigm called Federated Neural Architecture Search, namely federated NAS. To deal with the primary challenge of limited on-client computational and communication resources, we present FedNAS, a highly optimized framework for efficient federated NAS. FedNAS fully exploits the key opportunity of insufficient model candidate re-training during the architecture search process, and incorporates three key optimizations: parallel candidates training on partial clients, early dropping candidates with inferior performance, and dynamic round numbers. Tested on large-scale datasets and typical CNN architectures, FedNAS achieves model accuracy comparable to that of a state-of-the-art NAS algorithm that trains models with centralized data, and also reduces the client cost by up to two orders of magnitude compared to a straightforward design of federated NAS.
    A multi-task network approach for calculating discrimination-free insurance prices. (arXiv:2207.02799v1 [cs.LG])
    In applications of predictive modeling, such as insurance pricing, indirect or proxy discrimination is an issue of major concern. Namely, there exists the possibility that protected policyholder characteristics are implicitly inferred from non-protected ones by predictive models, and are thus having an undesirable (or illegal) impact on prices. A technical solution to this problem relies on building a best-estimate model using all policyholder characteristics (including protected ones) and then averaging out the protected characteristics for calculating individual prices. However, such approaches require full knowledge of policyholders' protected characteristics, which may in itself be problematic. Here, we address this issue by using a multi-task neural network architecture for claim predictions, which can be trained using only partial information on protected characteristics, and it produces prices that are free from proxy discrimination. We demonstrate the use of the proposed model and we find that its predictive accuracy is comparable to a conventional feedforward neural network (on full information). However, this multi-task network has clearly superior performance in the case of partially missing policyholder information.
    Integral Probability Metrics PAC-Bayes Bounds. (arXiv:2207.00614v2 [stat.ML] UPDATED)
    We present a PAC-Bayes-style generalization bound which enables the replacement of the KL-divergence with a variety of Integral Probability Metrics (IPM). We provide instances of this bound with the IPM being the total variation metric and the Wasserstein distance. A notable feature of the obtained bounds is that they naturally interpolate between classical uniform convergence bounds in the worst case (when the prior and posterior are far away from each other), and preferable bounds in better cases (when the posterior and prior are close). This illustrates the possibility of reinforcing classical generalization bounds with algorithm- and data-dependent components, thus making them more suitable to analyze algorithms that use a large hypothesis space.
    Simple and Efficient Heterogeneous Graph Neural Network. (arXiv:2207.02547v1 [cs.LG])
    Heterogeneous graph neural networks (HGNNs) deliver the powerful capability to embed rich structural and semantic information of a heterogeneous graph into low-dimensional node representations. Existing HGNNs usually learn to embed information using hierarchical attention mechanisms and repeated neighbor aggregation, suffering from unnecessary complexity and redundant computation. This paper proposes Simple and Efficient Heterogeneous Graph Neural Network (SeHGNN), which reduces this excess complexity by avoiding overused node-level attention within the same relation and pre-computing the neighbor aggregation in the pre-processing stage. Unlike previous work, SeHGNN utilizes a light-weight parameter-free neighbor aggregator to learn structural information for each metapath, and a transformer-based semantic aggregator to combine semantic information across metapaths for the final embedding of each node. As a result, SeHGNN offers a simple network structure, high prediction accuracy, and fast training speed. Extensive experiments on five real-world heterogeneous graphs demonstrate the superiority of SeHGNN over the state of the art in both accuracy and training speed. Code is available at https://github.com/ICT-GIMLab/SeHGNN.
    Transformers discover an elementary calculation system exploiting local attention and grid-like problem representation. (arXiv:2207.02536v1 [cs.LG])
    Mathematical reasoning is one of the most impressive achievements of human intellect but remains a formidable challenge for artificial intelligence systems. In this work we explore whether modern deep learning architectures can learn to solve a symbolic addition task by discovering effective arithmetic procedures. Although the problem might seem trivial at first glance, generalizing arithmetic knowledge to operations involving a higher number of terms, possibly composed by longer sequences of digits, has proven extremely challenging for neural networks. Here we show that universal transformers equipped with local attention and adaptive halting mechanisms can learn to exploit an external, grid-like memory to carry out multi-digit addition. The proposed model achieves remarkable accuracy even when tested with problems requiring extrapolation outside the training distribution; most notably, it does so by discovering human-like calculation strategies such as place value alignment.
    A Hybrid Approach for Binary Classification of Imbalanced Data. (arXiv:2207.02738v1 [cs.LG])
    Binary classification with an imbalanced dataset is challenging. Models tend to consider all samples as belonging to the majority class. Although existing solutions such as sampling methods, cost-sensitive methods, and ensemble learning methods improve the poor accuracy of the minority class, these methods are limited by overfitting problems or cost parameters that are difficult to decide. We propose HADR, a hybrid approach with dimension reduction that consists of data block construction, dimensionality reduction, and ensemble learning with deep neural network classifiers. We evaluate the performance on eight imbalanced public datasets in terms of recall, G-mean, and AUC. The results show that our model outperforms state-of-the-art methods.
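    As a hedged illustration of why G-mean is reported alongside recall on imbalanced data, the sketch below uses the common definition of G-mean as the geometric mean of sensitivity and specificity. The function name `g_mean` and the toy labels are assumptions for illustration and do not come from the paper.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (minority recall) and specificity.

    Labels are assumed to be 1 for the minority class and 0 for the majority.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

# Toy data: 2 minority samples out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts the majority class is 80% accurate
# on this data, yet its G-mean is zero because minority recall is zero.
always_majority = [0] * 10
print(g_mean(y_true, always_majority))

# A classifier that finds one minority sample at the cost of one false alarm.
mixed = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(g_mean(y_true, mixed))
```

    Because the score is a product, it collapses to zero whenever either class is ignored entirely, which is the failure mode described in this abstract.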
    Text Enriched Sparse Hyperbolic Graph Convolutional Networks. (arXiv:2207.02368v1 [cs.IR])
    Heterogeneous networks, which connect informative nodes containing text with different edge types, are routinely used to store and process information in various real-world applications. Graph Neural Networks (GNNs) and their hyperbolic variants provide a promising approach to encode such networks in a low-dimensional latent space through neighborhood aggregation and hierarchical feature extraction, respectively. However, these approaches typically ignore metapath structures and the available semantic information. Furthermore, these approaches are sensitive to the noise present in the training data. To tackle these limitations, in this paper, we propose Text Enriched Sparse Hyperbolic Graph Convolution Network (TESH-GCN) to capture the graph's metapath structures using semantic signals and further improve prediction in large heterogeneous graphs. In TESH-GCN, we extract semantic node information, which successively acts as a connection signal to extract relevant nodes' local neighborhood and graph-level metapath features from the sparse adjacency tensor in a reformulated hyperbolic graph convolution layer. These extracted features in conjunction with semantic features from the language model (for robustness) are used for the final downstream task. Experiments on various heterogeneous graph datasets show that our model outperforms the current state-of-the-art approaches by a large margin on the task of link prediction. We also report a reduction in both the training time and model parameters compared to the existing hyperbolic approaches through a reformulated hyperbolic graph convolution. Furthermore, we illustrate the robustness of our model by experimenting with different levels of simulated noise in both the graph structure and text, and also, present a mechanism to explain TESH-GCN's prediction by analyzing the extracted metapaths.
    Cascaded Deep Hybrid Models for Multistep Household Energy Consumption Forecasting. (arXiv:2207.02589v1 [cs.LG])
    Sustainability requires increased energy efficiency with minimal waste. Future power systems should thus provide high levels of flexibility in controlling energy consumption. Precise projections of future energy demand/load at the aggregate and on the individual site levels are of great importance for decision makers and professionals in the energy industry. Forecasting energy loads has become more advantageous for energy providers and customers, allowing them to establish an efficient production strategy to satisfy demand. This study introduces two hybrid cascaded models for forecasting multistep household power consumption in different resolutions. The first model integrates Stationary Wavelet Transform (SWT), as an efficient signal preprocessing technique, with Convolutional Neural Networks and Long Short Term Memory (LSTM). The second hybrid model combines SWT with a self-attention based neural network architecture named transformer. The major constraint of using time-frequency analysis methods such as SWT in multistep energy forecasting problems is that they require sequential signals, making signal reconstruction problematic in multistep forecasting applications. The cascaded models can efficiently address this problem by using the recursive outputs. Experimental results show that the proposed hybrid models achieve superior prediction performance compared to the existing multistep power consumption prediction methods. The results will pave the way for more accurate and reliable forecasting of household power consumption.
    Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design. (arXiv:2207.02575v1 [cs.LG])
    While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning. In practice, on an "easy" instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the "instance-dependent" complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, \textsc{Pedel}, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that \textsc{Pedel} yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to hit the instance-optimal rate. Our approach relies on a novel online experiment design-based procedure which focuses the exploration budget on the "directions" most relevant to learning a near-optimal policy, and may be of independent interest.
    The Intrinsic Manifolds of Radiological Images and their Role in Deep Learning. (arXiv:2207.02797v1 [eess.IV])
    The manifold hypothesis is a core mechanism behind the success of deep learning, so understanding the intrinsic manifold structure of image data is central to studying how neural networks learn from the data. Intrinsic dataset manifolds and their relationship to learning difficulty have recently begun to be studied for the common domain of natural images, but little such research has been attempted for radiological images. We address this here. First, we compare the intrinsic manifold dimensionality of radiological and natural images. We also investigate the relationship between intrinsic dimensionality and generalization ability over a wide range of datasets. Our analysis shows that natural image datasets generally have a higher number of intrinsic dimensions than radiological images. However, the relationship between generalization ability and intrinsic dimensionality is much stronger for medical images, which could be explained as radiological images having intrinsic features that are more difficult to learn. These results give a more principled underpinning for the intuition that radiological images can be more challenging to apply deep learning to than natural image datasets common to machine learning research. We believe rather than directly applying models developed for natural images to the radiological imaging domain, more care should be taken to developing architectures and algorithms that are more tailored to the specific characteristics of this domain. The research shown in our paper, demonstrating these characteristics and the differences from natural images, is an important first step in this direction.
    Pure Transformers are Powerful Graph Learners. (arXiv:2207.02505v1 [cs.LG])
    We show that standard Transformers without graph-specific modifications can lead to promising results in graph learning both in theory and practice. Given a graph, we simply treat all nodes and edges as independent tokens, augment them with token embeddings, and feed them to a Transformer. With an appropriate choice of token embeddings, we prove that this approach is theoretically at least as expressive as an invariant graph network (2-IGN) composed of equivariant linear layers, which is already more expressive than all message-passing Graph Neural Networks (GNN). When trained on a large-scale graph dataset (PCQM4Mv2), our method coined Tokenized Graph Transformer (TokenGT) achieves significantly better results compared to GNN baselines and competitive results compared to Transformer variants with sophisticated graph-specific inductive bias. Our implementation is available at https://github.com/jw9730/tokengt.
    Instance-optimal PAC Algorithms for Contextual Bandits. (arXiv:2207.02357v1 [stat.ML])
    In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the $(\epsilon,\delta)$-$\textit{PAC}$ setting: given a policy class $\Pi$ the goal of the learner is to return a policy $\pi\in \Pi$ whose expected reward is within $\epsilon$ of the optimal policy with probability greater than $1-\delta$. We characterize the first $\textit{instance-dependent}$ PAC sample complexity of contextual bandits through a quantity $\rho_{\Pi}$, and provide matching upper and lower bounds in terms of $\rho_{\Pi}$ for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
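    The $(\epsilon,\delta)$-PAC criterion stated in words in this abstract can be written compactly. This is a sketch under assumed notation: $V(\pi)$ for the expected reward of policy $\pi$, $\hat{\pi}$ for the policy returned by the learner, and $c$, $r$ for context and reward are symbols introduced here, not taken from the paper.

```latex
% V(\pi): expected reward of policy \pi (assumed notation)
% \hat{\pi}: policy returned by the learner (assumed notation)
\[
  V(\pi) = \mathbb{E}_{c}\big[\, r(c, \pi(c)) \,\big],
  \qquad
  \Pr\Big( V(\hat{\pi}) \;\ge\; \max_{\pi \in \Pi} V(\pi) - \epsilon \Big) \;>\; 1 - \delta .
\]
```

    The probability is taken over the data collected by the learner and any internal randomness of the algorithm.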
    Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation. (arXiv:2205.14141v2 [cs.CV] UPDATED)
    Masked image modeling (MIM) learns representations with remarkably good fine-tuning performances, overshadowing previous prevalent pre-training approaches such as image classification, instance contrastive learning, and image-text alignment. In this paper, we show that the inferior fine-tuning performance of these pre-training approaches can be significantly improved by a simple post-processing in the form of feature distillation (FD). The feature distillation converts the old representations to new representations that have a few desirable properties just like those representations produced by MIM. These properties, which we aggregately refer to as optimization friendliness, are identified and analyzed by a set of attention- and optimization-related diagnosis tools. With these properties, the new representations show strong fine-tuning performance. Specifically, the contrastive self-supervised learning methods are made as competitive in fine-tuning as the state-of-the-art masked image modeling (MIM) algorithms. The CLIP models' fine-tuning performance is also significantly improved, with a CLIP ViT-L model reaching \textbf{89.0%} top-1 accuracy on ImageNet-1K classification. On the 3-billion-parameter SwinV2-G model, the fine-tuning accuracy on ADE20K semantic segmentation is improved by +1.5 mIoU to \textbf{61.4 mIoU}, creating a new record. More importantly, our work provides a way for the future research to focus more effort on the generality and scalability of the learnt representations without being pre-occupied with optimization friendliness since it can be enhanced rather easily. The code will be available at https://github.com/SwinTransformer/Feature-Distillation.
    Characterizing and Mitigating the Difficulty in Training Physics-informed Artificial Neural Networks under Pointwise Constraints. (arXiv:2206.09321v2 [cs.LG] UPDATED)
    Neural networks can be used to learn the solution of partial differential equations (PDEs) on arbitrary domains without requiring a computational mesh. Common approaches integrate differential operators in training neural networks using a structured loss function. The most common training algorithm for neural networks is backpropagation, which relies on the gradient of the loss function with respect to the parameters of the network. In this work, we characterize the difficulty of training neural networks on physics by investigating the impact of differential operators in corrupting the back-propagated gradients. In particular, we show that perturbations present in the output of a neural network model during early stages of training lead to higher levels of noise in a structured loss function that is composed of high-order differential operators. These perturbations consequently corrupt the back-propagated gradients and impede convergence. We mitigate this issue by introducing auxiliary flux parameters to obtain a system of first-order differential equations. We formulate a non-linear unconstrained optimization problem using the augmented Lagrangian method that properly constrains the boundary conditions and adaptively focuses on regions of higher gradients that are difficult to learn. We apply our approach to learn the solution of various benchmark PDE problems and demonstrate orders of magnitude improvement over existing approaches.
    Self-Normalized Density Map (SNDM) for Counting Microbiological Objects. (arXiv:2203.09474v2 [cs.CV] UPDATED)
    The statistical properties of the density map (DM) approach to counting microbiological objects on images are studied in detail. The DM is given by U$^2$-Net. Two statistical methods for deep neural networks are utilized: the bootstrap and the Monte Carlo (MC) dropout. The detailed analysis of the uncertainties for the DM predictions leads to a deeper understanding of the DM model's deficiencies. Based on our investigation, we propose a self-normalization module in the network. The improved network model, called \textit{Self-Normalized Density Map} (SNDM), can correct its output density map by itself to accurately predict the total number of objects in the image. The SNDM architecture outperforms the original model. Moreover, both statistical frameworks -- bootstrap and MC dropout -- have consistent statistical results for SNDM, which were not observed in the original model. The SNDM efficiency is comparable with that of detector-based models, such as Faster and Cascade R-CNN detectors.
    A Heterogeneous Graph Based Framework for Multimodal Neuroimaging Fusion Learning. (arXiv:2110.08465v4 [cs.LG] UPDATED)
    Graph neural networks (GNNs) provide powerful insights for brain neuroimaging technology from the view of graphical networks. However, most existing GNN-based models assume that the neuroimaging-produced brain connectome network is a homogeneous graph with single types of nodes and edges. In fact, emerging studies have reported and emphasized the significance of heterogeneity among human brain activities, especially between the two cerebral hemispheres. Thus, homogeneous-structured brain network-based graph methods are insufficient for modelling complicated cerebral activity states. To overcome this problem, in this paper, we present a heterogeneous graph neural network (HeBrainGNN) for multimodal brain neuroimaging fusion learning. We first model the brain network as a heterogeneous graph with multitype nodes (i.e., left and right hemispheric nodes) and multitype edges (i.e., intra- and interhemispheric edges). Then, we propose a self-supervised pretraining strategy based on a heterogeneous brain network to address the potential overfitting problem caused by the conflict between a large parameter size and a small medical data sample size. Our results show the superiority of the proposed model over other existing methods in brain-related disease prediction tasks. Ablation experiments show that our heterogeneous graph-based model attaches more importance to hemispheric connections that may be neglected due to their low strength by previous homogeneous graph models. Other experiments also indicate that our proposed model with a pretraining strategy alleviates the problem of limited labelled data and yields a significant improvement in accuracy.
    Deep Contrastive Patch-Based Subspace Learning for Camera Image Signal Processing. (arXiv:2104.00253v3 [eess.IV] UPDATED)
    Camera Image Signal Processing (ISP) pipelines, including deep learning trained versions, can get appealing results in different image signal processing tasks. However, most if not all of these methods tend to apply a single filter that is homogeneous over the entire image. This is also particularly true when an encoder-decoder type deep architecture is trained for the task. However, it is natural to view a camera image as heterogeneous, as the color intensity and the artificial noise are distributed vastly differently, even across the two-dimensional domain of a single image. Varied Moire ringing, motion-blur, color-bleaching or lens-based projection distortions can all potentially lead to a heterogeneous image artifact filtering problem. In this paper, we present a specific patch-based, local subspace deep neural network that improves Camera ISP to be robust to heterogeneous artifacts (especially image denoising). We call our three-fold deep trained model the Patch Subspace Learning Autoencoder (PSL-AE). PSL-AE does not necessarily assume uniform image distortion levels nor repeated nor similar artifact types within the image. Rather, PSL-AE first diagnostically encodes patches extracted from noisy and clean image pairs, with different artifact types and distortion levels, by contrastive learning. Then, each image's patches are encoded into soft-clusters in their appropriate latent sub-space, using a prior mixture model. Lastly, the decoders of the PSL-AE are also trained in an unsupervised manner customized for the image patches in each soft-cluster. Our experimental results demonstrate the flexibility and performance that one can achieve through improved heterogeneous filtering, both on synthesized artifacts and on realistic SIDD image pairs.
    Domain Adaptive Hand Keypoint and Pixel Localization in the Wild. (arXiv:2203.08344v4 [cs.CV] UPDATED)
    We aim to improve the performance of regressing hand keypoints and segmenting pixel-level hand masks under new imaging conditions (e.g., outdoors) when we only have labeled images taken under very different conditions (e.g., indoors). In the real world, it is important that the model trained for both tasks works under various imaging conditions. However, their variation covered by existing labeled hand datasets is limited. Thus, it is necessary to adapt the model trained on the labeled images (source) to unlabeled images (target) with unseen imaging conditions. While self-training domain adaptation methods (i.e., learning from the unlabeled target images in a self-supervised manner) have been developed for both tasks, their training may degrade performance when the predictions on the target images are noisy. To avoid this, it is crucial to assign a low importance (confidence) weight to the noisy predictions during self-training. In this paper, we propose to utilize the divergence of two predictions to estimate the confidence of the target image for both tasks. These predictions are given from two separate networks, and their divergence helps identify the noisy predictions. To integrate our proposed confidence estimation into self-training, we propose a teacher-student framework where the two networks (teachers) provide supervision to a network (student) for self-training, and the teachers are learned from the student by knowledge distillation. Our experiments show its superiority over state-of-the-art methods in adaptation settings with different lighting, grasping objects, backgrounds, and camera viewpoints. Our method improves by 4% the multi-task score on HO3D compared to the latest adversarial adaptation method. We also validate our method on Ego4D, egocentric videos with rapid changes in imaging conditions outdoors.
    SAAC: Safe Reinforcement Learning as an Adversarial Game of Actor-Critics. (arXiv:2204.09424v2 [cs.LG] UPDATED)
    Although Reinforcement Learning (RL) is effective for sequential decision-making problems under uncertainty, it still fails to thrive in real-world systems where risk or safety is a binding constraint. In this paper, we formulate the RL problem with safety constraints as a non-zero-sum game. While deployed with maximum entropy RL, this formulation leads to a safe adversarially guided soft actor-critic framework, called SAAC. In SAAC, the adversary aims to break the safety constraint while the RL agent aims to maximize the constrained value function given the adversary's policy. The safety constraint on the agent's value function manifests only as a repulsion term between the agent's and the adversary's policies. Unlike previous approaches, SAAC can address different safety criteria such as safe exploration, mean-variance risk sensitivity, and CVaR-like coherent risk sensitivity. We illustrate the design of the adversary for these constraints. Then, in each of these variations, we show the agent differentiates itself from the adversary's unsafe actions in addition to learning to solve the task. Finally, for challenging continuous control tasks, we demonstrate that SAAC achieves faster convergence, better efficiency, and fewer failures to satisfy the safety constraints than risk-averse distributional RL and risk-neutral soft actor-critic algorithms.
    Predicting Kidney Transplant Survival using Multiple Feature Representations for HLAs. (arXiv:2103.03305v2 [cs.LG] UPDATED)
    Kidney transplantation can significantly enhance living standards for people suffering from end-stage renal disease. A significant factor that affects graft survival time (the time until the transplant fails and the patient requires another transplant) for kidney transplantation is the compatibility of the Human Leukocyte Antigens (HLAs) between the donor and recipient. In this paper, we propose 4 new biologically-relevant feature representations for incorporating HLA information into machine learning-based survival analysis algorithms. We evaluate our proposed HLA feature representations on a database of over 100,000 transplants and find that they improve prediction accuracy by about 1%, modest at the patient level but potentially significant at a societal level. Accurate prediction of survival times can improve transplant survival outcomes, enabling better allocation of donors to recipients and reducing the number of re-transplants due to graft failure with poorly matched donors.
    Variational Flow Graphical Model. (arXiv:2207.02722v1 [stat.ML])
    This paper introduces a novel approach to embed flow-based models with hierarchical structures. The proposed framework is named the Variational Flow Graphical (VFG) Model. VFGs learn the representation of high-dimensional data via a message-passing scheme by integrating flow-based functions through variational inference. By leveraging the expressive power of neural networks, VFGs produce a lower-dimensional representation of the data, thus overcoming a drawback of many flow-based models, which usually require a high-dimensional latent space involving many trivial variables. Aggregation nodes are introduced in VFG models to integrate forward-backward hierarchical information via a message-passing scheme. Maximizing the evidence lower bound (ELBO) of the data likelihood aligns the forward and backward messages in each aggregation node, achieving a consistent node state. Algorithms have been developed to learn model parameters through gradient updates on the ELBO objective. The consistency of aggregation nodes enables VFGs to be applied to tractable inference on graphical structures. Besides representation learning and numerical inference, VFGs provide a new approach for distribution modeling on datasets with graphical latent structures. Additionally, a theoretical study shows that VFGs are universal approximators by leveraging the implicitly invertible flow-based structures. With flexible graphical structures and superior expressive power, VFGs could potentially be used to improve probabilistic inference. In the experiments, VFGs achieve improved ELBO and likelihood values on multiple datasets.
    PAC Prediction Sets for Meta-Learning. (arXiv:2207.02440v1 [cs.LG])
    Uncertainty quantification is a key component of machine learning models targeted at safety-critical systems such as in healthcare or autonomous vehicles. We study this problem in the context of meta learning, where the goal is to quickly adapt a predictor to new tasks. In particular, we propose a novel algorithm to construct \emph{PAC prediction sets}, which capture uncertainty via sets of labels, that can be adapted to new tasks with only a few training examples. These prediction sets satisfy an extension of the typical PAC guarantee to the meta learning setting; in particular, the PAC guarantee holds with high probability over future tasks. We demonstrate the efficacy of our approach on four datasets across three application domains: mini-ImageNet and CIFAR10-C in the visual domain, FewRel in the language domain, and the CDC Heart Dataset in the medical domain. In particular, our prediction sets satisfy the PAC guarantee while having smaller size compared to other baselines that also satisfy this guarantee.
    Enabling Fast Deep Learning on Tiny Energy-Harvesting IoT Devices. (arXiv:2111.14051v3 [cs.LG] UPDATED)
    Energy harvesting (EH) IoT devices that operate intermittently without batteries, coupled with advances in deep neural networks (DNNs), have opened up new opportunities for enabling sustainable smart applications. Nevertheless, implementing those computation- and memory-intensive intelligent algorithms on EH devices is extremely difficult because of limited resources and an intermittent power supply that causes frequent failures. To address those challenges, this paper proposes a methodology that enables fast deep learning with low-energy accelerators for tiny energy harvesting devices. We first propose $RAD$, a resource-aware structured DNN training framework, which employs block circulant matrices and structured pruning to achieve high compression and to take advantage of various vector operation accelerators. A DNN implementation method, $ACE$, is then proposed that employs low-energy accelerators to obtain maximum performance with small energy consumption. Finally, we design $FLEX$, the system support for intermittent computation in energy harvesting situations. Experimental results from three different DNN models demonstrate that $RAD$, $ACE$, and $FLEX$ can enable fast and correct inference on energy harvesting devices, with up to 4.26X runtime reduction and up to 7.7X energy reduction, at higher accuracy than the state of the art.
    Fast Sparse Decision Tree Optimization via Reference Ensembles. (arXiv:2112.00798v7 [cs.LG] UPDATED)
    Sparse decision tree optimization has been one of the most fundamental problems in AI since its inception and is a challenge at the core of interpretable machine learning. Sparse decision tree optimization is computationally hard, and despite steady effort since the 1960's, breakthroughs have only been made on the problem within the past few years, primarily on the problem of finding optimal sparse decision trees. However, current state-of-the-art algorithms often require impractical amounts of computation time and memory to find optimal or near-optimal trees for some real-world datasets, particularly those having several continuous-valued features. Given that the search spaces of these decision tree optimization problems are massive, can we practically hope to find a sparse decision tree that competes in accuracy with a black box machine learning model? We address this problem via smart guessing strategies that can be applied to any optimal branch-and-bound-based decision tree algorithm. We show that by using these guesses, we can reduce the run time by multiple orders of magnitude, while providing bounds on how far the resulting trees can deviate from the black box's accuracy and expressive power. Our approach enables guesses about how to bin continuous features, the size of the tree, and lower bounds on the error for the optimal decision tree. Our experiments show that in many cases we can rapidly construct sparse decision trees that match the accuracy of black box models. To summarize: when you are having trouble optimizing, just guess.
    Online Bilevel Optimization: Regret Analysis of Online Alternating Gradient Methods. (arXiv:2207.02829v1 [math.OC])
    Online optimization is a well-established optimization paradigm that aims to make a sequence of correct decisions given knowledge of the correct answer to previous decision tasks. Bilevel programming involves a hierarchical optimization problem where the feasible region of the so-called outer problem is restricted by the graph of the solution set mapping of the inner problem. This paper brings these two ideas together and studies an online bilevel optimization setting in which a sequence of time-varying bilevel problems is revealed one after the other. We extend the known regret bounds for single-level online algorithms to the bilevel setting. Specifically, we introduce new notions of bilevel regret, develop an online alternating time-averaged gradient method that is capable of leveraging smoothness, and provide regret bounds in terms of the path-length of the inner and outer minimizer sequences.
    AutoSpeed: A Linked Autoencoder Approach for Pulse-Echo Speed-of-Sound Imaging for Medical Ultrasound. (arXiv:2207.02392v1 [eess.IV])
    Quantitative ultrasound, e.g., speed-of-sound (SoS) in tissues, provides information about tissue properties that have diagnostic value. Recent studies showed the possibility of extracting SoS information from pulse-echo ultrasound raw data (a.k.a. RF data) using deep neural networks that are fully trained on simulated data. These methods take sensor domain data, i.e., RF data, as input and train a network in an end-to-end fashion to learn the implicit mapping between the RF data domain and SoS domain. However, such networks are prone to overfitting to simulated data which results in poor performance and instability when tested on measured data. We propose a novel method for SoS mapping employing learned representations from two linked autoencoders. We test our approach on simulated and measured data acquired from human breast mimicking phantoms. We show that SoS mapping is possible using linked autoencoders. The proposed method has a Mean Absolute Percentage Error (MAPE) of 2.39% on the simulated data. On the measured data, the predictions of the proposed method are close to the expected values with MAPE of 1.1%. Compared to an end-to-end trained network, the proposed method shows higher stability and reproducibility.
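    For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a Mean Absolute Percentage Error is typically computed. It is not the authors' evaluation code; the function name `mape` and the speed-of-sound values are hypothetical and chosen only for illustration.

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, returned in percent.

    Assumes every reference value in y_true is non-zero.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the relative absolute errors, then scale to percent.
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Hypothetical speed-of-sound values in m/s (not taken from the paper).
reference = [1500.0, 1540.0]
predicted = [1515.0, 1540.0]
print(mape(reference, predicted))
```

    Because each error is normalized by its reference value, the metric is scale-free, which makes simulated and measured results comparable as long as no reference value is zero.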
    TractoFormer: A Novel Fiber-level Whole Brain Tractography Analysis Framework Using Spectral Embedding and Vision Transformers. (arXiv:2207.02327v1 [eess.IV])
    Diffusion MRI tractography is an advanced imaging technique for quantitative mapping of the brain's structural connectivity. Whole brain tractography (WBT) data contains over hundreds of thousands of individual fiber streamlines (estimated brain connections), and this data is usually parcellated to create compact representations for data analysis applications such as disease classification. In this paper, we propose a novel parcellation-free WBT analysis framework, TractoFormer, that leverages tractography information at the level of individual fiber streamlines and provides a natural mechanism for interpretation of results using the attention mechanism of transformers. TractoFormer includes two main contributions. First, we propose a novel and simple 2D image representation of WBT, TractoEmbedding, to encode 3D fiber spatial relationships and any feature of interest that can be computed from individual fibers (such as FA or MD). Second, we design a network based on vision transformers (ViTs) that includes: 1) data augmentation to overcome model overfitting on small datasets, 2) identification of discriminative fibers for interpretation of results, and 3) ensemble learning to leverage fiber information from different brain regions. In a synthetic data experiment, TractoFormer successfully identifies discriminative fibers with simulated group differences. In a disease classification experiment comparing several methods, TractoFormer achieves the highest accuracy in classifying schizophrenia vs control. Discriminative fibers are identified in left hemispheric frontal and parietal superficial white matter regions, which have previously been shown to be affected in schizophrenia patients.
    Ordinal Regression via Binary Preference vs Simple Regression: Statistical and Experimental Perspectives. (arXiv:2207.02454v1 [cs.LG])
    Ordinal regression with anchored reference samples (ORARS) has been proposed for automatically predicting the subjective Mean Opinion Score (MOS) of input stimuli. ORARS addresses the MOS prediction problem by pairing a test sample with each of the pre-scored anchored reference samples. A trained binary classifier is then used to predict which sample, test or anchor, is statistically better. Posteriors of the binary preference decision are then used to predict the MOS of the test sample. In this paper, we present a rigorous framework, analysis, and experiments demonstrating that ORARS is advantageous over simple regression. The contributions of this work are: 1) showing that traditional regression can be reformulated into multiple preference tests to yield better performance, which is confirmed experimentally with simulations; 2) generalizing ORARS to other regression problems and verifying its effectiveness; 3) providing some prerequisite conditions that ensure proper application of ORARS.
    Effective and Efficient Training for Sequential Recommendation using Recency Sampling. (arXiv:2207.02643v1 [cs.IR])
    Many modern sequential recommender systems use deep neural networks, which can effectively estimate the relevance of items but require a lot of time to train. Slow training increases expenses, hinders product development timescales and prevents the model from being regularly updated to adapt to changing user preferences. Training such sequential models involves appropriately sampling past user interactions to create a realistic training objective. The existing training objectives have limitations. For instance, next item prediction never uses the beginning of the sequence as a learning target, thereby potentially discarding valuable data. On the other hand, the item masking used by BERT4Rec is only weakly related to the goal of sequential recommendation; therefore, it requires much more time to obtain an effective model. Hence, we propose a novel Recency-based Sampling of Sequences training objective that addresses both limitations. We apply our method to various recent and state-of-the-art model architectures, such as GRU4Rec, Caser, and SASRec. We show that the models enhanced with our method can achieve performance exceeding or very close to that of the state-of-the-art BERT4Rec, but with much less training time.
    Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems. (arXiv:2207.02542v1 [cs.LG])
    In many scientific disciplines, we are interested in inferring the nonlinear dynamical system underlying a set of observed time series, a challenging task in the face of chaotic behavior and noise. Previous deep learning approaches toward this goal often suffered from a lack of interpretability and tractability. In particular, the high-dimensional latent spaces often required for a faithful embedding, even when the underlying dynamics lives on a lower-dimensional manifold, can hamper theoretical analysis. Motivated by the emerging principles of dendritic computation, we augment a dynamically interpretable and mathematically tractable piecewise-linear (PL) recurrent neural network (RNN) by a linear spline basis expansion. We show that this approach retains all the theoretically appealing properties of the simple PLRNN, yet boosts its capacity for approximating arbitrary nonlinear dynamical systems in comparatively low dimensions. We employ two frameworks for training the system, one combining back-propagation-through-time (BPTT) with teacher forcing, and another based on fast and scalable variational inference. We show that the dendritically expanded PLRNN achieves better reconstructions with fewer parameters and dimensions on various dynamical systems benchmarks and compares favorably to other methods, while retaining a tractable and interpretable structure.
    Ensemble feature selection with clustering for analysis of high-dimensional, correlated clinical data in the search for Alzheimer's disease biomarkers. (arXiv:2207.02380v1 [cs.LG])
    Healthcare datasets often contain groups of highly correlated features, such as features from the same biological system. When feature selection is applied to these datasets to identify the most important features, the biases inherent in some multivariate feature selectors due to correlated features make it difficult for these methods to distinguish between the important and irrelevant features and the results of the feature selection process can be unstable. Feature selection ensembles, which aggregate the results of multiple individual base feature selectors, have been investigated as a means of stabilising feature selection results, but do not address the problem of correlated features. We present a novel framework to create feature selection ensembles from multivariate feature selectors while taking into account the biases produced by groups of correlated features, using agglomerative hierarchical clustering in a pre-processing step. These methods were applied to two real-world datasets from studies of Alzheimer's disease (AD), a progressive neurodegenerative disease that has no cure and is not yet fully understood. Our results show a marked improvement in the stability of features selected over the models without clustering, and the features selected by these models are in keeping with the findings in the AD literature.
    Strong Heuristics for Named Entity Linking. (arXiv:2207.02824v1 [cs.CL])
    Named entity linking (NEL) in news is a challenging endeavour due to the frequency of unseen and emerging entities, which necessitates the use of unsupervised or zero-shot methods. However, such methods tend to come with caveats, such as no integration of suitable knowledge bases (like Wikidata) for emerging entities, a lack of scalability, and poor interpretability. Here, we consider person disambiguation in Quotebank, a massive corpus of speaker-attributed quotations from the news, and investigate the suitability of intuitive, lightweight, and scalable heuristics for NEL in web-scale corpora. Our best performing heuristic disambiguates 94% and 63% of the mentions on Quotebank and the AIDA-CoNLL benchmark, respectively. Additionally, the proposed heuristics compare favourably to the state-of-the-art unsupervised and zero-shot methods, Eigenthemes and mGENRE, respectively, thereby serving as strong baselines for unsupervised and zero-shot entity linking.
    Rethinking the Importance of Sampling in Physics-informed Neural Networks. (arXiv:2207.02338v1 [cs.LG])
    Physics-informed neural networks (PINNs) have emerged as a powerful tool for solving partial differential equations (PDEs) in a variety of domains. While previous research in PINNs has mainly focused on constructing and balancing loss functions during training to avoid poor minima, the effect of sampling collocation points on the performance of PINNs has largely been overlooked. In this work, we find that the performance of PINNs can vary significantly with different sampling strategies, and using a fixed set of collocation points can be quite detrimental to the convergence of PINNs to the correct solution. In particular, (1) we hypothesize that training of PINNs relies on successful "propagation" of the solution from initial and/or boundary condition points to interior points, and PINNs with poor sampling strategies can get stuck at trivial solutions if there are \textit{propagation failures}. (2) We demonstrate that propagation failures are characterized by highly imbalanced PDE residual fields where very high residuals are observed over very narrow regions. (3) To mitigate propagation failure, we propose a novel \textit{evolutionary sampling} (Evo) method that can incrementally accumulate collocation points in regions of high PDE residuals. We further provide an extension of Evo to respect the principle of causality while solving time-dependent PDEs. We empirically demonstrate the efficacy and efficiency of our proposed methods in a variety of PDE problems.
    Quantitative Assessment of DESIS Hyperspectral Data for Plant Biodiversity Estimation in Australia. (arXiv:2207.02482v1 [cs.LG])
    Diversity of terrestrial plants plays a key role in maintaining a stable, healthy, and productive ecosystem. Though remote sensing has been seen as a promising and cost-effective proxy for estimating plant diversity, there is a lack of quantitative studies on how confidently plant diversity can be inferred from spaceborne hyperspectral data. In this study, we assessed the ability of hyperspectral data captured by the DLR Earth Sensing Imaging Spectrometer (DESIS) to estimate plant species richness in the Southern Tablelands and Snowy Mountains regions in southeast Australia. Spectral features were first extracted from DESIS spectra with principal component analysis, canonical correlation analysis, and partial least squares analysis. Then regression was conducted between the extracted features and plant species richness with ordinary least squares regression, kernel ridge regression, and Gaussian process regression. Results were assessed with the coefficient of correlation ($r$) and Root-Mean-Square Error (RMSE), based on a two-fold cross validation scheme. With the best performing model, $r$ is 0.71 and RMSE is 5.99 for the Southern Tablelands region, while $r$ is 0.62 and RMSE is 6.20 for the Snowy Mountains region. The assessment results reported in this study provide support for future studies on understanding the relationship between spaceborne hyperspectral measurements and terrestrial plant biodiversity.
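    The two assessment metrics named above can be sketched in a few lines. This is only an illustration of the standard definitions of the Pearson correlation coefficient and RMSE, not the study's own pipeline; the function names and the toy richness values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariance numerator and the two standard-deviation numerators.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical observed vs. predicted species richness for four plots.
observed = [12.0, 20.0, 31.0, 25.0]
predicted = [15.0, 18.0, 28.0, 27.0]
print(pearson_r(observed, predicted), rmse(observed, predicted))
```

    The two metrics are complementary: $r$ is insensitive to a constant offset or rescaling of the predictions, while RMSE is expressed in the units of the target (here, number of species) and penalizes such bias.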
    Cooperative Distribution Alignment via JSD Upper Bound. (arXiv:2207.02286v1 [cs.LG])
    Unsupervised distribution alignment estimates a transformation that maps two or more source distributions to a shared aligned distribution given only samples from each distribution. This task has many applications including generative modeling, unsupervised domain adaptation, and socially aware learning. Most prior works use adversarial learning (i.e., min-max optimization), which can be challenging to optimize and evaluate. A few recent works explore non-adversarial flow-based (i.e., invertible) approaches, but they lack a unified perspective and are limited in efficiently aligning multiple distributions. Therefore, we propose to unify and generalize previous flow-based approaches under a single non-adversarial framework, which we prove is equivalent to minimizing an upper bound on the Jensen-Shannon Divergence (JSD). Importantly, our problem reduces to a min-min, i.e., cooperative, problem and can provide a natural evaluation metric for unsupervised distribution alignment. We present empirical results of our framework on both simulated and real-world datasets to demonstrate the benefits of our approach.
    Composite FORCE learning of chaotic echo state networks for time-series prediction. (arXiv:2207.02420v1 [cs.LG])
    An echo state network (ESN), a kind of recurrent neural network, consists of a fixed reservoir in which neurons are connected randomly and recursively, and obtains the desired output only by training the output connection weights. First-order reduced and controlled error (FORCE) learning is an online supervised training approach that can change the chaotic activity of ESNs into specified activity patterns. This paper proposes a composite FORCE learning method based on recursive least squares to train ESNs whose initial activity is spontaneously chaotic, where a composite learning technique featuring dynamic regressor extension and memory data exploitation is applied to enhance parameter convergence. The proposed method is applied to a benchmark problem of predicting chaotic time series generated by the Mackey-Glass system, and numerical results show that it significantly improves learning and prediction performance compared with existing methods.
    Private Matrix Approximation and Geometry of Unitary Orbits. (arXiv:2207.02794v1 [cs.DS])
    Consider the following optimization problem: Given $n \times n$ matrices $A$ and $\Lambda$, maximize $\langle A, U\Lambda U^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$. This problem seeks to approximate $A$ by a matrix whose spectrum is the same as $\Lambda$ and, by setting $\Lambda$ to be appropriate diagonal matrices, one can recover matrix approximation problems such as PCA and rank-$k$ approximation. We study the problem of designing differentially private algorithms for this optimization problem in settings where the matrix $A$ is constructed using users' private data. We give efficient and private algorithms that come with upper and lower bounds on the approximation error. Our results unify and improve upon several prior works on private matrix approximation problems. They rely on extensions of packing/covering number bounds for Grassmannians to unitary orbits which should be of independent interest.
    Predicting is not Understanding: Recognizing and Addressing Underspecification in Machine Learning. (arXiv:2207.02598v1 [cs.LG])
    Machine learning (ML) models are typically optimized for their accuracy on a given dataset. However, this predictive criterion rarely captures all desirable properties of a model, in particular how well it matches a domain expert's understanding of a task. Underspecification refers to the existence of multiple models that are indistinguishable in their in-domain accuracy, even though they differ in other desirable properties such as out-of-distribution (OOD) performance. Identifying these situations is critical for assessing the reliability of ML models. We formalize the concept of underspecification and propose a method to identify and partially address it. We train multiple models with an independence constraint that forces them to implement different functions. They discover predictive features that are otherwise ignored by standard empirical risk minimization (ERM), which we then distill into a global model with superior OOD performance. Importantly, we constrain the models to align with the data manifold to ensure that they discover meaningful features. We demonstrate the method on multiple datasets in computer vision (collages, WILDS-Camelyon17, GQA) and discuss general implications of underspecification. Most notably, in-domain performance cannot serve for OOD model selection without additional assumptions.  ( 2 min )
    Unified Embeddings of Structural and Functional Connectome via a Function-Constrained Structural Graph Variational Auto-Encoder. (arXiv:2207.02328v1 [q-bio.NC])
    Graph theoretical analyses have become standard tools in modeling functional and anatomical connectivity in the brain. With the advent of connectomics, the primary graphs or networks of interest are structural connectome (derived from DTI tractography) and functional connectome (derived from resting-state fMRI). However, most published connectome studies have focused on either structural or functional connectome, yet complementary information between them, when available in the same dataset, can be jointly leveraged to improve our understanding of the brain. To this end, we propose a function-constrained structural graph variational autoencoder (FCS-GVAE) capable of incorporating information from both functional and structural connectome in an unsupervised fashion. This leads to a joint low-dimensional embedding that establishes a unified spatial coordinate system for comparing across different subjects. We evaluate our approach using the publicly available OASIS-3 Alzheimer's disease (AD) dataset and show that a variational formulation is necessary to optimally encode functional brain dynamics. Further, the proposed joint embedding approach can more accurately distinguish different patient sub-populations than approaches that do not use complementary connectome information.  ( 2 min )
    Multi-Contrast MRI Segmentation Trained on Synthetic Images. (arXiv:2207.02469v1 [eess.IV])
    In our comprehensive experiments and evaluations, we show that it is possible to generate multiple contrasts (even all synthetically) and use synthetically generated images to train an image segmentation engine. We show promising segmentation results on real multi-contrast MRI scans when delineating muscle, fat, bone, and bone marrow, all trained on synthetic images. Based on synthetic image training, our segmentation results were as high as 93.91%, 94.11%, 91.63%, and 95.33% for muscle, fat, bone, and bone marrow delineation, respectively. Results were not significantly different from those obtained when real images were used for segmentation training: 94.68%, 94.67%, 95.91%, and 96.82%, respectively.  ( 2 min )
    When does SGD favor flat minima? A quantitative characterization via linear stability. (arXiv:2207.02628v1 [stat.ML])
    The observation that stochastic gradient descent (SGD) favors flat minima has played a fundamental role in understanding implicit regularization of SGD and guiding the tuning of hyperparameters. In this paper, we provide a quantitative explanation of this striking phenomenon by relating the particular noise structure of SGD to its \emph{linear stability} (Wu et al., 2018). Specifically, we consider training over-parameterized models with square loss. We prove that if a global minimum $\theta^*$ is linearly stable for SGD, then it must satisfy $\|H(\theta^*)\|_F\leq O(\sqrt{B}/\eta)$, where $\|H(\theta^*)\|_F, B,\eta$ denote the Frobenius norm of Hessian at $\theta^*$, batch size, and learning rate, respectively. Otherwise, SGD will escape from that minimum \emph{exponentially} fast. Hence, for minima accessible to SGD, the flatness -- as measured by the Frobenius norm of the Hessian -- is bounded independently of the model size and sample size. The key to obtaining these results is exploiting the particular geometry awareness of SGD noise: 1) the noise magnitude is proportional to loss value; 2) the noise directions concentrate in the sharp directions of local landscape. This property of SGD noise provably holds for linear networks and random feature models (RFMs) and is empirically verified for nonlinear networks. Moreover, the validity and practical relevance of our theoretical findings are justified by extensive numerical experiments.  ( 3 min )
    Compositional Generalization in Grounded Language Learning via Induced Model Sparsity. (arXiv:2207.02518v1 [cs.CL])
    We provide a study of how induced model sparsity can help achieve compositional generalization and better sample efficiency in grounded language learning problems. We consider simple language-conditioned navigation problems in a grid world environment with disentangled observations. We show that standard neural architectures do not always yield compositional generalization. To address this, we design an agent that contains a goal identification module that encourages sparse correlations between words in the instruction and attributes of objects, composing them together to find the goal. The output of the goal identification module is the input to a value iteration network planner. Our agent maintains a high level of performance on goals containing novel combinations of properties even when learning from a handful of demonstrations. We examine the internal representations of our agent and find the correct correspondences between words in its dictionary and attributes in the environment.  ( 2 min )
    Ultra-Low-Bitrate Speech Coding with Pretrained Transformers. (arXiv:2207.02262v1 [cs.SD])
    Speech coding facilitates the transmission of speech over low-bandwidth networks with minimal distortion. Neural-network based speech codecs have recently demonstrated significant improvements in quality over traditional approaches. While this new generation of codecs is capable of synthesizing high-fidelity speech, their use of recurrent or convolutional layers often restricts their effective receptive fields, which prevents them from compressing speech efficiently. We propose to further reduce the bitrate of neural speech codecs through the use of pretrained Transformers, capable of exploiting long-range dependencies in the input signal due to their inductive bias. As such, we use a pretrained Transformer in tandem with a convolutional encoder, which is trained end-to-end with a quantizer and a generative adversarial net decoder. Our numerical experiments show that supplementing the convolutional encoder of a neural speech codec with Transformer speech embeddings yields a speech codec with a bitrate of $600\,\mathrm{bps}$ that outperforms the original neural speech codec in synthesized speech quality when trained at the same bitrate. Subjective human evaluations suggest that the quality of the resulting codec is comparable to or better than that of conventional codecs operating at three to four times the rate.  ( 2 min )
    voxel2vec: A Natural Language Processing Approach to Learning Distributed Representations for Scientific Data. (arXiv:2207.02565v1 [cs.LG])
    Relationships in scientific data, such as the numerical and spatial distribution relations of features in univariate data, the scalar-value combinations' relations in multivariate data, and the association of volumes in time-varying and ensemble data, are intricate and complex. This paper presents voxel2vec, a novel unsupervised representation learning model, which is used to learn distributed representations of scalar values/scalar-value combinations in a low-dimensional vector space. Its basic assumption is that if two scalar values/scalar-value combinations have similar contexts, they usually have high similarity in terms of features. By representing scalar values/scalar-value combinations as symbols, voxel2vec learns the similarity between them in the context of spatial distribution and then allows us to explore the overall association between volumes by transfer prediction. We demonstrate the usefulness and effectiveness of voxel2vec by comparing it with the isosurface similarity map of univariate data and applying the learned distributed representations to feature classification for multivariate data and to association analysis for time-varying and ensemble data.  ( 2 min )
    Query-Efficient Adversarial Attack Based on Latin Hypercube Sampling. (arXiv:2207.02391v1 [cs.CV])
    To be applicable in real-world scenarios, Boundary Attacks (BAs) were proposed; they achieve a one hundred percent attack success rate using only decision information. However, existing BA methods craft adversarial examples by using simple random sampling (SRS) to estimate the gradient, which consumes a large number of model queries. To overcome this drawback of SRS, this paper proposes a Latin Hypercube Sampling based Boundary Attack (LHS-BA) to save query budget. Compared with SRS, LHS has better uniformity under the same limited number of random samples. Therefore, the average over these random samples is closer to the true gradient than that estimated by SRS. Various experiments are conducted on benchmark datasets including MNIST, CIFAR, and ImageNet-1K. Experimental results demonstrate the superiority of the proposed LHS-BA over the state-of-the-art BA methods in terms of query efficiency. The source code is publicly available at https://github.com/GZHU-DVL/LHS-BA.  ( 2 min )
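    The uniformity claim above can be illustrated with a minimal sketch of generic Latin hypercube sampling. It is not the LHS-BA attack itself; the function names `latin_hypercube` and `occupied_strata` are hypothetical and only the Python standard library is assumed.

```python
import random

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims, one per stratum along every axis."""
    columns = []
    for _ in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)  # independent random stratum order per axis
        # Place one uniformly jittered point inside each stratum.
        columns.append([(k + rng.random()) / n_samples for k in strata])
    # Transpose the per-axis columns into a list of points.
    return [list(point) for point in zip(*columns)]

def occupied_strata(points, axis, n_strata):
    """Count how many of the n_strata equal-width bins contain a point."""
    return len({int(p[axis] * n_strata) for p in points})

rng = random.Random(0)
n, d = 8, 3
lhs_points = latin_hypercube(n, d, rng)
# Simple random sampling (SRS) baseline with the same budget.
srs_points = [[rng.random() for _ in range(d)] for _ in range(n)]

# LHS fills every bin on every axis; SRS typically leaves some bins empty.
print([occupied_strata(lhs_points, a, n) for a in range(d)])
print([occupied_strata(srs_points, a, n) for a in range(d)])
```

    With the same number of samples, LHS covers every one of the `n` strata along each axis by construction, which is the sense in which it is more uniform than SRS.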
    Distillation to Enhance the Portability of Risk Models Across Institutions with Large Patient Claims Database. (arXiv:2207.02445v1 [cs.LG])
    Artificial intelligence, and particularly machine learning (ML), is increasingly developed and deployed to support healthcare in a variety of settings. However, clinical decision support (CDS) technologies based on ML need to be portable if they are to be adopted on a broad scale. In this respect, models developed at one institution should be reusable at another. Yet there are numerous examples of portability failure, particularly due to naive application of ML models. Portability failure can lead to suboptimal care and medical errors, which ultimately could prevent the adoption of ML-based CDS in practice. One specific healthcare challenge that could benefit from enhanced portability is the prediction of 30-day readmission risk. Research to date has shown that deep learning models can be effective at modeling such risk. In this work, we investigate the practicality of model portability through a cross-site evaluation of readmission prediction models. To do so, we apply a recurrent neural network, augmented with self-attention and blended with expert features, to build readmission prediction models for two independent large scale claims datasets. We further present a novel transfer learning technique that adapts the well-known method of born-again network (BAN) training. Our experiments show that direct application of ML models trained at one institution and tested at another institution performs worse than models trained and tested at the same institution. We further show that the transfer learning approach based on the BAN produces models that are better than those trained on just a single institution's data. Notably, this improvement is consistent across both sites and occurs after a single retraining, which illustrates the potential for a cheap and general model transfer mechanism of readmission risk prediction.  ( 3 min )
    Generalization to translation shifts: a study in architectures and augmentations. (arXiv:2207.02349v1 [cs.CV])
    We provide a detailed evaluation of various image classification architectures (convolutional, vision transformer, and fully connected MLP networks) and data augmentation techniques towards generalization to large spatial translation shifts. We make the following observations: (a) In the absence of data augmentation, all architectures, including convolutional networks, suffer degradation in performance when evaluated on translated test distributions. Understandably, both the in-distribution accuracy and the degradation under shifts are significantly worse for non-convolutional architectures. (b) Across all architectures, even a minimal augmentation of $4$ pixel random crop improves the robustness of performance to much larger magnitude shifts of up to $1/4$ of image size ($8$-$16$ pixels) in the test data -- suggesting a form of meta generalization from augmentation. For non-convolutional architectures, while the absolute accuracy is still low, we see dramatic improvements in robustness to large translation shifts. (c) With a sufficiently advanced augmentation pipeline ($4$ pixel crop+RandAugmentation+Erasing+MixUp), all architectures can be trained to have competitive performance, both in terms of in-distribution accuracy and generalization to large translation shifts.  ( 2 min )
    Improving Trustworthiness of AI Disease Severity Rating in Medical Imaging with Ordinal Conformal Prediction Sets. (arXiv:2207.02238v1 [cs.LG])
    The regulatory approval and broad clinical deployment of medical AI have been hampered by the perception that deep learning models fail in unpredictable and possibly catastrophic ways. A lack of statistically rigorous uncertainty quantification is a significant factor undermining trust in AI results. Recent developments in distribution-free uncertainty quantification present practical solutions for these issues by providing reliability guarantees for black-box models on arbitrary data distributions as formally valid finite-sample prediction intervals. Our work applies these new uncertainty quantification methods -- specifically conformal prediction -- to a deep-learning model for grading the severity of spinal stenosis in lumbar spine MRI. We demonstrate a technique for forming ordinal prediction sets that are guaranteed to contain the correct stenosis severity within a user-defined probability (confidence interval). On a dataset of 409 MRI exams processed by the deep-learning model, the conformal method provides tight coverage with small prediction set sizes. Furthermore, we explore the potential clinical applicability of flagging cases with high uncertainty predictions (large prediction sets) by quantifying an increase in the prevalence of significant imaging abnormalities (e.g. motion artifacts, metallic artifacts, and tumors) that could degrade confidence in predictive performance when compared to a random sample of cases.  ( 2 min )
    Putting the Con in Context: Identifying Deceptive Actors in the Game of Mafia. (arXiv:2207.02253v1 [cs.CL])
    While neural networks demonstrate a remarkable ability to model linguistic content, capturing contextual information related to a speaker's conversational role is an open area of research. In this work, we analyze the effect of speaker role on language use through the game of Mafia, in which participants are assigned either an honest or a deceptive role. In addition to building a framework to collect a dataset of Mafia game records, we demonstrate that there are differences in the language produced by players with different roles. We confirm that classification models are able to rank deceptive players as more suspicious than honest ones based only on their use of language. Furthermore, we show that training models on two auxiliary tasks outperforms a standard BERT-based text classification approach. We also present methods for using our trained models to identify features that distinguish between player roles, which could be used to assist players during the Mafia game.  ( 2 min )
    Information Compression and Performance Evaluation of Tic-Tac-Toe's Evaluation Function Using Singular Value Decomposition. (arXiv:2207.02449v1 [cs.LG])
    We approximated the evaluation function for the game Tic-Tac-Toe by singular value decomposition (SVD) and investigated the effect of approximation accuracy on winning rate. We first prepared the perfect evaluation function of Tic-Tac-Toe and performed low-rank approximation by considering the evaluation function as a ninth-order tensor. We found that we can reduce the amount of information of the evaluation function by 70% without significantly degrading the performance. Approximation accuracy and winning rate were strongly correlated but not perfectly proportional. We also investigated how the decomposition method of the evaluation function affects the performance. We considered two decomposition methods: simple SVD regarding the evaluation function as a matrix and the Tucker decomposition by higher-order SVD (HOSVD). At the same compression ratio, the strategy with the approximated evaluation function obtained by HOSVD exhibited a significantly higher winning rate than that obtained by SVD. These results suggest that SVD can effectively compress board game strategies and an optimal compression method that depends on the game exists.  ( 2 min )
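    As a rough illustration of the matrix-SVD compression described above, the following sketch applies a truncated SVD to a synthetic table. The data are random stand-ins, not the actual Tic-Tac-Toe evaluation function, and the 81 x 243 split of the 3**9 board encodings, the rank, and the name `low_rank_approx` are assumptions chosen only for this example.

```python
import numpy as np

def low_rank_approx(matrix, rank):
    """Return the rank-`rank` truncated-SVD approximation and all singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the `rank` largest singular triplets.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    return approx, s

rng = np.random.default_rng(0)
rows, cols, rank = 81, 243, 5  # 81 * 243 = 3**9; the split is an illustrative assumption
# Synthetic stand-in: a rank-5 signal plus small noise (NOT real game values).
table = rng.standard_normal((rows, rank)) @ rng.standard_normal((rank, cols))
table += 0.01 * rng.standard_normal((rows, cols))

approx, singular_values = low_rank_approx(table, rank)
rel_error = np.linalg.norm(table - approx) / np.linalg.norm(table)
# Numbers stored by the factors versus the full table.
compression = rank * (rows + cols + 1) / (rows * cols)
print(f"relative error {rel_error:.4f}, stored fraction {compression:.3f}")
```

    The Frobenius error of the truncated SVD equals the root sum of squares of the discarded singular values, so the singular value spectrum directly shows how much accuracy a given compression ratio costs. The HOSVD variant compared in the abstract is not shown here.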
    Many-body localized hidden Born machine. (arXiv:2207.02346v1 [quant-ph])
    Born Machines are quantum-inspired generative models that leverage the probabilistic nature of quantum states. Here, we present a new architecture called many-body localized (MBL) hidden Born machine that uses both MBL dynamics and hidden units as learning resources. We theoretically prove that MBL Born machines possess more expressive power than classical models, and the introduction of hidden units boosts its learning power. We numerically demonstrate that the MBL hidden Born machine is capable of learning a toy dataset consisting of patterns of MNIST handwritten digits, quantum data obtained from quantum many-body states, and non-local parity data. In order to understand the mechanism behind learning, we track physical quantities such as von Neumann entanglement entropy and Hamming distance during learning, and compare the learning outcomes in the MBL, thermal, and Anderson localized phases. We show that the superior learning power of the MBL phase relies importantly on both localization and interaction. Our architecture and algorithm provide novel strategies of utilizing quantum many-body systems as learning resources, and reveal a powerful connection between disorder, interaction, and learning in quantum systems.  ( 2 min )
    OpenLDN: Learning to Discover Novel Classes for Open-World Semi-Supervised Learning. (arXiv:2207.02261v1 [cs.CV])
    Semi-supervised learning (SSL) is one of the dominant approaches to address the annotation bottleneck of supervised learning. Recent SSL methods can effectively leverage a large repository of unlabeled data to improve performance while relying on a small set of labeled data. One common assumption in most SSL methods is that the labeled and unlabeled data are from the same underlying data distribution. However, this is hardly the case in many real-world scenarios, which limits their applicability. In this work, instead, we attempt to solve the recently proposed challenging open-world SSL problem that does not make such an assumption. In the open-world SSL problem, the objective is to recognize samples of known classes, and simultaneously detect and cluster samples belonging to novel classes present in unlabeled data. This work introduces OpenLDN, which utilizes a pairwise similarity loss to discover novel classes. Using a bi-level optimization rule, this pairwise similarity loss exploits the information available in the labeled set to implicitly cluster novel class samples, while simultaneously recognizing samples from known classes. After discovering novel classes, OpenLDN transforms the open-world SSL problem into a standard SSL problem to achieve additional performance gains using existing SSL methods. Our extensive experiments demonstrate that OpenLDN outperforms the current state-of-the-art methods on multiple popular classification benchmarks while providing a better accuracy/training time trade-off.  ( 3 min )
    GAMa: Cross-view Video Geo-localization. (arXiv:2207.02431v1 [cs.CV])
    The existing work in cross-view geo-localization is based on images where a ground panorama is matched to an aerial image. In this work, we focus on ground videos instead of images, which provide additional contextual cues that are important for this task. There are no existing datasets for this problem; therefore, we propose the GAMa dataset, a large-scale dataset with ground videos and corresponding aerial images. We also propose a novel approach to solve this problem. At the clip level, a short video clip is matched with the corresponding aerial image and is later used to get video-level geo-localization of a long video. Moreover, we propose a hierarchical approach to further improve the clip-level geo-localization. The dataset is challenging, being unaligned and having a limited field of view, and our proposed method achieves a Top-1 recall rate of 19.4% and 45.1% @1.0mile. Code and dataset are available at the following link: https://github.com/svyas23/GAMa.  ( 2 min )
    Guiding Machine Perception with Psychophysics. (arXiv:2207.02241v1 [cs.CV])
    Gustav Fechner's 1860 delineation of psychophysics, the measurement of sensation in relation to its stimulus, is widely considered to be the advent of modern psychological science. In psychophysics, a researcher parametrically varies some aspects of a stimulus, and measures the resulting changes in a human subject's experience of that stimulus; doing so gives insight to the determining relationship between a sensation and the physical input that evoked it. This approach is used heavily in perceptual domains, including signal detection, threshold measurement, and ideal observer analysis. Scientific fields like vision science have always leaned heavily on the methods and procedures of psychophysics, but there is now growing appreciation of them by machine learning researchers, sparked by widening overlap between biological and artificial perception \cite{rojas2011automatic, scheirer2014perceptual,escalera2014chalearn,zhang2018agil, grieggs2021measuring}. Machine perception that is guided by behavioral measurements, as opposed to guidance restricted to arbitrarily assigned human labels, has significant potential to fuel further progress in artificial intelligence.  ( 2 min )
    EEPT: Early Discovery of Emerging Entities in Twitter with Semantic Similarity. (arXiv:2207.02434v1 [cs.CL])
    Some events that will happen in the future could be important for companies, governments, and even our personal lives. Predicting these events before they become established is helpful for efficient decision-making. We call such events emerging entities. They have not taken place yet, and there is no information about them in the knowledge base (KB). However, some clues exist in different areas, especially on social media. Thus, retrieving these types of entities is possible. This paper proposes a method for early discovery of emerging entities. We use semantic clustering of short messages. To evaluate the performance of our proposal, we devise and utilize a performance evaluation metric. The results show that our proposed method finds emerging entities that Twitter trends do not always capture.  ( 2 min )
    Transfer Learning for Rapid Extraction of Thickness from Optical Spectra of Semiconductor Thin Films. (arXiv:2207.02209v1 [cs.LG])
    High-throughput experimentation with autonomous workflows, increasingly used to screen and optimize optoelectronic thin films, requires matching throughput of downstream characterizations. Despite being essential, thickness characterization lags in throughput. Although optical spectroscopic methods, e.g., spectrophotometry, provide quick measurements, a critical bottleneck is the ensuing manual fitting of optical oscillation models to the measured reflection and transmission. This study presents a machine-learning (ML) framework called thicknessML, which rapidly extracts film thickness from spectroscopic reflection and transmission. thicknessML leverages transfer learning to generalize to materials of different underlying optical oscillator models (i.e., different material classes). We demonstrate that thicknessML can extract film thickness from six perovskite samples in a two-stage process: (1) pre-training on a generic simulated dataset of Tauc-Lorentz oscillators, and (2) transfer learning to a simulated perovskite dataset of several literature perovskite refractive indices. Results show a pre-training thickness mean absolute percentage error (MAPE) of 5-7% and an experimental thickness MAPE of 6-19%.  ( 2 min )
    Learning Task Embeddings for Teamwork Adaptation in Multi-Agent Reinforcement Learning. (arXiv:2207.02249v1 [cs.MA])
    Successful deployment of multi-agent reinforcement learning often requires agents to adapt their behaviour. In this work, we discuss the problem of teamwork adaptation in which a team of agents needs to adapt their policies to solve novel tasks with limited fine-tuning. Motivated by the intuition that agents need to be able to identify and distinguish tasks in order to adapt their behaviour to the current task, we propose to learn multi-agent task embeddings (MATE). These task embeddings are trained using an encoder-decoder architecture optimised for reconstruction of the transition and reward functions which uniquely identify tasks. We show that a team of agents is able to adapt to novel tasks when provided with task embeddings. We propose three MATE training paradigms: independent MATE, centralised MATE, and mixed MATE which vary in the information used for the task encoding. We show that the embeddings learned by MATE identify tasks and provide useful information which agents leverage during adaptation to novel tasks.  ( 2 min )
    Linear Jamming Bandits: Sample-Efficient Learning for Non-Coherent Digital Jamming. (arXiv:2207.02365v1 [cs.LG])
    It has been shown (Amuru et al. 2015) that online learning algorithms can be effectively used to select optimal physical layer parameters for jamming against digital modulation schemes without a priori knowledge of the victim's transmission strategy. However, this learning problem involves solving a multi-armed bandit problem with a mixed action space that can grow very large. As a result, convergence to the optimal jamming strategy can be slow, especially when the victim and jammer's symbols are not perfectly synchronized. In this work, we remedy the sample efficiency issues by introducing a linear bandit algorithm that accounts for inherent similarities between actions. Further, we propose context features which are well-suited for the statistical features of the non-coherent jamming problem and demonstrate significantly improved convergence behavior compared to the prior art. Additionally, we show how prior knowledge about the victim's transmissions can be seamlessly integrated into the learning framework. We finally discuss limitations in the asymptotic regime.  ( 2 min )
    Multi-Label Retinal Disease Classification using Transformers. (arXiv:2207.02335v1 [cs.CV])
    Early detection of retinal diseases is one of the most important means of preventing partial or permanent blindness in patients. In this research, a novel multi-label classification system is proposed for the detection of multiple retinal diseases, using fundus images collected from a variety of sources. First, a new multi-label retinal disease dataset, the MuReD dataset, is constructed, using a number of publicly available datasets for fundus disease classification. Next, a sequence of post-processing steps is applied to ensure the quality of the image data and the range of diseases present in the dataset. For the first time in fundus multi-label disease classification, a transformer-based model optimized through extensive experimentation is used for image analysis and decision making. Numerous experiments are performed to optimize the configuration of the proposed system. It is shown that the approach performs better than state-of-the-art works on the same task by 7.9% and 8.1% in terms of AUC score for disease detection and disease classification, respectively. The obtained results further support the potential applications of transformer-based architectures in the medical imaging field.  ( 3 min )
    BioTABQA: Instruction Learning for Biomedical Table Question Answering. (arXiv:2207.02419v1 [cs.CL])
    Table Question Answering (TQA) is an important but under-explored task. Most of the existing QA datasets are in unstructured text format and only a few of them use tables as the context. To the best of our knowledge, no TQA datasets exist in the biomedical domain, where tables are frequently used to present information. In this paper, we first curate a table question answering dataset, BioTABQA, using 22 templates and the context from a biomedical textbook on differential diagnosis. BioTABQA can be used not only to teach a model how to answer questions from tables but also to evaluate how a model generalizes to unseen questions, an important scenario for biomedical applications. To achieve the generalization evaluation, we divide the templates into 17 training and 5 cross-task evaluations. Then, we develop two baselines using single-task and multi-task learning on BioTABQA. Furthermore, we explore instructional learning, a recent technique showing impressive generalizing performance. Experimental results show that our instruction-tuned model outperforms single and multi-task baselines on average by ~23% and ~6% across various evaluation settings, and more importantly, the instruction-tuned model outperforms baselines by ~5% on cross-tasks.  ( 2 min )
    Federated and Transfer Learning: A Survey on Adversaries and Defense Mechanisms. (arXiv:2207.02337v1 [cs.LG])
    The advent of federated learning has facilitated large-scale data exchange amongst machine learning models while maintaining privacy. Despite its brief history, federated learning is rapidly evolving to make wider use more practical. One of the most significant advancements in this domain is the incorporation of transfer learning into federated learning, which overcomes fundamental constraints of primary federated learning, particularly in terms of security. This chapter performs a comprehensive survey on the intersection of federated and transfer learning from a security point of view. The main goal of this study is to uncover potential vulnerabilities that might compromise the privacy and performance of systems that use federated and transfer learning, along with the corresponding defense mechanisms.  ( 2 min )
    Towards Realistic Semi-Supervised Learning. (arXiv:2207.02269v1 [cs.CV])
    Deep learning is pushing the state-of-the-art in many computer vision applications. However, it relies on large annotated data repositories, and capturing the unconstrained nature of the real-world data is yet to be solved. Semi-supervised learning (SSL) complements the annotated training data with a large corpus of unlabeled data to reduce annotation cost. The standard SSL approach assumes unlabeled data are from the same distribution as annotated data. Recently, ORCA [9] introduced a more realistic SSL problem, called open-world SSL, by assuming that the unannotated data might contain samples from unknown classes. This work proposes a novel approach to tackle SSL in the open-world setting, where we simultaneously learn to classify known and unknown classes. At the core of our method, we utilize sample uncertainty and incorporate prior knowledge about class distribution to generate reliable pseudo-labels for unlabeled data belonging to both known and unknown classes. Our extensive experimentation showcases the effectiveness of our approach on several benchmark datasets, where it substantially outperforms the existing state-of-the-art on seven diverse datasets including CIFAR-100 (17.6%), ImageNet-100 (5.7%), and Tiny ImageNet (9.9%).  ( 2 min )
    Swin Deformable Attention U-Net Transformer (SDAUT) for Explainable Fast MRI. (arXiv:2207.02390v1 [cs.CV])
    Fast MRI aims to reconstruct a high fidelity image from partially observed measurements. Fast MRI using deep learning has developed rapidly in recent years. Meanwhile, novel deep learning paradigms, e.g., Transformer based models, are growing fast in natural language processing and have been promptly adopted for computer vision and medical image analysis due to their prominent performance. Nevertheless, due to the complexity of the Transformer, its application to fast MRI may not be straightforward. The main obstacle is that the computational cost of the self-attention layer, which is the core part of the Transformer, can be expensive for high resolution MRI inputs. In this study, we propose a new Transformer architecture for fast MRI that couples the Shifted Windows Transformer with U-Net to reduce the network complexity. We incorporate deformable attention to support the explainability of our reconstruction model. We empirically demonstrate that our method achieves consistently superior performance on the fast MRI task. Besides, compared to state-of-the-art Transformer models, our method has fewer network parameters while revealing explainability. The code is publicly available at https://github.com/ayanglab/SDAUT.  ( 2 min )
    Transformers are Adaptable Task Planners. (arXiv:2207.02442v1 [cs.RO])
    Every home is different, and every person likes things done in their particular way. Therefore, home robots of the future need to both reason about the sequential nature of day-to-day tasks and generalize to users' preferences. To this end, we propose a Transformer Task Planner (TTP) that learns high-level actions from demonstrations by leveraging object attribute-based representations. TTP can be pre-trained on multiple preferences and shows generalization to unseen preferences using a single demonstration as a prompt in a simulated dishwasher loading task. Further, we demonstrate real-world dish rearrangement using TTP with a Franka Panda robotic arm, prompted using a single human demonstration.  ( 2 min )
    State-Augmented Learnable Algorithms for Resource Management in Wireless Networks. (arXiv:2207.02242v1 [cs.LG])
    We consider resource management problems in multi-user wireless networks, which can be cast as optimizing a network-wide utility function, subject to constraints on the long-term average performance of users across the network. We propose a state-augmented algorithm for solving the aforementioned radio resource management (RRM) problems, where, alongside the instantaneous network state, the RRM policy takes as input the set of dual variables corresponding to the constraints, which evolve depending on how much the constraints are violated during execution. We theoretically show that the proposed state-augmented algorithm leads to feasible and near-optimal RRM decisions. Moreover, focusing on the problem of wireless power control using graph neural network (GNN) parameterizations, we demonstrate the superiority of the proposed RRM algorithm over baseline methods across a suite of numerical experiments.  ( 2 min )
  • Open

    Stochastic normalizing flows as non-equilibrium transformations. (arXiv:2201.08862v3 [hep-lat] UPDATED)
    Normalizing flows are a class of deep generative models that provide a promising route to sample lattice field theories more efficiently than conventional Monte Carlo simulations. In this work we show that the theoretical framework of stochastic normalizing flows, in which neural-network layers are combined with Monte Carlo updates, is the same that underlies out-of-equilibrium simulations based on Jarzynski's equality, which have been recently deployed to compute free-energy differences in lattice gauge theories. We lay out a strategy to optimize the efficiency of this extended class of generative models and present examples of applications.
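    For reference, Jarzynski's equality mentioned above can be written in its standard textbook form as follows. The notation (inverse temperature $\beta$, work $W$, partition functions $Z$) is the conventional one and is assumed here, not taken from the paper.

```latex
\[
  \left\langle e^{-\beta W} \right\rangle
  \;=\; e^{-\beta \Delta F}
  \;=\; \frac{Z_{\mathrm{final}}}{Z_{\mathrm{initial}}},
  \qquad
  \Delta F = F_{\mathrm{final}} - F_{\mathrm{initial}}
           = -\frac{1}{\beta}\,\ln\frac{Z_{\mathrm{final}}}{Z_{\mathrm{initial}}} .
\]
```

    Here the average runs over non-equilibrium trajectories whose starting configurations are drawn from the initial equilibrium ensemble, and $W$ is the work accumulated along each trajectory.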
    Distributional neural networks for electricity price forecasting. (arXiv:2207.02832v1 [q-fin.ST])
    We present a novel approach to probabilistic electricity price forecasting (EPF) which utilizes distributional artificial neural networks. The novel network structure for EPF is based on a regularized distributional multilayer perceptron (DMLP) which contains a probability layer. Using the TensorFlow Probability framework, the neural network's output is defined to be a distribution, either normal or potentially skewed and heavy-tailed Johnson's SU (JSU). The method is compared against state-of-the-art benchmarks in a forecasting study. The study comprises forecasting involving day-ahead electricity prices in the German market. The results show evidence of the importance of higher moments when modeling electricity prices.
    Evaluating Robustness to Dataset Shift via Parametric Robustness Sets. (arXiv:2205.15947v2 [cs.LG] UPDATED)
    We give a method for proactively identifying small, plausible shifts in distribution which lead to large differences in model performance. To ensure that these shifts are plausible, we parameterize them in terms of interpretable changes in causal mechanisms of observed variables. This defines a parametric robustness set of plausible distributions and a corresponding worst-case loss. While the loss under an individual parametric shift can be estimated via reweighting techniques such as importance sampling, the resulting worst-case optimization problem is non-convex, and the estimate may suffer from large variance. For small shifts, however, we can construct a local second-order approximation to the loss under shift and cast the problem of finding a worst-case shift as a particular non-convex quadratic optimization problem, for which efficient algorithms are available. We demonstrate that this second-order approximation can be estimated directly for shifts in conditional exponential family models, and we bound the approximation error. We apply our approach to a computer vision task (classifying gender from images), revealing sensitivity to shifts in non-causal attributes.
    Epistemic Neural Networks. (arXiv:2107.08924v5 [cs.LG] UPDATED)
    Intelligence relies on an agent's knowledge of what it does not know. This capability can be assessed based on the quality of joint predictions of labels across multiple inputs. Conventional neural networks lack this capability and, since most research has focused on marginal predictions, this shortcoming has been largely overlooked. We introduce the epistemic neural network (ENN) as an interface for models that represent uncertainty as required to generate useful joint predictions. While prior approaches to uncertainty modeling such as Bayesian neural networks can be expressed as ENNs, this new interface facilitates comparison of joint predictions and the design of novel architectures and algorithms. In particular, we introduce the epinet: an architecture that can supplement any conventional neural network, including large pretrained models, and can be trained with modest incremental computation to estimate uncertainty. With an epinet, conventional neural networks outperform very large ensembles, consisting of hundreds or more particles, with orders of magnitude less computation. We demonstrate this efficacy across synthetic data, ImageNet, and some reinforcement learning tasks. As part of this effort we open-source experiment code.
    Variational Flow Graphical Model. (arXiv:2207.02722v1 [stat.ML])
    This paper introduces a novel approach to embed flow-based models with hierarchical structures. The proposed framework is named Variational Flow Graphical (VFG) Model. VFGs learn the representation of high dimensional data via a message-passing scheme by integrating flow-based functions through variational inference. By leveraging the expressive power of neural networks, VFGs produce a representation of the data using a lower dimension, thus overcoming the drawbacks of many flow-based models, usually requiring a high dimensional latent space involving many trivial variables. Aggregation nodes are introduced in the VFG models to integrate forward-backward hierarchical information via a message passing scheme. Maximizing the evidence lower bound (ELBO) of data likelihood aligns the forward and backward messages in each aggregation node, achieving a consistent node state. Algorithms have been developed to learn model parameters through gradient updating regarding the ELBO objective. The consistency of aggregation nodes enables VFGs to be applicable in tractable inference on graphical structures. Besides representation learning and numerical inference, VFGs provide a new approach for distribution modeling on datasets with graphical latent structures. Additionally, theoretical study shows that VFGs are universal approximators by leveraging the implicitly invertible flow-based structures. With flexible graphical structures and superior expressive power, VFGs could potentially be used to improve probabilistic inference. In the experiments, VFGs achieve improved evidence lower bound (ELBO) and likelihood values on multiple datasets.
    Improved conformalized quantile regression. (arXiv:2207.02808v1 [stat.ML])
    Conformalized quantile regression is a procedure that inherits the advantages of conformal prediction and quantile regression. That is, we use quantile regression to estimate the true conditional quantile and then apply a conformal step on a calibration set to ensure marginal coverage. In this way, we get adaptive prediction intervals that account for heteroscedasticity. However, the aforementioned conformal step lacks adaptiveness as described in (Romano et al., 2019). To overcome this limitation, instead of applying a single conformal step after estimating conditional quantiles with quantile regression, we propose to cluster the explanatory variables weighted by their permutation importance with an optimized k-means and apply k conformal steps. To show that this improved version outperforms the classic version of conformalized quantile regression and is more adaptive to heteroscedasticity, we extensively compare the prediction intervals of both in open datasets.
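    The single conformal step that this abstract starts from (the classic procedure of Romano et al., 2019, not the clustered variant the paper proposes) can be sketched as follows. This is a minimal illustration, assuming the lower and upper quantile predictions are already available as arrays; the function name `conformalize` and its arguments are hypothetical.

```python
import numpy as np

def conformalize(q_lo_cal, q_hi_cal, y_cal, q_lo_test, q_hi_test, alpha=0.1):
    """Widen (or shrink) quantile-regression intervals using a calibration set.

    All inputs are 1-D arrays; q_lo_* and q_hi_* are assumed to come from any
    fitted quantile regressor (not shown here).
    """
    q_lo_cal, q_hi_cal, y_cal = map(np.asarray, (q_lo_cal, q_hi_cal, y_cal))
    # Conformity score: positive when the label falls outside the predicted
    # interval, negative when it lies inside.
    scores = np.maximum(q_lo_cal - y_cal, y_cal - q_hi_cal)
    n = len(scores)
    # Rank of the finite-sample-corrected empirical quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    # With too few calibration points the valid interval is unbounded.
    margin = np.inf if k > n else np.sort(scores)[k - 1]
    return np.asarray(q_lo_test) - margin, np.asarray(q_hi_test) + margin
```

    Because one `margin` is shared by every test point, the correction itself does not adapt to the covariates, which is the limitation the abstract describes.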
    Topological Information Retrieval with Dilation-Invariant Bottleneck Comparative Measures. (arXiv:2104.01672v3 [stat.ML] UPDATED)
    Appropriately representing elements in a database so that queries may be accurately matched is a central task in information retrieval; recently, this has been achieved by embedding the graphical structure of the database into a manifold in a hierarchy-preserving manner using a variety of metrics. Persistent homology is a tool commonly used in topological data analysis that is able to rigorously characterize a database in terms of both its hierarchy and connectivity structure. Computing persistent homology on a variety of embedded datasets reveals that some commonly used embeddings fail to preserve the connectivity. We show that those embeddings which successfully retain the database topology coincide in persistent homology by introducing two dilation-invariant comparative measures to capture this effect: in particular, they address the issue of metric distortion on manifolds. We provide an algorithm for their computation that exhibits greatly reduced time complexity over existing methods. We use these measures to perform the first instance of topology-based information retrieval and demonstrate its increased performance over the standard bottleneck distance for persistent homology. We showcase our approach on databases of different data varieties including text, videos, and medical images.
    Don't Pay Attention to the Noise: Learning Self-supervised Representations of Light Curves with a Denoising Time Series Transformer. (arXiv:2207.02777v1 [astro-ph.IM])
    Astrophysical light curves are particularly challenging data objects due to the intensity and variety of noise contaminating them. Yet, despite the astronomical volumes of light curves available, the majority of algorithms used to process them are still operating on a per-sample basis. To remedy this, we propose a simple Transformer model -- called Denoising Time Series Transformer (DTST) -- and show that it excels at removing the noise and outliers in datasets of time series when trained with a masked objective, even when no clean targets are available. Moreover, the use of self-attention enables rich and illustrative queries into the learned representations. We present experiments on real stellar light curves from the Transiting Exoplanet Survey Satellite (TESS), showing advantages of our approach compared to traditional denoising techniques.
    Instance-Dependent Near-Optimal Policy Identification in Linear MDPs via Online Experiment Design. (arXiv:2207.02575v1 [cs.LG])
    While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning. In practice, on an "easy" instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the "instance-dependent" complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, \textsc{Pedel}, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that \textsc{Pedel} yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to hit the instance-optimal rate. Our approach relies on a novel online experiment design-based procedure which focuses the exploration budget on the "directions" most relevant to learning a near-optimal policy, and may be of independent interest.
    PAC Prediction Sets for Meta-Learning. (arXiv:2207.02440v1 [cs.LG])
    Uncertainty quantification is a key component of machine learning models targeted at safety-critical systems such as in healthcare or autonomous vehicles. We study this problem in the context of meta learning, where the goal is to quickly adapt a predictor to new tasks. In particular, we propose a novel algorithm to construct \emph{PAC prediction sets}, which capture uncertainty via sets of labels, that can be adapted to new tasks with only a few training examples. These prediction sets satisfy an extension of the typical PAC guarantee to the meta learning setting; in particular, the PAC guarantee holds with high probability over future tasks. We demonstrate the efficacy of our approach on four datasets across three application domains: mini-ImageNet and CIFAR10-C in the visual domain, FewRel in the language domain, and the CDC Heart Dataset in the medical domain. In particular, our prediction sets satisfy the PAC guarantee while having smaller size compared to other baselines that also satisfy this guarantee.
    When does SGD favor flat minima? A quantitative characterization via linear stability. (arXiv:2207.02628v1 [stat.ML])
    The observation that stochastic gradient descent (SGD) favors flat minima has played a fundamental role in understanding implicit regularization of SGD and guiding the tuning of hyperparameters. In this paper, we provide a quantitative explanation of this striking phenomenon by relating the particular noise structure of SGD to its \emph{linear stability} (Wu et al., 2018). Specifically, we consider training over-parameterized models with square loss. We prove that if a global minimum $\theta^*$ is linearly stable for SGD, then it must satisfy $\|H(\theta^*)\|_F\leq O(\sqrt{B}/\eta)$, where $\|H(\theta^*)\|_F, B,\eta$ denote the Frobenius norm of Hessian at $\theta^*$, batch size, and learning rate, respectively. Otherwise, SGD will escape from that minimum \emph{exponentially} fast. Hence, for minima accessible to SGD, the flatness -- as measured by the Frobenius norm of the Hessian -- is bounded independently of the model size and sample size. The key to obtaining these results is exploiting the particular geometry awareness of SGD noise: 1) the noise magnitude is proportional to loss value; 2) the noise directions concentrate in the sharp directions of local landscape. This property of SGD noise provably holds for linear networks and random feature models (RFMs) and is empirically verified for nonlinear networks. Moreover, the validity and practical relevance of our theoretical findings are justified by extensive numerical experiments.
    Adaptive deep learning for nonparametric time series regression. (arXiv:2207.02546v1 [math.ST])
    In this paper, we develop a general theory for adaptive nonparametric estimation of mean functions of nonstationary and nonlinear time series using deep neural networks (DNNs). We first consider two types of DNN estimators, non-penalized and sparse-penalized DNN estimators, and establish their generalization error bounds for general nonstationary time series. We then derive minimax lower bounds for estimating mean functions belonging to a wide class of nonlinear autoregressive (AR) models that include nonlinear generalized additive AR, single index, and threshold AR models. Building upon the results, we show that the sparse-penalized DNN estimator is adaptive and attains the minimax optimal rates up to a poly-logarithmic factor for many nonlinear AR models. Through numerical simulations, we demonstrate the usefulness of the DNN methods for estimating nonlinear AR models with intrinsic low-dimensional structures and discontinuous or rough mean functions, which is consistent with our theory.
    Neural network stochastic differential equation models with applications to financial data forecasting. (arXiv:2111.13164v5 [cs.LG] UPDATED)
    In this article, we employ a collection of stochastic differential equations with drift and diffusion coefficients approximated by neural networks to predict the trend of chaotic time series which has big jump properties. Our contributions are, first, we propose a model called Lévy induced stochastic differential equation network, which explores compounded stochastic differential equations with $\alpha$-stable Lévy motion to model complex time series data and solve the problem through neural network approximation. Second, we theoretically prove the convergence of our algorithm with respect to hyper-parameters of the neural network, and obtain the error bound without curse of dimensionality. Finally, we illustrate our method by applying it to real financial time series data and find the accuracy increases through the use of non-Gaussian Lévy processes. We also present detailed comparisons in terms of data patterns, various models, different shapes of Lévy motion and the prediction lengths.
    Trading with the Momentum Transformer: An Intelligent and Interpretable Architecture. (arXiv:2112.08534v2 [cs.LG] UPDATED)
    We introduce the Momentum Transformer, an attention-based deep learning architecture which outperforms benchmark momentum and mean-reversion trading strategies. Unlike state-of-the-art Long Short-Term Memory (LSTM) architectures, which are sequential in nature, the attention mechanism provides our architecture with a direct connection to all previous time-steps. Our architecture enables us to learn longer-term dependencies, improves performance when considering returns net of transaction costs and naturally adapts to new market regimes, such as during the SARS-CoV-2 crisis. The Momentum Transformer is inherently interpretable, providing us with greater insights into our deep learning momentum trading strategy, including how it blends different classical strategies and the past time-steps which are of the greatest significance to the model.
    Private Matrix Approximation and Geometry of Unitary Orbits. (arXiv:2207.02794v1 [cs.DS])
    Consider the following optimization problem: Given $n \times n$ matrices $A$ and $\Lambda$, maximize $\langle A, U\Lambda U^*\rangle$ where $U$ varies over the unitary group $\mathrm{U}(n)$. This problem seeks to approximate $A$ by a matrix whose spectrum is the same as $\Lambda$ and, by setting $\Lambda$ to be appropriate diagonal matrices, one can recover matrix approximation problems such as PCA and rank-$k$ approximation. We study the problem of designing differentially private algorithms for this optimization problem in settings where the matrix $A$ is constructed using users' private data. We give efficient and private algorithms that come with upper and lower bounds on the approximation error. Our results unify and improve upon several prior works on private matrix approximation problems. They rely on extensions of packing/covering number bounds for Grassmannians to unitary orbits which should be of independent interest.
    Reconstructing Nonlinear Dynamical Systems from Multi-Modal Time Series. (arXiv:2111.02922v3 [cs.LG] UPDATED)
    Empirically observed time series in physics, biology, or medicine, are commonly generated by some underlying dynamical system (DS) which is the target of scientific interest. There is an increasing interest to harvest machine learning methods to reconstruct this latent DS in a data-driven, unsupervised way. In many areas of science it is common to sample time series observations from many data modalities simultaneously, e.g. electrophysiological and behavioral time series in a typical neuroscience experiment. However, current machine learning tools for reconstructing DSs usually focus on just one data modality. Here we propose a general framework for multi-modal data integration for the purpose of nonlinear DS reconstruction and the analysis of cross-modal relations. This framework is based on dynamically interpretable recurrent neural networks as general approximators of nonlinear DSs, coupled to sets of modality-specific decoder models from the class of generalized linear models. Both an expectation-maximization and a variational inference algorithm for model training are advanced and compared. We show on nonlinear DS benchmarks that our algorithms can efficiently compensate for too noisy or missing information in one data channel by exploiting other channels, and demonstrate on experimental neuroscience data how the algorithm learns to link different data domains to the underlying dynamics.
    Many-body localized hidden Born machine. (arXiv:2207.02346v1 [quant-ph])
    Born Machines are quantum-inspired generative models that leverage the probabilistic nature of quantum states. Here, we present a new architecture called many-body localized (MBL) hidden Born machine that uses both MBL dynamics and hidden units as learning resources. We theoretically prove that MBL Born machines possess more expressive power than classical models, and the introduction of hidden units boosts its learning power. We numerically demonstrate that the MBL hidden Born machine is capable of learning a toy dataset consisting of patterns of MNIST handwritten digits, quantum data obtained from quantum many-body states, and non-local parity data. In order to understand the mechanism behind learning, we track physical quantities such as von Neumann entanglement entropy and Hamming distance during learning, and compare the learning outcomes in the MBL, thermal, and Anderson localized phases. We show that the superior learning power of the MBL phase relies importantly on both localization and interaction. Our architecture and algorithm provide novel strategies of utilizing quantum many-body systems as learning resources, and reveal a powerful connection between disorder, interaction, and learning in quantum systems.
    Linear Jamming Bandits: Sample-Efficient Learning for Non-Coherent Digital Jamming. (arXiv:2207.02365v1 [cs.LG])
    It has been shown (Amuru et al. 2015) that online learning algorithms can be effectively used to select optimal physical layer parameters for jamming against digital modulation schemes without a priori knowledge of the victim's transmission strategy. However, this learning problem involves solving a multi-armed bandit problem with a mixed action space that can grow very large. As a result, convergence to the optimal jamming strategy can be slow, especially when the victim and jammer's symbols are not perfectly synchronized. In this work, we remedy the sample efficiency issues by introducing a linear bandit algorithm that accounts for inherent similarities between actions. Further, we propose context features which are well-suited for the statistical features of the non-coherent jamming problem and demonstrate significantly improved convergence behavior compared to the prior art. Additionally, we show how prior knowledge about the victim's transmissions can be seamlessly integrated into the learning framework. We finally discuss limitations in the asymptotic regime.
    Expectation Distance-based Distributional Clustering for Noise-Robustness. (arXiv:2110.08871v3 [cs.LG] UPDATED)
    This paper presents a clustering technique that reduces the susceptibility to data noise by learning and clustering the data-distribution and then assigning the data to the cluster of its distribution and, in the process, reducing the impact of noise on clustering results. This method involves introducing a new distance among distributions, namely the expectation distance (denoted, ED), that goes beyond the state-of-the-art distribution distance of optimal mass transport (denoted, $W_2$ for $2$-Wasserstein): The latter essentially depends only on the marginal distributions while the former also employs the information about the joint distributions. Using the ED, the paper extends the classical $K$-means and $K$-medoids clustering to those over data-distributions (rather than raw data) and introduces $K$-medoids using $W_2$. The paper also presents the closed-form expressions of the ED distance measure for the case when the uncertainty is Gaussian. The implementation results of the proposed ED and the $W_2$ distance measures to cluster real-world weather data are also presented, which involves efficiently extracting and using underlying uncertainty information in the form of means and variances (that, for example, is adequate to characterize Gaussian distributions). The results show striking performance improvement over classical clustering of raw data, with higher accuracy realized for ED. This is because while $W_2$ employs only the marginal distributions ignoring the correlations, the proposed ED also uses the joint distributions factoring the correlations into the distance measures.
    Integral Probability Metrics PAC-Bayes Bounds. (arXiv:2207.00614v2 [stat.ML] UPDATED)
    We present a PAC-Bayes-style generalization bound which enables the replacement of the KL-divergence with a variety of Integral Probability Metrics (IPM). We provide instances of this bound with the IPM being the total variation metric and the Wasserstein distance. A notable feature of the obtained bounds is that they naturally interpolate between classical uniform convergence bounds in the worst case (when the prior and posterior are far away from each other), and preferable bounds in better cases (when the posterior and prior are close). This illustrates the possibility of reinforcing classical generalization bounds with algorithm- and data-dependent components, thus making them more suitable to analyze algorithms that use a large hypothesis space.
    Instance-optimal PAC Algorithms for Contextual Bandits. (arXiv:2207.02357v1 [stat.ML])
    In the stochastic contextual bandit setting, regret-minimizing algorithms have been extensively researched, but their instance-minimizing best-arm identification counterparts remain seldom studied. In this work, we focus on the stochastic bandit problem in the $(\epsilon,\delta)$-$\textit{PAC}$ setting: given a policy class $\Pi$ the goal of the learner is to return a policy $\pi\in \Pi$ whose expected reward is within $\epsilon$ of the optimal policy with probability greater than $1-\delta$. We characterize the first $\textit{instance-dependent}$ PAC sample complexity of contextual bandits through a quantity $\rho_{\Pi}$, and provide matching upper and lower bounds in terms of $\rho_{\Pi}$ for the agnostic and linear contextual best-arm identification settings. We show that no algorithm can be simultaneously minimax-optimal for regret minimization and instance-dependent PAC for best-arm identification. Our main result is a new instance-optimal and computationally efficient algorithm that relies on a polynomial number of calls to an argmax oracle.
    Conditional Distribution Function Estimation Using Neural Networks for Censored and Uncensored Data. (arXiv:2207.02384v1 [stat.ME])
    Most work in neural networks focuses on estimating the conditional mean of a continuous response variable given a set of covariates. In this article, we consider estimating the conditional distribution function using neural networks for both censored and uncensored data. The algorithm is built upon the data structure particularly constructed for the Cox regression with time-dependent covariates. Without imposing any model assumption, we consider a loss function that is based on the full likelihood where the conditional hazard function is the only unknown nonparametric parameter, for which unconstrained optimization methods can be applied. Through simulation studies, we show the proposed method possesses desirable performance, whereas the partial likelihood method and the traditional neural networks with $L_2$ loss yield biased estimates when model assumptions are violated. We further illustrate the proposed method with several real-world data sets. The implementation of the proposed methods is made available at https://github.com/bingqing0729/NNCDE.

  • Open

    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated)
    submitted by /u/Lakshmireddys [link] [comments]  ( 83 min )
    What are artificial intelligences that can automatically edit music, images, texts, beats in some way?
    submitted by /u/xXNOdrugsForMEXx [link] [comments]  ( 84 min )
    I got some midjourney invites left !
    I don’t got any friends to give the invites to so who needs one! submitted by /u/projhect-AI [link] [comments]  ( 84 min )
    Is there an app/site/software that uses AI image recognition to organize images by similarity? I'm looking to sort a bunch of dall-e images
    Tried to explain as much as possible in the title. I did a "run" of DALL-E and I have already used Photoshop's macros to crop each of them in a different file bc I feel like there's an interesting experience in watching it go through similar but different iterations, but I would like it to be sorted by similarity to make the most impact. Can any of you recommend me a way to do that? The first result I found in Google pinged the antivirus so I felt like getting recommendations was the way to go. Here's an example of the kind of images I'm talking about https://imgur.com/a/miG2WWZ submitted by /u/quiteawhile [link] [comments]  ( 84 min )
    Elon Musk: "I hope that the AI is nice to us ... I've lost a lot of sleep thinking about AI as an existential risk ... I think there should probably should be a regulatory agency that oversees advanced AI, because it's a public safety risk." (2-minute clip)
    submitted by /u/Farnectarine4825 [link] [comments]  ( 84 min )
    AI Dream 61 - EPIC Nebula Exploration by AI
    submitted by /u/LordPewPew777 [link] [comments]  ( 84 min )
    Meta's latest open source AI can translate 200 languages
    submitted by /u/much_successes [link] [comments]  ( 85 min )
    Want to animate your photos from midjourney in 3D, high resolution 4k? Check out my new tutorial!
    submitted by /u/nalr00n [link] [comments]  ( 84 min )
    No Language Left Behind: Translating 200 languages with a single model - by Meta AI
    submitted by /u/OnlyProggingForFun [link] [comments]  ( 84 min )
  • Open

    7+ Best Books to Learn Neural Networks in 2022 for Beginners (Updated)
    submitted by /u/Lakshmireddys [link] [comments]  ( 84 min )
    A Tutorial on Using Neural Style PT to Transfer the Style of One Image to Another
    View the tutorial here: HERE This tutorial teaches you how to transfer the style of one image to another image using neural-style-pt. Below is an Imgur gallery showing off the transformation process. https://imgur.com/gallery/iMlkkQi Let me know if you have any questions or comments. submitted by /u/mshriver2 [link] [comments]  ( 84 min )
  • Open

    Break through language barriers with Amazon Transcribe, Amazon Translate, and Amazon Polly
    Imagine a surgeon taking video calls with patients across the globe without the need of a human translator. What if a fledgling startup could easily expand their product across borders and into new geographical markets by offering fluid, accurate, multilingual customer support and sales, all without the need of a live human translator? What happens […]  ( 10 min )
  • Open

    Dijkstra extends Pythagoras
    Suppose a triangle has sides a, b, and c. Label the angles opposite these three sides α, β, and γ respectively. Edsger Dijkstra published (EWD975-0) a note proving the following extension of the Pythagorean theorem: sgn(α + β – γ) = sgn(a² + b² – c²). Here the sgn function is -1, 0, or 1 […] Dijkstra extends Pythagoras first appeared on John D. Cook.  ( 4 min )
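    The identity quoted above is easy to spot-check numerically. The sketch below recovers the angles from the side lengths with the law of cosines and compares the two signs; the helper names and the tolerance used to treat tiny floating-point values as zero are assumptions made for illustration.

```python
import math

def sgn(x, tol=1e-12):
    """Sign of x as -1, 0, or 1, treating |x| <= tol as zero."""
    if abs(x) <= tol:
        return 0
    return 1 if x > 0 else -1

def angles_from_sides(a, b, c):
    """Angles opposite sides a, b, c, computed with the law of cosines."""
    alpha = math.acos((b * b + c * c - a * a) / (2 * b * c))
    beta = math.acos((a * a + c * c - b * b) / (2 * a * c))
    gamma = math.acos((a * a + b * b - c * c) / (2 * a * b))
    return alpha, beta, gamma

def both_signs(a, b, c):
    """Return (sgn(alpha + beta - gamma), sgn(a^2 + b^2 - c^2))."""
    alpha, beta, gamma = angles_from_sides(a, b, c)
    return sgn(alpha + beta - gamma), sgn(a * a + b * b - c * c)

# Right, acute, and obtuse triangles (the obtuse angle is opposite c).
for sides in [(3, 4, 5), (2, 2, 2), (2, 2, 3)]:
    left, right = both_signs(*sides)
    print(sides, left, right)
```

    The right-triangle case is the ordinary Pythagorean theorem: both sides of the identity are zero exactly when γ is a right angle.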
  • Open

    [D] How would you measure the correlation of the gradient across iterations?
    One simple thing one could do is take the dot product between the current and the n-1 gradient. But this will of course not be very meaningful as what really matters is a (sort-of) average correlation across several iterations, which will not be revealed from doing such a local comparison (using gradients from step n and n-1). Ideally it would be a calculation that would not require keeping around old gradients. Any ideas? submitted by /u/fasttosmile [link] [comments]  ( 85 min )
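    One way to turn the local comparison described in this post into a smoothed signal is to normalize the dot product into a cosine similarity and keep an exponential moving average of it. This is only a sketch of that idea, not an answer taken from the thread: the class name, the `decay` parameter, and the choice of an exponential average are all assumptions, and it still stores the single previous gradient.

```python
import numpy as np

class GradCosineTracker:
    """Running average of the cosine similarity between consecutive gradients."""

    def __init__(self, decay=0.9):
        self.decay = decay   # weight given to the history in the moving average
        self.prev = None     # gradient from the previous step (flattened)
        self.ema = None      # smoothed cosine similarity, None until two steps seen

    def update(self, grad):
        grad = np.asarray(grad, dtype=float).ravel()
        if self.prev is not None:
            denom = np.linalg.norm(grad) * np.linalg.norm(self.prev)
            # Cosine similarity; defined as 0 if either gradient is all zeros.
            cos = float(grad @ self.prev / denom) if denom > 0 else 0.0
            if self.ema is None:
                self.ema = cos
            else:
                self.ema = self.decay * self.ema + (1 - self.decay) * cos
        self.prev = grad.copy()
        return self.ema
```

    Calling `update` once per optimizer step with the flattened gradient returns the current smoothed value; values near 1 suggest consecutive steps point in similar directions, values near 0 suggest they are close to orthogonal.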
    [D] Handling OOV in sequence generation
    What are some methods to handle OOV words when generating sequences? For example for some n-gram implementations, I've seen all tokens removed from the candidate list of words to be sampled from given the prior n-gram, and if there are no other candidates the generated text is ended. Curious to learn about some other methods to deal with OOV. submitted by /u/MLJungle [link] [comments]  ( 85 min )
    [R] CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
    Paper: https://arxiv.org/pdf/2207.01780.pdf Github: https://github.com/salesforce/CodeRL Abstract: Program synthesis or code generation aims to generate a program that satisfies a problem specification. Recent approaches using large-scale pretrained language models (LMs) have shown promising results, yet they have some critical limitations. In particular, they often follow a standard supervised fine-tuning procedure to train a code generation model only from the pairs of natural-language problem descriptions and ground-truth programs. Such paradigm largely ignores some important but potentially useful signals in the problem specification such as unit tests, which thus often results in poor performance when solving complex unseen coding tasks. To address the limitations, we propose "CodeRL", a new framework for program synthesis tasks through pretrained LMs and deep reinforcement learning (RL). Specifically, during training, we treat the code-generating LM as an actor network, and introduce a critic network that is trained to predict the functional correctness of generated programs and provide dense feedback signals to the actor. During inference, we introduce a new generation procedure with a critical sampling strategy that allows a model to automatically regenerate programs based on feedback from example unit tests and critic scores. For the model backbones, we extended the encoder-decoder architecture of CodeT5 with enhanced learning objectives, larger model sizes, and better pretraining data. Our method not only achieves new SOTA results on the challenging APPS benchmark, but also shows strong zero-shot transfer capability with new SOTA results on the simpler MBPP benchmark. 
    submitted by /u/Singularian2501 [link] [comments]  ( 86 min )
    [D] Why aren't there many people working on causal machine learning?
    It seems Judea Pearl, Yoshua Bengio, Elias Bareinboim and a handful of other researchers are the only people who are working on causal inference and machine learning. Is causal machine learning still a niche field? Also, do you know any researchers working on causal machine learning at Berkeley? submitted by /u/After_Philosopher572 [link] [comments]  ( 87 min )
    [D] Object Detection trained on simulated renderings unable to converge on real images - why?
    I wrote a program in Unity that generated millions of fake images using the HDRP rendering pipeline. For starters I only want to detect a bottle of "ITO EN" ice-tea. Here is an example (left is real, right is the fake rendering). I have a simple 3 layer resnet CNN with 3 blocks each, and use a Global Average Pooling layer at the end to visualize the detection. Using the simulation dataset only I get an accuracy of 97% or higher. Using the real dataset I only get ~70% accuracy. I wanna add that this is not a result of over-training, (a) because I use a validation set and stop training if it hasn't improved and (b) the test set performs very well. This is infuriating, because the image dataset is extremely diverse and I use a ton of image transformations in order to provide a very high level of diversity. I also use various levels of lighting, bloom, camera exposures, motion blur, changing materials for all assets, as well as changing the properties for the target (the bottle), such as glossiness, reflection, emissive lighting, and so on. Here is an example for the rendered dataset that is used for training, and here is an example for the real dataset. Anyone got an idea why this isn't working out? submitted by /u/tmuxed [link] [comments]  ( 88 min )
    [D] How to correctly transform Cityscapes Masks to Bounding Boxes?
    As the title suggests, I would like to know the correct way to pre-process the cityscapes dataset for object detection. There are multiple ways how this can be done. There is a version in Detectron2, in MM Detection, there is this. Which one is the correct way, without getting errors in the labels? Anybody worked with this before? Would be glad if anybody might have an idea. submitted by /u/SeucheAchat9115 [link] [comments]  ( 86 min )
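    For reference, the core of any such conversion is taking the extent of each instance's pixels. The sketch below is not the Detectron2 or MMDetection implementation; it assumes a 2-D integer array in which every object instance carries its own id (as in a Cityscapes-style instance-id image) and that the ids listed in `ignore_ids` mark pixels without an instance. Both the function name and that parameter are hypothetical.

```python
import numpy as np

def instance_mask_to_boxes(instance_ids, ignore_ids=(0,)):
    """Map each instance id to a box (x_min, y_min, x_max, y_max).

    Coordinates are in pixels; x_max and y_max are exclusive, so
    width = x_max - x_min and height = y_max - y_min.
    """
    instance_ids = np.asarray(instance_ids)
    boxes = {}
    for inst_id in np.unique(instance_ids):
        if int(inst_id) in ignore_ids:
            continue
        # Row (y) and column (x) indices of every pixel of this instance.
        ys, xs = np.nonzero(instance_ids == inst_id)
        boxes[int(inst_id)] = (
            int(xs.min()), int(ys.min()),
            int(xs.max()) + 1, int(ys.max()) + 1,
        )
    return boxes
```

    Note that an instance split into several disconnected parts (for example by occlusion) gets one enclosing box here; whether that matches the labels you want is a design decision the existing converters may handle differently.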
    [P] Tutorial: Serverless MLOps pipelines with Vertex AI and ZenML
    At ZenML, we created a guide to easily run MLOps pipelines on Google Cloud Platform with Vertex AI. I thought I'd share it here because I think it might be useful for people who are just starting MLOps on GCP. Blog post: https://blog.zenml.io/vertex-ai-blog/ Full video: https://youtu.be/qgvmvexGv_c Why is this better than going through the Vertex AI SDK? ZenML steps and pipelines can be written with a simple decorator pattern that is easily approachable for a #datascientist. ZenML takes care of storing and versioning pythonic objects between the steps of a pipeline. ZenML provides first-class integrations into other MLOps tools that you can leverage natively in your pipelines. For example, you can track experiments on MLFlow easily. ZenML pipelines can be run locally first, and then deployed instantly. You can run a ZenML pipeline not only on Vertex, but also #Airflow, #Kubeflow, #Kubernetes, or wherever else you'd like! Watch the full video: https://www.youtube.com/watch?v=qgvmvexGv_c&ab_channel=ZenML I bet the GCP Vertex AI folk here might like the above video. It isn't just about ZenML either but more of a broader look into the different components that go into running ML in production on GCP (Container registry, Cloud Storage, Secret Manager, Vertex, Cloud SQL). Would love to hear more feedback on the video or blog! submitted by /u/htahir1 [link] [comments]  ( 86 min )
    [P] Some new Sherlock Holmes stories (GPT-3)
    I thought I'd share some of the Sherlock Holmes stories I created with various prompts using GPT-3. Might be fun for some fans, but overall the stories are all a bit superficial, although some of them made me laugh (the fourth one in particular). John Watson was having a cup of tea in his flat when he heard a knock at the door. He got up to answer it, and found Sherlock Holmes standing there, looking rather grave. "Watson, I'm afraid I have some bad news," said Sherlock. "I've just been to Baker Street, where I found Mrs. Hudson in a state of hysterics. It seems that a Mr. Bartholomew Jones was found dead in his study, and Mrs. Hudson is convinced that it was murder." "That is rather shocking," said Watson. "Do you have any idea who might have done it?" "I have some suspicions," …  ( 100 min )
  • Open

    MLGO: A Machine Learning Framework for Compiler Optimization
    Posted by Yundi Qian, Software Engineer, Google Research and Mircea Trofin, Software Engineer, Google Core The question of how to compile faster and smaller code arose together with the birth of modern computers. Better code optimization can significantly reduce the operational cost of large datacenter applications. The size of compiled code matters the most to mobile and embedded systems or software deployed on secure boot partitions, where the compiled binary must fit in tight code size budgets. With advances in the field, the headroom has been heavily squeezed with increasingly complicated heuristics, impeding maintenance and further improvements. Recent research has shown that machine learning (ML) can unlock more opportunities in compiler optimization by replacing complicated heuri…  ( 25 min )
  • Open

    "Offline RL Policies Should be Trained to be Adaptive", Ghosh et al 2022
    submitted by /u/gwern [link] [comments]  ( 84 min )
  • Open

    Art by Artificial Intelligence: AI Generated Paintings
    AI has brought a new life to art.  ( 7 min )
    Your Predictions Are Only As Good As Your Data
    Testing Data Vs Training Data In Machine Learning  ( 14 min )
2022-08-05T01:12:42.761Z osmosfeed 1.15.1